[PATCH] D85364: [SVE][WIP] Implement lowering for fixed width select

Cameron McInally via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 6 08:48:08 PDT 2020


cameron.mcinally added a comment.

In D85364#2198257 <https://reviews.llvm.org/D85364#2198257>, @efriedma wrote:

> In D85364#2198164 <https://reviews.llvm.org/D85364#2198164>, @cameron.mcinally wrote:
>
>> In D85364#2197915 <https://reviews.llvm.org/D85364#2197915>, @efriedma wrote:
>>
>>>> At VL=512, the v8i1 mask will be promoted to v8i64. In order to lower this to a scalable mask, we'd need to insert the v8i64 subvector into an nxv2i64, and then truncate that ZPR by performing a CMPNE against 0 to get the final nxv2i1 mask. Between the zero extend to promote the vXi1 mask and the truncate to get back to an nxvXi1, there are a lot of extra instructions.
>>>
>>> I had this concern when I was reviewing the code in question. @paulwalker-arm said he found the conversions were usually folded away in his prototype. Most i1 vectors will be produced by a compare that returns an nxv2i1 or something like that.
>>
>> I guess if it is amortized away, it's not a big deal. But a CMPNE is 4 cycles and a SX is 4 cycles. So we have an 8 cycle no-op. That's not great.
>
> Folded away, as in, DAGCombine gets rid of the extra instructions.  If that doesn't work right now, we should be able to make it work with a little more code.

I see now. I had misunderstood: I thought the folding only happened within the loop block, and that we'd still have to pay the cost before and after the loop.

Reiterating what (I think) Paul said on the SVE Sync-up call, it sounds like he has a plan for DAGCombine to clean these up during ISel, instead of during type legalization. If that's correct, then it should be all good.
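
For anyone following along, here is a rough toy model of the round trip being discussed: promote the vXi1 mask to wide lanes, insert it into the scalable container, then CMPNE against 0 to get a predicate back. It is plain Python, not LLVM code. Every function name is hypothetical, and zero-filling the upper lanes is an assumption made only to keep the model deterministic. It is meant to show why the sequence is semantically a no-op, which is what makes it a candidate for folding.

```python
# Toy model of the mask round trip at VL=512. This is not LLVM code:
# every name here is hypothetical and lanes are plain Python values.

def zero_extend_mask(mask_bits):
    """Promote a vXi1 mask to one integer lane per element (0 or 1)."""
    return [1 if bit else 0 for bit in mask_bits]

def insert_into_scalable(fixed_lanes, total_lanes):
    """Place the fixed-width lanes in the low lanes of a wider container.

    Assumption: the upper lanes are filled with 0 only to keep this
    model deterministic.
    """
    if len(fixed_lanes) > total_lanes:
        raise ValueError("fixed vector does not fit in the container")
    return list(fixed_lanes) + [0] * (total_lanes - len(fixed_lanes))

def cmpne_zero(lanes):
    """Compare each lane against 0, giving one predicate bit per lane."""
    return [lane != 0 for lane in lanes]

def mask_round_trip(mask_bits, total_lanes):
    """Zero extend, insert, then compare: the sequence from the quote."""
    promoted = zero_extend_mask(mask_bits)
    container = insert_into_scalable(promoted, total_lanes)
    return cmpne_zero(container)

mask = [True, False, True, True, False, False, True, False]
# At VL=512 with 64-bit lanes there are 512 // 64 = 8 lanes, so the
# v8i64 fills the container exactly.
result = mask_round_trip(mask, 512 // 64)
print(result == mask)
```

Since the output predicate equals the input mask, the extend plus compare pair carries no information, so removing it in a combine would not change results in this model.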


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85364/new/

https://reviews.llvm.org/D85364


