[PATCH] D85364: [SVE][WIP] Implement lowering for fixed width select

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 5 17:07:53 PDT 2020


efriedma added a comment.

In D85364#2198164 <https://reviews.llvm.org/D85364#2198164>, @cameron.mcinally wrote:

> In D85364#2197931 <https://reviews.llvm.org/D85364#2197931>, @efriedma wrote:
>
>> For `load <8 x i1>` specifically, the code is terrible because we're using the generic target-independent expansion, which goes element by element.  If we cared, we could custom-lower it to something more reasonable.  Nobody has looked into it because there isn't any way to generate that operation from C code.
>
> That's interesting. If we could load the vXi1 and then vector extend it, it might be more palatable. I haven't checked if there are instructions to support that though. And I wonder if it will get weird with a vector-of-i1s smaller than a byte...

We could do something like `svlsr(svdup(x), svindex(0,1))`.  But again, it's not really worth optimizing this.
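(A portable scalar model of what that idiom computes, for illustration only; `expand_i1_mask` is a hypothetical name, not anything in the patch. The SVE sequence broadcasts the loaded mask byte with `svdup`, shifts each lane right by its own index via `svlsr` against `svindex(0,1)`, and then only the low bit of each lane is meaningful.)

```c
#include <stdint.h>

/* Scalar model of svlsr(svdup(x), svindex(0, 1)): broadcast the
 * loaded i1-mask byte to every lane, then shift lane i right by i
 * so that mask bit i lands in lane i's least significant bit.
 * The final "& 1" models the subsequent test of that low bit. */
static void expand_i1_mask(uint8_t mask, uint64_t lanes[8]) {
    for (int i = 0; i < 8; i++)
        lanes[i] = ((uint64_t)mask >> i) & 1;
}
```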

> In D85364#2197915 <https://reviews.llvm.org/D85364#2197915>, @efriedma wrote:
>
>>> At VL=512, the v8i1 mask will be promoted to v8i64. In order to lower this to a scalable mask, we'd need to insert the v8i64 subvector into a nxv2i64. And then truncate that ZPR by performing a CMPNE against 0, to get the final nxv2i1 mask. Between the zero extend to promote the vXi1 mask, and the truncate to get back to a nxvXi1, there's a lot of extra instructions.
>>
>> I had this concern when I was reviewing the code in question. @paulwalker-arm said he found the conversions were usually folded away in his prototype. Most i1 vectors will be produced by a compare that returns an nxv2i1 or something like that.
>
> I guess if it is amortized away, it's not a big deal. But a CMPNE is 4 cycles and a SX is 4 cycles. So we have an 8 cycle no-op. That's not great.

Folded away, as in, DAGCombine gets rid of the extra instructions.  If that doesn't work right now, we should be able to make it work with a little more code.
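(A scalar sketch of why that fold is legal, under the assumption that each mask lane holds 0 or 1: sign-extending the lane and then comparing the result against zero is the identity on such values, so a combine can rewrite cmpne(sext(c), 0) to just c. `round_trip` is a hypothetical name for illustration.)

```c
#include <stdint.h>

/* The instruction pair DAGCombine should remove: sign-extend an
 * i1 lane to i64, then compare the widened value against zero.
 * On 0/1 inputs this round-trips to the original bit. */
static int round_trip(int c) {
    int64_t wide = -(int64_t)c;  /* sext: 0 -> 0, 1 -> all-ones */
    return wide != 0;            /* cmpne against zero */
}
```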

> Also, I think we can get rid of this AND, unless I'm missing an edge case.

VSELECT masks are guaranteed to be all-ones or all-zeros for vector types on ARM.
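(To illustrate why that guarantee makes the extra AND redundant: a bitwise per-lane select only produces the right answer when each mask lane is already all-ones or all-zeros, so no re-normalizing AND is needed before using the mask. `lane_select` is a hypothetical name, modeling one lane.)

```c
#include <stdint.h>

/* Bitwise select of one 64-bit lane.  Correct only when mask is
 * all-ones or all-zeros -- exactly the invariant VSELECT masks
 * carry, which is why no additional AND is required. */
static uint64_t lane_select(uint64_t mask, uint64_t a, uint64_t b) {
    return (mask & a) | (~mask & b);
}
```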


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85364/new/

https://reviews.llvm.org/D85364
