[PATCH] D151468: [RISCV] Use v(f)slide1up for shuffle+insert idiom

Sun May 28 02:43:26 PDT 2023

frasercrmck accepted this revision.
frasercrmck added a comment.
This revision is now accepted and ready to land.

LGTM with a minor typo

In D151468#4376270 <https://reviews.llvm.org/D151468#4376270>, @reames wrote:

> In D151468#4375309 <https://reviews.llvm.org/D151468#4375309>, @frasercrmck wrote:
>
>>   ; This one matches
>>   %vb = insertelement <4 x i8> poison, i8 %b, i64 0
>>   %v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
>>   %v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>
>>
>>   ; I wouldn't have thought the splat index should matter
>>   %vb = insertelement <4 x i8> poison, i8 %b, i64 0
>>   %v1 = shufflevector <4 x i8> %vb, <4 x i8> poison, <4 x i32> zeroinitializer
>>   %v2 = shufflevector <4 x i8> %v1, <4 x i8> %v, <4 x i32> <i32 1, i32 4, i32 5, i32 6>
>
> This is an artifact of where I put the matching code.  This isn't a sub-vector insert, and I'd placed the vslide1up match below that check.  Good catch.  If you're okay with it, I'd like to land this as is, and then handle this in a separate patch.  I need to write dedicated matching for the vslide1down case anyways, and I'd like to have the test coverage in tree already.

Yes that sounds good.

>>   ; Should be the same as the first example, but doesn't match
>>   %vb = insertelement <4 x i8> poison, i8 %b, i64 0
>>   %v2 = shufflevector <4 x i8> %vb, <4 x i8> %v, <4 x i32> <i32 0, i32 4, i32 5, i32 6>
>
> This looks identical to the new vslide1up_4xi8_swapped test.  It does match on RV64, are you maybe testing this on RV32?  There's something a bit odd going on with that one; we're widening the op type on the insert.  I have not looked into that one yet.

Yes good spot - I was testing only with RV32, which I don't normally do when playing around with things.

================
Comment at: llvm/test/CodeGen/RISCV/rvv/fixed-vector-shuffle-vslide1up.ll:291
+; The length of the shift is less than the suffix, since we'd have to
+; materailize the splat, using the vslide1up doesn't help us.
 define <4 x i32> @vslide1up_4xi32_neg1(<4 x i32> %v, i32 %b) {
----------------
typo: materialize

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151468/new/

https://reviews.llvm.org/D151468