[PATCH] D138203: [AArch64][InstCombine] Simplify repeated complex patterns in dupqlane

Tue Jan 10 07:51:34 PST 2023

MattDevereau added inline comments.

================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:30
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 8 x half> @llvm.vector.insert.nxv8f16.v8f16(<vscale x 8 x half> poison, <8 x half> [[TMP2]], i64 0)
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 8 x half> @llvm.aarch64.sve.dupq.lane.nxv8f16(<vscale x 8 x half> [[TMP3]], i64 0)
+; CHECK-NEXT:    ret <vscale x 8 x half> [[TMP4]]
----------------
sdesmalen wrote:
> It's a pity this case isn't optimised. This is probably because instcombine has transformed it into a 'shufflevector' splat. You could do something like this:
> 
>   if (Value *V = getSplatValue(CurrentInsertElt)) {              
>     InsertEltVec[0] = V;                                         
>     InsertEltVec.resize(1);                                      
>   } else {                                                       
>     while (InsertElementInst *InsertElt =                        
>                dyn_cast<InsertElementInst>(CurrentInsertElt)) {  
>     ...
>   }
> 
> to handle it.
I think the assembly output ends up as just a single MOV instruction and there isn't much to gain here (I'll double check this), the thinking behind this test was more just to prove the recursion can indeed get the smallest possible pattern `a`.

================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:94
+
+define dso_local <vscale x 8 x half> @dupq_f16_abcnull_pattern(half %a, half %b, half %c, half %d) {
+; CHECK-LABEL: @dupq_f16_abcnull_pattern(
----------------
sdesmalen wrote:
> Is this test much different from `@dupq_f16_ab_pattern_no_end_indices`?
This test would splat `<a, b, c, _>` as a 64bit value whereas `@dupq_f16_ab_pattern_no_end_indices` would splat `<a, b>` as a 32bit value, which could check the algorithm can distinguish between the two, however I suppose that difference is ultimately not so important so I'll remove the indices tests.

================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:237
+
+define dso_local <vscale x 8 x half> @dupq_f16_ab_pattern_no_end_indices_not_poison(half %a, half %b, <vscale x 8 x half> %c) {
+; CHECK-LABEL: @dupq_f16_ab_pattern_no_end_indices_not_poison(
----------------
sdesmalen wrote:
> no end indices not poison? (perhaps there is a double negative here that confuses me?)
> 
> In any case, the end indices are poison, because the test is inserting into `poison`
Some of these tests that appear redundant are leftover from the previous revisions where allowing inserts of poison values changed things. 
The first negative is the indices 6 & 7 missing from the `insertelement` chain.
This second negative is because the first parameter of the vector insert is a non-poison value`c`unlike the other tests which are all poison.

================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:260
+
+define dso_local <vscale x 8 x half> @dupq_f16_ab_no_front_pattern(half %a, half %b) {
+; CHECK-LABEL: @dupq_f16_ab_no_front_pattern(
----------------
sdesmalen wrote:
> Given the algorithm just compares two halves of the vector, I'm not sure there's much value in having both negative tests for "no_front_pattern", "no_middle_pattern" and "no_end_pattern"
These tests don't mean much without the poison value insertions now, so i'll remove them

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138203/new/

https://reviews.llvm.org/D138203