[llvm] [AArch64][GlobalISel] Combine G_EXTRACT_VECTOR_ELT and G_BUILD_VECTOR sequences into G_SHUFFLE_VECTOR (PR #110545)

Mon Sep 30 11:10:50 PDT 2024

================
@@ -397,10 +397,13 @@ define void @test_vrev64(ptr nocapture %source, ptr nocapture %dst) nounwind ssp
 ;
 ; CHECK-GI-LABEL: test_vrev64:
 ; CHECK-GI:       // %bb.0: // %entry
+; CHECK-GI-NEXT:    adrp x8, .LCPI27_0
 ; CHECK-GI-NEXT:    ldr q0, [x0]
-; CHECK-GI-NEXT:    add x8, x1, #2
-; CHECK-GI-NEXT:    st1.h { v0 }[6], [x1]
-; CHECK-GI-NEXT:    st1.h { v0 }[5], [x8]
+; CHECK-GI-NEXT:    ldr q2, [x8, :lo12:.LCPI27_0]
+; CHECK-GI-NEXT:    tbl.16b v0, { v0, v1 }, v2
+; CHECK-GI-NEXT:    mov h1, v0[1]
+; CHECK-GI-NEXT:    str h0, [x1]
+; CHECK-GI-NEXT:    str h1, [x1, #2]
 ; CHECK-GI-NEXT:    ret
 entry:
   %tmp2 = load <8 x i16>, ptr %source, align 4
----------------
ValentijnvdBeek wrote:

This test produces the following output from the IRTranslator:
```llvm
bb.1.entry:
  liveins: $x0, $x1
  %0:_(p0) = COPY $x0
  %1:_(p0) = COPY $x1
  %4:_(s64) = G_CONSTANT i64 6
  %6:_(<2 x s16>) = G_IMPLICIT_DEF
  %7:_(s64) = G_CONSTANT i64 0
  %9:_(s64) = G_CONSTANT i64 5
  %11:_(s64) = G_CONSTANT i64 1
  %2:_(<8 x s16>) = G_LOAD %0:_(p0) :: (load (<8 x s16>) from %ir.source, align 4)
  %3:_(s16) = G_EXTRACT_VECTOR_ELT %2:_(<8 x s16>), %4:_(s64)
  %5:_(<2 x s16>) = G_INSERT_VECTOR_ELT %6:_, %3:_(s16), %7:_(s64)
  %8:_(s16) = G_EXTRACT_VECTOR_ELT %2:_(<8 x s16>), %9:_(s64)
  %10:_(<2 x s16>) = G_INSERT_VECTOR_ELT %5:_, %8:_(s16), %11:_(s64)
  G_STORE %10:_(<2 x s16>), %1:_(p0) :: (store (<2 x s16>) into %ir.dst)
  RET_ReallyLR
```

Which is then combined into:
```llvm
bb.1.entry:
  liveins: $x0, $x1
  %0:_(p0) = COPY $x0
  %1:_(p0) = COPY $x1
  %2:_(<8 x s16>) = G_LOAD %0:_(p0) :: (load (<8 x s16>) from %ir.source, align 4)
  %12:_(<8 x s16>) = G_IMPLICIT_DEF
  %10:_(<2 x s16>) = G_SHUFFLE_VECTOR %2:_(<8 x s16>), %12:_, shufflemask(6, 5)
  G_STORE %10:_(<2 x s16>), %1:_(p0) :: (store (<2 x s16>) into %ir.dst)
  RET_ReallyLR
```

This is the code we would expect, since it creates a vector with the correct elements in the right lanes. It does not exactly match SDAG, though: it misses a shufflevector optimization in which SDAG unmerges twice into the second half of the first vector and then reverses it.
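
To make the mapping from the extract/insert chain to the shuffle mask concrete, here is a minimal Python sketch. It is only a model of the idea, not the combiner code in this PR: `build_shuffle_mask` and `apply_shuffle` are hypothetical helpers, and the sketch assumes every insert takes its value from an extract of the same source vector with constant indices.

```python
from typing import List, Optional, Tuple

UNDEF = -1  # shuffle masks use -1 for a lane whose value is undefined


def build_shuffle_mask(
    num_dst_elts: int,
    inserts: List[Tuple[int, int]],
    num_src_elts: int,
) -> Optional[List[int]]:
    """Model a chain of (insert index, extract index) pairs as a shuffle mask.

    Each pair stands for one G_INSERT_VECTOR_ELT whose inserted value comes
    from a G_EXTRACT_VECTOR_ELT of the same source vector. Returns None when
    an index is out of range, meaning the chain is not modelled here.
    """
    mask = [UNDEF] * num_dst_elts  # the chain starts from G_IMPLICIT_DEF
    for dst_idx, src_idx in inserts:
        if not 0 <= dst_idx < num_dst_elts:
            return None
        if not 0 <= src_idx < num_src_elts:
            return None
        mask[dst_idx] = src_idx  # a later insert overwrites an earlier one
    return mask


def apply_shuffle(src: List[int], mask: List[int]) -> List[Optional[int]]:
    """Apply a single-source shuffle mask; undefined lanes become None."""
    return [None if m == UNDEF else src[m] for m in mask]


# The chain from the comment: lane 0 <- element 6, lane 1 <- element 5.
mask = build_shuffle_mask(2, [(0, 6), (1, 5)], 8)
print(mask)
print(apply_shuffle(list(range(10, 18)), mask))
```

For the chain above, the computed mask corresponds to the `shufflemask(6, 5)` shown in the combined MIR.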


https://github.com/llvm/llvm-project/pull/110545