[PATCH] D152205: [Aarch64][SVE]SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 22 00:32:15 PDT 2023


sdesmalen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25881
+  for (int Index : ShuffleMask) {
+    if (Index < 0)
+      return SDValue();
----------------
When one of the indices is `poison`, it will encode that in the ShuffleMask with `-1`. Bailing out may be too conservative, you could for example choose to select the first element instead.

With your patch at the moment:

```define <4 x float> @foo(ptr %a, ptr %b) nounwind vscale_range(1, 1){
  %op1 = load <4 x float>, ptr %a
  %op2 = load <4 x float>, ptr %b
  %1 = shufflevector <4 x float> %op1, <4 x float> %op2, <4 x i32> <i32 0, i32 3, i32 5, i32 poison>
  ret <4 x float> %1
}
```
results in:
```foo:                                    // @foo
        ldr     q0, [x1]
        ldr     q1, [x0]
        mov     z0.s, z0.s[1]
        mov     z2.s, z1.s[3]
        str     s1, [sp, #-16]!
        stp     s2, s0, [sp, #4]
        ldr     q0, [sp], #16
        ret
```

whereas:

```define <4 x float> @foo(ptr %a, ptr %b) nounwind vscale_range(1, 1){
  %op1 = load <4 x float>, ptr %a
  %op2 = load <4 x float>, ptr %b
  %1 = shufflevector <4 x float> %op1, <4 x float> %op2, <4 x i32> <i32 0, i32 3, i32 5, i32 1>
  ret <4 x float> %1
}
```
results in:
```
foo:                                    // @foo
        adrp    x8, .LCPI0_0
        ldr     q0, [x0]
        ldr     q1, [x1]
        ldr     q2, [x8, :lo12:.LCPI0_0]
        tbl     z0.s, { z0.s, z1.s }, z2.s
        ret
```

It would also be good to have a test for this case (one of the shufflemask lanes being `poison`)




================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25896
+  // Choosing an out-of-range index leads to the lane being zeroed vs zero
+  // value where it world perfrom first lane duplication for out of
+  // index elements. For i8 elements an out-of-range index could be a valid
----------------
nit: s/world perfrom/would perform/ ?


================
Comment at: llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll:4
+; RUN: llc -mattr=+sme -force-streaming-compatible-sve < %s | FileCheck %s -check-prefixes=CHECK,CHECK_SVE
+; RUN: llc -mattr=+sve -force-streaming-compatible-sve < %s | FileCheck %s -check-prefixes=CHECK,CHECK_SVE
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128  < %s | FileCheck %s -check-prefixes=CHECK,SVE2_128
----------------
This line is exactly the same as on line 2?


================
Comment at: llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll:8
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024  < %s | FileCheck %s -check-prefixes=CHECK,SVE2_1024
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048  < %s | FileCheck %s -check-prefixes=CHECK,SVE2_2048
 
----------------
I would actually prefer to keep these tests kind of the same as they were, and for any new tests to be moved to a new test file, e.g.  `sve-fixed-length-vector-shuffle-tbl.ll`.

Where:
* The test has RUN lines for both `+sve` and for `+sve2`
* We test with Min == Max, such that it uses TBL/TBL2 as your patch currently implements
* We test without a Max (which means Max is unbounded, and implies your current patch won't use TBL, although that could change in the future)
* There are a few tests for boundary cases using vscale_range(16, 16) and i8 vectors.

I'd recommend using the same size for the vectors in tests (now your test is using `<4 x i8>`, `<8 x i8>` and `<16 x i8>`, but I don't think that makes any difference based on the code you've added. Maybe you can just stick with `<16 x i8>` instead. If you go for 128-bit vectors, you'll need to add the `-force-streaming-compatible` flag to force the use of SVE instructions over NEON.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152205/new/

https://reviews.llvm.org/D152205



More information about the llvm-commits mailing list