[PATCH] D152205: [Aarch64][SVE]SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 22 00:32:15 PDT 2023
sdesmalen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25881
+ for (int Index : ShuffleMask) {
+ if (Index < 0)
+ return SDValue();
----------------
When one of the indices is `poison`, it will encode that in the ShuffleMask with `-1`. Bailing out may be too conservative, you could for example choose to select the first element instead.
With your patch at the moment:
```define <4 x float> @foo(ptr %a, ptr %b) nounwind vscale_range(1, 1){
%op1 = load <4 x float>, ptr %a
%op2 = load <4 x float>, ptr %b
%1 = shufflevector <4 x float> %op1, <4 x float> %op2, <4 x i32> <i32 0, i32 3, i32 5, i32 poison>
ret <4 x float> %1
}
```
results in:
```foo: // @foo
ldr q0, [x1]
ldr q1, [x0]
mov z0.s, z0.s[1]
mov z2.s, z1.s[3]
str s1, [sp, #-16]!
stp s2, s0, [sp, #4]
ldr q0, [sp], #16
ret
```
whereas:
```define <4 x float> @foo(ptr %a, ptr %b) nounwind vscale_range(1, 1){
%op1 = load <4 x float>, ptr %a
%op2 = load <4 x float>, ptr %b
%1 = shufflevector <4 x float> %op1, <4 x float> %op2, <4 x i32> <i32 0, i32 3, i32 5, i32 1>
ret <4 x float> %1
}
```
results in:
```
foo: // @foo
adrp x8, .LCPI0_0
ldr q0, [x0]
ldr q1, [x1]
ldr q2, [x8, :lo12:.LCPI0_0]
tbl z0.s, { z0.s, z1.s }, z2.s
ret
```
It would also be good to have a test for this case (one of the shufflemask lanes being `poison`)
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25896
+ // Choosing an out-of-range index leads to the lane being zeroed vs zero
+ // value where it world perfrom first lane duplication for out of
+ // index elements. For i8 elements an out-of-range index could be a valid
----------------
nit: s/world perfrom/would perform/ ?
================
Comment at: llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll:4
+; RUN: llc -mattr=+sme -force-streaming-compatible-sve < %s | FileCheck %s -check-prefixes=CHECK,CHECK_SVE
+; RUN: llc -mattr=+sve -force-streaming-compatible-sve < %s | FileCheck %s -check-prefixes=CHECK,CHECK_SVE
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s -check-prefixes=CHECK,SVE2_128
----------------
This line is exactly the same as on line 2?
================
Comment at: llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-vector-shuffle.ll:8
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s -check-prefixes=CHECK,SVE2_1024
+; RUN: llc -mattr=+sve2 -force-streaming-compatible-sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s -check-prefixes=CHECK,SVE2_2048
----------------
I would actually prefer to keep these tests kind of the same as they were, and for any new tests to be moved to a new test file, e.g. `sve-fixed-length-vector-shuffle-tbl.ll`.
Where:
* The test has RUN lines for both `+sve` and for `+sve2`
* We test with Min == Max, such that it uses TBL/TBL2 as your patch currently implements
* We test without a Max (which means Max is unbounded, and implies your current patch won't use TBL, although that could change in the future)
* There are a few tests for boundary cases using vscale_range(16, 16) and i8 vectors.
I'd recommend using the same size for the vectors in tests (now your test is using `<4 x i8>`, `<8 x i8>` and `<16 x i8>`, but I don't think that makes any difference based on the code you've added. Maybe you can just stick with `<16 x i8>` instead. If you go for 128-bit vectors, you'll need to add the `-force-streaming-compatible` flag to force the use of SVE instructions over NEON.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D152205/new/
https://reviews.llvm.org/D152205
More information about the llvm-commits
mailing list