[PATCH] D152205: [Aarch64][SVE]SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.
Dinar Temirbulatov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 22 15:35:13 PDT 2023
dtemirbulatov marked an inline comment as done.
dtemirbulatov added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25868
+ // of the shufflevector, thus we are rejecting this transform.
+ (Subtarget->hasSVE2() && IndexLen < (MaxOffset / 2) &&
+ MinSVESize == MaxSVESize)) {
----------------
sdesmalen wrote:
> This condition and the comment above make little sense to me. The issue to solve is instead about the maximum *index* value in the shuffle mask. If the maximum value being indexed cannot be represented in an i8, then we cannot use TBL2 to lower this shuffle vector.
>
> For example:
>
> ```
> define <16 x i8> @foo(ptr %ptr0, ptr %ptr1) {
> %op0 = load <256 x i8>, ptr %ptr0
> %op1 = load <256 x i8>, ptr %ptr1
> ; Cannot use TBL2 instruction because any offset > 255 cannot be represented in an i8!
> %res = shufflevector <256 x i8> %op0, <256 x i8> %op1, <16 x i32> <i32 0,i32 1,i32 2,i32 3,i32 4, i32 5, i32 6, i32 7, i32 256,i32 257, i32 258, i32 259, i32 260, i32 261, i32 262, i32 263>
> ret <16 x i8> %res
> }
> ```
>
> However, the code in the patch seems to accept this without an issue and generates wrong code with the wrong indices.
> ```
> .LCPI0_0:
> .byte 0 // 0x0
> .byte 1 // 0x1
> .byte 2 // 0x2
> .byte 3 // 0x3
> .byte 4 // 0x4
> .byte 5 // 0x5
> .byte 6 // 0x6
> .byte 7 // 0x7
> .byte 16 // 0x10
> .byte 33 // 0x21
> .byte 34 // 0x22
> .byte 35 // 0x23
> .byte 36 // 0x24
> .byte 37 // 0x25
> .byte 38 // 0x26
> .byte 39 // 0x27
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .byte 255 // 0xff
> .text
> .globl foo
> .p2align 2
> .type foo, at function
>
> foo: // @foo
> .cfi_startproc
> // %bb.0:
> adrp x8, .LCPI0_0
> add x8, x8, :lo12:.LCPI0_0
> ptrue p0.b
> ldr q1, [x1]
> ldr q0, [x0]
> ld1b { z2.b }, p0/z, [x8]
> tbl z0.b, { z0.b, z1.b }, z2.b
> // kill: def $q0 killed $q0 killed $z0
> ret
> ```
I think this is correct, here is full DAG dump:
SelectionDAG has 16 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t18: v16i8,ch = load<(load (s128) from %ir.ptr0, align 256)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t16: v16i8,ch = load<(load (s128) from %ir.ptr1, align 256)> t0, t4, undef:i64
t11: v16i8 = vector_shuffle<0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23> t18, t16
t13: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11
t22: nxv16i8 = insert_subvector undef:nxv16i8, t18, Constant:i64<0>
t23: nxv16i8 = insert_subvector undef:nxv16i8, t16, Constant:i64<0>
t14: ch = AArch64ISD::RET_GLUE t13, Register:v16i8 $q0, t13:1
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:25869
+ (Subtarget->hasSVE2() && IndexLen < (MaxOffset / 2) &&
+ MinSVESize == MaxSVESize)) {
+
----------------
sdesmalen wrote:
> Please check this condition earlier up for an early bail out! (at same place where it checks MinSVESize)
No, Most of those checks related to support TBL2 both opperand used and at the place where MinSVESize check we don't know content of the mask, thus could not tell that both operand would be used or not.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D152205/new/
https://reviews.llvm.org/D152205
More information about the llvm-commits
mailing list