[llvm] [AArch64] Combine signext_inreg of setcc(... != splat(0)) (PR #157665)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 10 03:24:06 PDT 2025
================
@@ -26097,6 +26097,17 @@ static SDValue performSetCCPunpkCombine(SDNode *N, SelectionDAG &DAG) {
return SDValue();
}
+static bool isSignExtInReg(const SDValue &V) {
+ if (V.getOpcode() != AArch64ISD::VASHR ||
----------------
david-arm wrote:
OK, I think I understand this a bit more now. If I lower these two functions:
```
define <16 x i8> @masked_load_v16i8(ptr %src, <16 x i1> %mask) {
  %load = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr %src, i32 8, <16 x i1> %mask, <16 x i8> zeroinitializer)
  ret <16 x i8> %load
}

define <16 x i8> @masked_load_v16i8_2(ptr %src, <16 x i8> %mask) {
  %icmp = icmp ugt <16 x i8> %mask, splat (i8 3)
  %load = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr %src, i32 8, <16 x i1> %icmp, <16 x i8> zeroinitializer)
  ret <16 x i8> %load
}
```
we actually end up with decent codegen for `masked_load_v16i8_2`:
```
masked_load_v16i8:
        shl     v0.16b, v0.16b, #7
        ptrue   p0.b, vl16
        cmlt    v0.16b, v0.16b, #0
        cmpne   p0.b, p0/z, z0.b, #0
        ld1b    { z0.b }, p0/z, [x0]
        ret

masked_load_v16i8_2:
        movi    v1.16b, #3
        ptrue   p0.b, vl16
        cmphi   p0.b, p0/z, z0.b, z1.b
        ld1b    { z0.b }, p0/z, [x0]
        ret
```
so the problem is limited purely to the case where the predicate is an unknown live-in to the block. I see what you mean about the ordering of lowering for `masked_load_v16i8`, i.e. we first see:
```
Type-legalized selection DAG: %bb.0 'masked_load_v16i8:'
SelectionDAG has 14 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
t16: v16i8 = sign_extend_inreg t4, ValueType:ch:v16i1
t7: v16i8 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, ...
t9: v16i8,ch = masked_load<(load unknown-size from %ir.src, align 8)> t0, t2, undef:i64, t16, t7
t11: ch,glue = CopyToReg t0, Register:v16i8 $q0, t9
t12: ch = AArch64ISD::RET_GLUE t11, Register:v16i8 $q0, t11:1
...
Vector-legalized selection DAG: %bb.0 'masked_load_v16i8:'
SelectionDAG has 15 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
t21: v16i8 = AArch64ISD::VSHL t4, Constant:i32<7>
t22: v16i8 = AArch64ISD::VASHR t21, Constant:i32<7>
t7: v16i8 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, ...
t9: v16i8,ch = masked_load<(load unknown-size from %ir.src, align 8)> t0, t2, undef:i64, t22, t7
t11: ch,glue = CopyToReg t0, Register:v16i8 $q0, t9
t12: ch = AArch64ISD::RET_GLUE t11, Register:v16i8 $q0, t11:1
```
then
```
Legalized selection DAG: %bb.0 'masked_load_v16i8:'
SelectionDAG has 23 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t24: nxv16i1 = AArch64ISD::PTRUE TargetConstant:i32<9>
t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %1
t21: v16i8 = AArch64ISD::VSHL t4, Constant:i32<7>
t22: v16i8 = AArch64ISD::VASHR t21, Constant:i32<7>
t27: nxv16i8 = insert_subvector undef:nxv16i8, t22, Constant:i64<0>
t30: nxv16i1 = AArch64ISD::SETCC_MERGE_ZERO t24, t27, t28, setne:ch
t31: nxv16i8,ch = masked_load<(load unknown-size from %ir.src, align 8)> t0, t2, undef:i64, t30, t28
t32: v16i8 = extract_subvector t31, Constant:i64<0>
t11: ch,glue = CopyToReg t0, Register:v16i8 $q0, t32
t28: nxv16i8 = splat_vector Constant:i32<0>
t12: ch = AArch64ISD::RET_GLUE t11, Register:v16i8 $q0, t11:1
```
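(For context: by this point the sign_extend_inreg only survives as the VSHL/VASHR pair, so a combine running after full legalization has to pattern-match that pair much as the helper in this patch does. Roughly like the following, where the helper name and exact guards are my own sketch rather than the patch itself:)
```
// Match the VSHL+VASHR pair that vector legalization emits for a
// sign_extend_inreg from vXi1: both shifts are by (lane bits - 1).
static bool isSignExtInRegFromBool(SDValue V) {
  if (V.getOpcode() != AArch64ISD::VASHR ||
      V.getOperand(0).getOpcode() != AArch64ISD::VSHL)
    return false;
  unsigned ShtAmt = V.getScalarValueSizeInBits() - 1;
  auto IsShiftBy = [ShtAmt](SDValue Op) {
    auto *C = dyn_cast<ConstantSDNode>(Op.getOperand(1));
    return C && C->getZExtValue() == ShtAmt;
  };
  return IsShiftBy(V) && IsShiftBy(V.getOperand(0));
}
```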
It feels like a shame we're expanding the sign_extend_inreg so early on. I wonder if a cleaner solution is to fold `t16: v16i8 = sign_extend_inreg t4, ValueType:ch:v16i1` and `t9: v16i8,ch = masked_load<(load unknown-size from %ir.src, align 8)> t0, t2, undef:i64, t16, t7` into this:
```
t9: v16i8,ch = masked_load<(load unknown-size from %ir.src, align 8)> t0, t2, undef:i64, t4, t7
```
That would remove the extends completely and hopefully lead to better codegen too, since it would also remove the VSHL. Can we do this in the DAG combine phase that runs after `Type-legalized selection DAG: %bb.0 'masked_load_v16i8:'`? What do you think?
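For illustration, the fold I have in mind would look something like this in a masked-load combine that runs after type legalization (just a sketch: the function name is made up, and the guard would need more thought, in particular whether it's sound when the eventual predicate test inspects more than bit 0 of each lane):
```
// Sketch: fold masked_load(..., sign_extend_inreg(Mask, vXi1), ...)
//   -> masked_load(..., Mask, ...)
// so that vector legalization never creates the VSHL/VASHR pair.
static SDValue performMaskedLoadMaskCombine(
    SDNode *N, TargetLowering::DAGCombinerInfo &DCI, SelectionDAG &DAG) {
  // Only run in the combine after type legalization, before vector
  // legalization expands the sign_extend_inreg into shifts.
  if (DCI.isBeforeLegalize())
    return SDValue();

  auto *MLD = cast<MaskedLoadSDNode>(N);
  SDValue Mask = MLD->getMask();
  if (Mask.getOpcode() != ISD::SIGN_EXTEND_INREG)
    return SDValue();

  // Only strip the extend when it is from a vXi1 type, i.e. the mask
  // came from a promoted i1 vector and bit 0 carries the lane's value.
  EVT FromVT = cast<VTSDNode>(Mask.getOperand(1))->getVT();
  if (!FromVT.isVector() || FromVT.getVectorElementType() != MVT::i1)
    return SDValue();

  // Rebuild the masked load with the unextended mask (t4 in the dump
  // above), dropping the sign_extend_inreg entirely.
  return DAG.getMaskedLoad(MLD->getValueType(0), SDLoc(N), MLD->getChain(),
                           MLD->getBasePtr(), MLD->getOffset(),
                           Mask.getOperand(0), MLD->getPassThru(),
                           MLD->getMemoryVT(), MLD->getMemOperand(),
                           MLD->getAddressingMode(), MLD->getExtensionType(),
                           MLD->isExpandingLoad());
}
```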
https://github.com/llvm/llvm-project/pull/157665