[llvm] [AArch64] Combine signext_inreg of setcc(... != splat(0)) (PR #157665)

Tue Sep 9 08:06:14 PDT 2025

================
@@ -26097,6 +26097,17 @@ static SDValue performSetCCPunpkCombine(SDNode *N, SelectionDAG &DAG) {
   return SDValue();
 }
 
+static bool isSignExtInReg(const SDValue &V) {
+  if (V.getOpcode() != AArch64ISD::VASHR ||
----------------
david-arm wrote:

This feels quite late in the pipeline if we're relying upon AArch64 ISD nodes.

When lowering ctz_v16i1 I see this in the debug output:

```
Type-legalized selection DAG: %bb.0 'ctz_v16i1:'
SelectionDAG has 19 nodes:
  t0: ch,glue = EntryToken
          t12: nxv16i1 = AArch64ISD::PTRUE TargetConstant:i32<9>
              t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
            t23: v16i8 = sign_extend_inreg t2, ValueType:ch:v16i1
          t15: nxv16i8 = insert_subvector undef:nxv16i8, t23, Constant:i64<0>
          t17: nxv16i8 = splat_vector Constant:i32<0>
        t19: nxv16i1 = AArch64ISD::SETCC_MERGE_ZERO t12, t15, t17, setne:ch
      t20: i64 = AArch64ISD::CTTZ_ELTS t19
    t21: i32 = truncate t20
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t21
  t9: ch = AArch64ISD::RET_GLUE t8, Register:i32 $w0, t8:1
```

There is a DAGCombiner run immediately afterwards, which suggests that you can do this optimisation earlier and match the SIGN_EXTEND_INREG node instead. In theory you should be able to improve the codegen even further, essentially by doing:

```
  //    setcc_merge_zero(
  //       pred, insert_subvector(undef, signext_inreg(vNi1 x), 0), != splat(0))
  // => setcc_merge_zero(
  //       pred, insert_subvector(undef, x, 0), != splat(0))
```

That way I think you can also get rid of the remaining shl instruction, which is unnecessary as well.
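The rewrite can be sanity-checked on a single i8 lane with a small standalone model. This is only a sketch, not LLVM code: the helper names are hypothetical, and it assumes that sign_extend_inreg from i1 replicates bit 0 of the lane across all eight bits.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (assumption): sign_extend_inreg from i1 on one i8 lane
// copies bit 0 into every bit of the lane, giving 0 or -1.
inline int8_t signExtendInRegI1(uint8_t lane) {
  return (lane & 1) ? int8_t(-1) : int8_t(0);
}

// Lane result of "!= splat(0)" when the extension is kept.
inline bool activeWithExt(uint8_t lane) {
  return signExtendInRegI1(lane) != 0;
}

// Lane result of "!= splat(0)" when the extension is dropped.
inline bool activeWithoutExt(uint8_t lane) {
  return lane != 0;
}

// Returns true if both forms agree for every lane value in [0, maxLane].
inline bool agreeUpTo(unsigned maxLane) {
  for (unsigned v = 0; v <= maxLane; ++v)
    if (activeWithExt(uint8_t(v)) != activeWithoutExt(uint8_t(v)))
      return false;
  return true;
}
```

In this model the two comparisons agree when each lane holds 0 or 1, but they differ for a lane such as 2 (upper bits set, bit 0 clear), so dropping the extension would depend on what is known about the upper bits of x at that point.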

https://github.com/llvm/llvm-project/pull/157665