[llvm] [AArch64] Improve lowering of scalar abs(sub(a, b)). (PR #151180)

Thu Jul 31 04:20:24 PDT 2025

================
@@ -25464,6 +25467,24 @@ static SDValue performCSELCombine(SDNode *N,
     }
   }
 
+  // CSEL a, b, cc, SUBS(SUB(x,y), 0) -> CSEL a, b, cc, SUBS(x,y) if cc doesn't
+  // use overflow flags to avoid the comparison with zero.
+  if (Cond.getOpcode() == AArch64ISD::SUBS &&
+      isNullConstant(Cond.getOperand(1))) {
----------------
rj-jesus wrote:

If going the `CombineTo` route, would it be worth combining the `Sub` with the new `Subs` too? What I mean is:
```
DCI.CombineTo(Cond.getNode(), Subs, Subs.getValue(1));
DCI.CombineTo(Sub.getNode(), Subs);
```
I believe we do something similar in `performFlagSettingCombine` for other flag-setting instructions.

Doing so means we get folds such as the one below out of the box (i.e. without having to tinker with `isWorthFoldingALU`):
```
-; CHECK-NEXT:    sxtb w8, w1
-; CHECK-NEXT:    sxtb w9, w0
-; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    sxtb w8, w0
+; CHECK-NEXT:    subs w8, w8, w1, sxtb
```

As I mentioned in another comment, I'm not entirely sure this currently improves performance, but the code-size reduction and the potential slight decrease in register pressure are appealing. What do you think?

https://github.com/llvm/llvm-project/pull/151180