[llvm] [PowerPC] fold i128 equality/inequality compares of two loads into a vectorized compare using vcmpequb.p when Altivec is available (PR #158657)

Fri Oct 24 07:21:53 PDT 2025

================
@@ -15556,6 +15595,70 @@ SDValue PPCTargetLowering::combineSetCC(SDNode *N,
       SDValue Add = DAG.getNode(ISD::ADD, DL, OpVT, LHS, RHS.getOperand(1));
       return DAG.getSetCC(DL, VT, Add, DAG.getConstant(0, DL, OpVT), CC);
     }
+
+    // Optimization: Fold i128 equality/inequality compares of two loads into a
+    // vectorized compare using vcmpequb.p when Altivec is available.
+    //
+    // Rationale:
+    //   A scalar i128 SETCC (eq/ne) normally lowers to multiple scalar ops.
+    //   On VSX-capable subtargets, we can instead reinterpret the i128 loads
+    //   as v16i8 vectors and use the Altivec vcmpequb.p instruction to
+    //   perform a full 128-bit equality check in a single vector compare.
+    //
+    // Example Result:
+    //   This transformation replaces memcmp(a, b, 16) with two vector loads
+    //   and one vector compare instruction.
+
+    if (Subtarget.hasAltivec() && canConvertToVcmpequb(LHS, RHS)) {
+      SDLoc DL(N);
+      SelectionDAG &DAG = DCI.DAG;
+      auto *LA = dyn_cast<LoadSDNode>(LHS);
+      auto *LB = dyn_cast<LoadSDNode>(RHS);
+
+      assert((LA && LB) && "LA and LB must be LoadSDNode");
----------------
lei137 wrote:

This check is already part of `canConvertToVcmpequb()`.

I think it would be better to move the whole transformation into that function rather than just the check itself. That way it's easier to extend the function, as per Roland's comment.

So basically here, we would just see something like this:
```
if (Subtarget.hasAltivec()) {
  SDValue ConvertToVCMPEQUB = canConvertToVcmpequb(LHS, RHS);
  if (ConvertToVCMPEQUB)
    return ConvertToVCMPEQUB;
}
```
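
To make the shape of that refactor concrete, here is a minimal standalone sketch of the "check and build in one helper" pattern. It is not LLVM code: `Operand`, `tryFoldToByteCompare`, and `combine` are hypothetical stand-ins, `std::optional` models a null `SDValue`, and `memcmp` over 16 bytes stands in for the vector compare.

```cpp
#include <cstring>
#include <optional>

// Hypothetical stand-in for an SDValue operand of the setcc.
struct Operand {
  bool IsLoad16;            // models "this operand is a simple i128 load"
  const unsigned char *Ptr; // memory the modeled load reads from
};

// Stand-in for the merged helper: it performs the legality check and, on
// success, also produces the result. An empty optional models a null SDValue.
std::optional<bool> tryFoldToByteCompare(const Operand &LHS,
                                         const Operand &RHS) {
  if (!LHS.IsLoad16 || !RHS.IsLoad16)
    return std::nullopt; // not convertible; the caller falls through
  // Models the 128-bit equality check as a 16-byte comparison.
  return std::memcmp(LHS.Ptr, RHS.Ptr, 16) == 0;
}

// Stand-in for the call site: one call, one null test, no repeated casts
// or asserts about the operand kinds.
int combine(const Operand &LHS, const Operand &RHS, bool HasAltivec) {
  if (HasAltivec)
    if (auto Folded = tryFoldToByteCompare(LHS, RHS))
      return *Folded ? 1 : 0;
  return -1; // no fold; generic lowering would continue from here
}
```

With this shape the call site never needs to re-derive what the helper already proved, which is why the `dyn_cast` and `assert` lines would disappear from `combineSetCC`.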

https://github.com/llvm/llvm-project/pull/158657