[llvm] [PowerPC] Implement a more efficient memcmp in cases where the length is known. (PR #158657)
zhijian lin via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 22 08:30:45 PDT 2025
================
@@ -15556,6 +15556,63 @@ SDValue PPCTargetLowering::combineSetCC(SDNode *N,
SDValue Add = DAG.getNode(ISD::ADD, DL, OpVT, LHS, RHS.getOperand(1));
return DAG.getSetCC(DL, VT, Add, DAG.getConstant(0, DL, OpVT), CC);
}
+ if (Subtarget.hasVSX()) {
+ if (LHS.getOpcode() == ISD::LOAD && RHS.getOpcode() == ISD::LOAD &&
+ LHS.hasOneUse() && RHS.hasOneUse() &&
+ LHS.getValueType() == MVT::i128 && RHS.getValueType() == MVT::i128) {
----------------
diggerlin wrote:
In the expand-memcmp pass, the following IR
```
%bcmp = tail call i32 @bcmp(ptr noundef nonnull dereferenceable(16) %a, ptr noundef nonnull dereferenceable(16) %b, i64 16)
%cmp = icmp eq i32 %bcmp, 0
%conv = zext i1 %cmp to i32
ret i32 %conv
```
is transformed into
```
%0 = load i128, ptr %a, align 1
%1 = load i128, ptr %b, align 1
%2 = icmp ne i128 %0, %1
%3 = zext i1 %2 to i32
%cmp = icmp eq i32 %3, 0
%conv = zext i1 %cmp to i32
ret i32 %conv
```
But in the original code, the `load i128, ptr %a, align 1` is lowered to
```
t27: i64,ch = load<(load (s64) from %ir.a, align 1)> t0, t2, undef:i64
t32: i64,ch = load<(load (s64) from %ir.b, align 1)> t0, t4, undef:i64
```
This is not efficient: it produces two `ld` instructions in 64-bit mode or four `lwz` instructions in 32-bit mode. We want the i128 loads to be converted to vector loads, so there is a type restriction.
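For reference, a minimal C sketch of the kind of source that produces the `bcmp` call above (the function name `cmp16` is hypothetical, not taken from the patch):

```c
#include <string.h>

/* Minimal sketch: a comparison whose length (16 bytes) is known at
 * compile time. Because only equality is tested, the compiler can use
 * bcmp semantics, and the expand-memcmp pass can turn the call into
 * the wide i128 load-and-compare IR shown above. */
int cmp16(const void *a, const void *b) {
    return memcmp(a, b, 16) == 0;
}
```

With VSX, the goal of the patch is that the resulting i128 compare becomes a single vector load/compare sequence instead of two scalar `ld` + compare pairs.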
https://github.com/llvm/llvm-project/pull/158657