[llvm] [PowerPC] Implement a more efficient memcmp in cases where the length is known. (PR #158657)
zhijian lin via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 22 08:30:45 PDT 2025
================
@@ -15556,6 +15556,63 @@ SDValue PPCTargetLowering::combineSetCC(SDNode *N,
SDValue Add = DAG.getNode(ISD::ADD, DL, OpVT, LHS, RHS.getOperand(1));
return DAG.getSetCC(DL, VT, Add, DAG.getConstant(0, DL, OpVT), CC);
}
+ if (Subtarget.hasVSX()) {
+ if (LHS.getOpcode() == ISD::LOAD && RHS.getOpcode() == ISD::LOAD &&
+ LHS.hasOneUse() && RHS.hasOneUse() &&
+ LHS.getValueType() == MVT::i128 && RHS.getValueType() == MVT::i128) {
----------------
diggerlin wrote:
In the expand-memcmp pass, the following IR
```
%bcmp = tail call i32 @bcmp(ptr noundef nonnull dereferenceable(16) %a, ptr noundef nonnull dereferenceable(16) %b, i64 16)
%cmp = icmp eq i32 %bcmp, 0
%conv = zext i1 %cmp to i32
ret i32 %conv
```
is transformed into
```
%0 = load i128, ptr %a, align 1
%1 = load i128, ptr %b, align 1
%2 = icmp ne i128 %0, %1
%3 = zext i1 %2 to i32
%cmp = icmp eq i32 %3, 0
%conv = zext i1 %cmp to i32
ret i32 %conv
```
But in the original code, the `load i128, ptr %a, align 1` is lowered to
```
t27: i64,ch = load<(load (s64) from %ir.a, align 1)> t0, t2, undef:i64
t32: i64,ch = load<(load (s64) from %ir.b, align 1)> t0, t4, undef:i64
```
This is not efficient: it produces two `ld` instructions in 64-bit mode or four `lwz` instructions in 32-bit mode. We want the i128 loads to be converted to vector loads, so there is a type restriction.
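For reference, a minimal C sketch of the kind of source that produces the `bcmp` call above (the function name `cmp16` is hypothetical, not taken from the patch):

```c
#include <string.h>

/* Minimal sketch: a comparison whose length (16 bytes) is known at
 * compile time. Because only equality is tested, the compiler can use
 * bcmp semantics, and the expand-memcmp pass can turn the call into
 * the wide i128 load-and-compare IR shown above. */
int cmp16(const void *a, const void *b) {
    return memcmp(a, b, 16) == 0;
}
```

With VSX, the goal of the patch is that the resulting i128 compare becomes a single vector load/compare sequence instead of two scalar `ld` + compare pairs.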
https://github.com/llvm/llvm-project/pull/158657