[PATCH] D137721: [AArch64] Optimize more memcmp when the result is tested for [in]equality with 0

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 9 13:58:18 PST 2022


dmgreen added a comment.

Looks like a good patch. I like the way it checks for chains of ORs.



================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19533
+  // All the nodes must be either OR or XOR.
+  if (N->getOpcode() != ISD::OR && N->getOpcode() != ISD::XOR)
+    return false;
----------------
The check for OR could be moved below the check for XOR, saving XOR from being checked twice.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19548
+
+static SDValue performMemcmpCombine(SDNode *N,
+                                    TargetLowering::DAGCombinerInfo &DCI,
----------------
Could this be called something like performOrXorChainCombine? I'm not sure I love that name, but the transform is more general than just memcmp. It would be good to mention memcmp/bcmp in the function's comments, perhaps, to explain the motivation.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19560
+  // cmp A0, A0; ccmp A0, B1, 0, eq; cmp inv(Cond) flag
+  if (!DCI.isBeforeLegalize() && VT.isScalarInteger() &&
+      (Cond == ISD::SETEQ || Cond == ISD::SETNE) && isNullConstant(RHS) &&
----------------
LLVM style favors exiting early by inverting the condition; it helps reduce indentation.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19562
+      (Cond == ISD::SETEQ || Cond == ISD::SETNE) && isNullConstant(RHS) &&
+      LHS->getOpcode() == ISD::OR && isMemcmpPattern(LHS, WorkList)) {
+
----------------
Perhaps rename isMemcmpPattern to isOrXorChain too.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19572
+    SDValue CCmp;
+    for (unsigned i = 1; i < WorkList.size(); i++) {
+      std::tie(XOR0, XOR1) = WorkList[i];
----------------
i -> I


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:19653
+  // Try to perform the memcmp when the result is tested for [in]equality with 0
+  if (SDValue V = performMemcmpCombine(N, DCI, DAG))
+    return V;
----------------
Could this do:
```
if (!DCI.isBeforeLegalize())
  if (SDValue V = performMemcmpCombine(N, DAG))
    return V;
```
It should avoid needing to pass DCI through to performMemcmpCombine.


================
Comment at: llvm/test/CodeGen/AArch64/bcmp.ll:43
+; or (and (xor a, b), C1), (and (xor c, d), C2)
 define i1 @bcmp3(ptr %a, ptr %b) {
 ; CHECK-LABEL: bcmp3:
----------------
I think this might be fixed if performMemcmpCombine were called from lowerSETCC as well as from the combine. It looks like the issue is that we do not call performMemcmpCombine after the operands further up the tree have been simplified.


================
Comment at: llvm/test/CodeGen/AArch64/bcmp.ll:150
 ; CHECK-NEXT:    eor w9, w9, w10
 ; CHECK-NEXT:    and x9, x9, #0xff
 ; CHECK-NEXT:    eor x8, x8, x11
----------------
This AND shouldn't be here - it should be pushed higher, into the ldrb's. There is some code that already tries to push ANDs up into loads, but I've not looked into whether it could be extended to handle this case too.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137721/new/

https://reviews.llvm.org/D137721
