[PATCH] D137936: [AArch64] Optimize cmp chain when the result is tested for [in]equality with 0

Wed Nov 16 01:41:19 PST 2022

dmgreen added a reviewer: RKSimon.
dmgreen added a comment.

There is code in DAGCombiner::BackwardsPropagateMask that can propagate ANDs back to loads. It would usually handle patterns like this, but it can't look through any_extends. It has been useful in the past, though.
You could also imagine transforming `i64 and (any_extend(i32 x), mask)` into `i64 zext(i32 and(x, mask))` under AArch64, as we know the zext will be free. I think that would run into other problems though, as a zext sitting in between the ANDs isn't handled for all the BFI cases. Without improving BFI at the same time, it would lead to other regressions.
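
As a rough illustration of why that rewrite is value-preserving, here is a standalone C++ model (not LLVM code). It assumes the mask has no bits set above bit 31, so the narrow AND can use the truncated mask; the helper names and the `junk` parameter, which stands in for the unspecified high bits of an any_extend, are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Model of any_extend: the low 32 bits are x, the high 32 bits are
// unspecified. `junk` stands in for whatever the high bits happen to be.
inline uint64_t anyExtend(uint32_t x, uint32_t junk) {
  return (static_cast<uint64_t>(junk) << 32) | x;
}

// Original form: i64 and(any_extend(i32 x), mask).
inline uint64_t andOfAnyExtend(uint32_t x, uint32_t junk, uint64_t mask) {
  return anyExtend(x, junk) & mask;
}

// Rewritten form: i64 zext(i32 and(x, trunc(mask))).
inline uint64_t zextOfAnd(uint32_t x, uint64_t mask) {
  uint32_t narrow = x & static_cast<uint32_t>(mask);
  return static_cast<uint64_t>(narrow); // zero-extension to 64 bits
}

// Assumed precondition for the rewrite: the mask clears the high half,
// so the unspecified bits of the any_extend can never reach the result.
inline bool maskFitsInLow32(uint64_t mask) { return (mask >> 32) == 0; }

// Checks the identity for one input; only meaningful when the mask fits.
inline void checkIdentity(uint32_t x, uint32_t junk, uint64_t mask) {
  assert(maskFitsInLow32(mask) && "mask must clear the high 32 bits");
  assert(andOfAnyExtend(x, junk, mask) == zextOfAnd(x, mask));
}
```

With a mask that has bits set in the high half, the two forms can differ, because the any_extend's high bits leak into the wide AND but not into the zext.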

So I'm not sure either of those methods would be better than this, even if they are more general. I think it would be useful to add deliberate tests for this though if we can, especially for the edge cases.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11982
+//      (zext (and/or/xor (zextload x, zextload x)))
+SDValue DAGCombiner::CombineZExtLogicopDupExtLoad(SDNode *N) {
+  assert(N->getOpcode() == ISD::AND && "Unexpected opcode");
----------------
Why "Dup" in the name? Because there are two loads? When I see Dup I think of vector splats.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12013
+  if (!Load0->hasOneUse() || !Load1->hasOneUse() || MemVT0 != MemVT1 ||
+      !DAG.MaskedValueIsZero(N1, Mask))
+    return SDValue();
----------------
Does it not need to check other bits about the Mask?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12030
+
+  SDValue Logic = DAG.getNode(ISD::XOR, SDLoc(N), OrigVT, ExtLoad0, ExtLoad1);
+  return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), VT, Logic);
----------------
Should XOR be Logicop.getOpcode()?
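
To make the concern concrete, here is a standalone C++ model (not the LLVM API), assuming the combine is meant to cover all three opcodes named in the header comment. `LogicOp` and the helper functions are hypothetical names for this sketch, with 8-bit loads widened to 32 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for the and/or/xor opcode of the inner logic node.
enum class LogicOp { And, Or, Xor };

// Applies the logic op at the narrow (8-bit) width.
inline uint8_t applyNarrow(LogicOp op, uint8_t a, uint8_t b) {
  switch (op) {
  case LogicOp::And: return static_cast<uint8_t>(a & b);
  case LogicOp::Or:  return static_cast<uint8_t>(a | b);
  case LogicOp::Xor: return static_cast<uint8_t>(a ^ b);
  }
  return 0; // unreachable for valid enum values
}

// Reference: logic op on the two zero-extended values at the wide width.
inline uint32_t wideReference(LogicOp op, uint8_t a, uint8_t b) {
  uint32_t wa = a, wb = b;
  switch (op) {
  case LogicOp::And: return wa & wb;
  case LogicOp::Or:  return wa | wb;
  case LogicOp::Xor: return wa ^ wb;
  }
  return 0; // unreachable for valid enum values
}

// Rebuild as zext(narrow op), reusing the original opcode.
inline uint32_t rebuildPreservingOpcode(LogicOp op, uint8_t a, uint8_t b) {
  return static_cast<uint32_t>(applyNarrow(op, a, b));
}

// Rebuild as zext(narrow xor), ignoring the original opcode.
inline uint32_t rebuildHardCodedXor(LogicOp /*op*/, uint8_t a, uint8_t b) {
  return static_cast<uint32_t>(applyNarrow(LogicOp::Xor, a, b));
}
```

In this model the opcode-preserving rebuild matches the wide reference for every opcode, while the hard-coded XOR only matches when the original node was itself an XOR.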

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:20666
 //                     (trunc (srl $1 half-width))
-//                     (trunc (srl $1 (2 * half-width))) …)
+//                     (trunc (srl $1 (2 * half-width))) ...)
 // to (bitcast $1)
----------------
This can be done separately.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137936/new/

https://reviews.llvm.org/D137936