[llvm-branch-commits] [llvm] [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (PR #146075)
Fabian Ritter via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Mon Jun 30 07:09:55 PDT 2025
================
@@ -15136,6 +15136,41 @@ SDValue SITargetLowering::performPtrAddCombine(SDNode *N,
     return Folded;
   }
+  // Transform (ptradd a, b) -> (or disjoint a, b) if it is equivalent and if
+  // that transformation can't block an offset folding at any use of the ptradd.
+  // This should be done late, after legalization, so that it doesn't block
+  // other ptradd combines that could enable more offset folding.
+  bool HasIntermediateAssertAlign =
+      N0->getOpcode() == ISD::AssertAlign && N0->getOperand(0)->isAnyAdd();
+  // This is a hack to work around an ordering problem for DAGs like this:
+  //   (ptradd (AssertAlign (ptradd p, c1), k), c2)
+  // If the outer ptradd is handled first by the DAGCombiner, it can be
+  // transformed into a disjoint or. Then, when the generic AssertAlign combine
+  // pushes the AssertAlign through the inner ptradd, it's too late for the
+  // ptradd reassociation to trigger.
+  if (!DCI.isBeforeLegalizeOps() && !HasIntermediateAssertAlign &&
+      DAG.haveNoCommonBitsSet(N0, N1)) {
+    bool TransformCanBreakAddrMode = any_of(N->users(), [&](SDNode *User) {
+      if (auto *LoadStore = dyn_cast<MemSDNode>(User);
+          LoadStore && LoadStore->getBasePtr().getNode() == N) {
+        unsigned AS = LoadStore->getAddressSpace();
+        // Currently, we only really need ptradds to fold offsets into flat
+        // memory instructions.
+        if (AS != AMDGPUAS::FLAT_ADDRESS)
+          return false;
+        TargetLoweringBase::AddrMode AM;
+        AM.HasBaseReg = true;
+        EVT VT = LoadStore->getMemoryVT();
+        Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+        return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
+      }
+      return false;
----------------
ritter-x2a wrote:
I'm not sure that every backend that might use ptradd nodes would also want to transform them into ORs. However, there is probably not much point in designing for hypothetical backends, so I moved the combine into the generic combines for now, behind the `canTransformPtrArithOutOfBounds()` check (and also fixed it to actually check the intended addressing mode).
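For illustration, a minimal sketch of what "checking the intended addressing mode" could look like, reusing the names from the quoted snippet (`N1`, `LoadStore`, `DAG`, `AS`) and assuming the ptradd's offset operand is a plain constant; this is not the actual patch:

```cpp
// Sketch only: query the addressing mode that offset folding would actually
// use (base register + constant offset) instead of just "has a base register".
if (auto *C = dyn_cast<ConstantSDNode>(N1)) {
  TargetLoweringBase::AddrMode AM;
  AM.HasBaseReg = true;
  AM.BaseOffs = C->getSExtValue(); // the offset the fold would want to absorb
  Type *AccessTy = LoadStore->getMemoryVT().getTypeForEVT(*DAG.getContext());
  // If this mode is legal, the offset could be folded into the memory access,
  // so rewriting the ptradd as a disjoint OR might block that fold.
  return isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS);
}
return false;
```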
Dropping the target-specific `AS != AMDGPUAS::FLAT_ADDRESS` check affects the generated code in two lit tests ([identical-subrange-spill-infloop.ll](https://github.com/llvm/llvm-project/pull/146076/files#diff-517b7174ca71caeed2dd13ec440ee963e4db61f01911ff1cbc86ab0e60f16721) and [store-weird-sizes.ll](https://github.com/llvm/llvm-project/pull/146076/files#diff-32010dfaf8188291719044adb5c6e927b17fe3e3657e0f27ebe2e2a10a020889)).
Looking into it more closely, though, I find that
- the new code for `identical-subrange-spill-infloop.ll` is arguably an improvement (offsets for SMRD instructions are now folded where they weren't before), and
- the problem for `store-weird-sizes.ll` seems to be that `SITargetLowering::isLegalAddressingMode` is overly optimistic when asked whether "register + constant offset" is a valid addressing mode for AS4 on architectures predating the `global_*` instructions, so that should be fixed there instead.
We already have a [generic combine](https://github.com/llvm/llvm-project/blob/629126ed44bd3ce5b6f476459c805be4e4e0c2ca/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L15173-L15196) that pushes the AssertAlign through the (ptr)add.
The handling here was only necessary because that combine used to be applied after the outer PTRADD had already been visited and combined into an OR. However, now that the PTRADD -> OR transform is a generic combine, that ordering no longer seems to occur in the tests, so I dropped this handling as well.
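For context, here is the ordering issue the dropped hack was guarding against, written in the same DAG notation as the code comment in the quoted snippet; the intermediate forms (including the adjusted alignment `k'`) are my reconstruction of the described behavior, not verified compiler output:

```
(ptradd (AssertAlign (ptradd p, c1), k), c2)
  ;; desired order: push the AssertAlign through the inner ptradd first,
  ;; then reassociate the two ptradds so c1 and c2 can be combined:
(ptradd (ptradd (AssertAlign p, k'), c1), c2)
    --> (ptradd (AssertAlign p, k'), c1 + c2)
  ;; problematic order: if the outer ptradd is turned into (or disjoint ...)
  ;; first, the outer node is no longer a ptradd and the reassociation
  ;; above can no longer trigger:
(or disjoint (AssertAlign (ptradd p, c1), k), c2)
```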
https://github.com/llvm/llvm-project/pull/146075