[llvm] [AMDGPU] Folding imm offset in more cases for scratch access (PR #70634)

Thu Nov 16 18:02:34 PST 2023

================
@@ -1146,13 +1146,57 @@ bool AMDGPUDAGToDAGISel::isDSOffset2Legal(SDValue Base, unsigned Offset0,
   return CurDAG->SignBitIsZero(Base);
 }
 
-bool AMDGPUDAGToDAGISel::isFlatScratchBaseLegal(SDValue Base,
-                                                uint64_t FlatVariant) const {
+// Whether we can infer the operands are non-negative if the result is
+// non-negative.
+static bool canInferNonNegativeOperands(SDValue Addr) {
+  return (Addr.getOpcode() == ISD::ADD &&
+          Addr->getFlags().hasNoUnsignedWrap()) ||
+         Addr->getOpcode() == ISD::OR;
+}
+
+// Check that the address value of flat scratch load/store being put into
+// SGPR/VGPR is legal (i.e. unsigned per hardware requirement). When \p
+// CheckBothOperands is set, we will check both operands in the instruction,
+// otherwise only check the first source operand.
+bool AMDGPUDAGToDAGISel::isFlatScratchBaseLegal(SDValue Addr,
+                                                uint64_t FlatVariant,
+                                                bool CheckBothOperands) const {
   if (FlatVariant != SIInstrFlags::FlatScratch)
     return true;
-  // When value in 32-bit Base can be negative calculate scratch offset using
-  // 32-bit add instruction, otherwise use Base(unsigned) + offset.
-  return CurDAG->SignBitIsZero(Base);
+
+  if (canInferNonNegativeOperands(Addr))
+    return true;
+
+  auto LHS = Addr.getOperand(0);
+  auto RHS = Addr.getOperand(1);
+  if (CheckBothOperands)
+    return CurDAG->SignBitIsZero(RHS) && CurDAG->SignBitIsZero(LHS);
+
+  // If the immediate offset is negative, the base address cannot also be
+  // negative. Although this is not strictly true in theory (in the case of
+  // wrap around), but given the range of legal immediate offset, this is true
+  // to get a final valid scratch address.
----------------
arsenm wrote:

I don't see how the range of the offset matters here, this just seems wrong. Is this just working around the fact that we don't have known bits from raw copies from SP? Can we teach computeKnownBits that SP is in range of the maximum frame index?

https://github.com/llvm/llvm-project/pull/70634