[llvm] [SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (PR #150438)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 1 08:06:50 PDT 2025
================
@@ -780,6 +795,69 @@ Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
   return NewBO;
 }
+/// Analyze XOR instruction to extract disjoint constant bits for address
+/// folding.
+///
+/// This function identifies bits in an XOR constant operand that are disjoint
+/// from the base operand's known set bits. For these disjoint bits, XOR
+/// behaves identically to addition, allowing us to extract them as constant
+/// offsets that can be folded into addressing modes.
+///
+/// Transformation: `Base ^ Const` becomes `(Base ^ NonDisjointBits) +
+/// DisjointBits`, where DisjointBits = Const & KnownZeros(Base).
+///
+/// Example with ptr having a known-zero low bit:
+///   Original: `xor %ptr, 3`  ; 3 = 0b11
+///   Analysis: DisjointBits = 3 & KnownZeros(%ptr) = 0b11 & 0b01 = 0b01
+///   Result:   `(xor %ptr, 2) + 1`, where 1 can be folded into the address
+///             mode
+///
+/// \param XorInst The XOR binary operator to analyze
+/// \return APInt containing the disjoint bits that can be extracted as an
+///         offset, or zero if no disjoint bits exist
+APInt ConstantOffsetExtractor::extractDisjointBitsFromXor(
+    BinaryOperator *XorInst) {
+  assert(XorInst && XorInst->getOpcode() == Instruction::Xor &&
+         "Expected XOR instruction");
+
+  const unsigned BitWidth = XorInst->getType()->getScalarSizeInBits();
+  Value *BaseOperand;
+  ConstantInt *XorConstant;
+
+  // Match pattern: xor BaseOperand, Constant.
+  if (!match(XorInst, m_Xor(m_Value(BaseOperand), m_ConstantInt(XorConstant))))
+    return APInt::getZero(BitWidth);
+
+  // Try to extract a constant offset from the base operand recursively.
+  if (BinaryOperator *BO = dyn_cast<BinaryOperator>(BaseOperand)) {
+    APInt ConstantOffset = find(BO, /*SignExtended=*/false,
+                                /*ZeroExtended=*/false, /*NonNegative=*/false);
+    if (!ConstantOffset.isZero())
+      return ConstantOffset;
----------------
jrbyrnes wrote:
It's not a further optimization; it's a requirement / correctness bug.
Please see and add this test:
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %4 = add i32 %2, 32
  %5 = xor i32 %4, 32
  %7 = getelementptr i8, ptr addrspace(3) %1, i32 %5
  %9 = load <8 x half>, ptr addrspace(3) %7, align 16
  store <8 x half> %9, ptr addrspace(3) %1, align 16
  ret void
}
```
Whatever the value of %2 (0 or 512), %4 will have bit 5 set, from adding 32. The effect of xor 32 on that value is then to unset the bit, so %5 is identical to %2. Assume %2 is 0: then %4 is 32 and %5 is 0, and the gep computes the address %1 + 0.
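A minimal standalone sketch (plain C++ with ordinary integers, not part of the patch) that checks this bit arithmetic for both possible values of %2:
```
// Both possible values of %2 (0 and 512) have bit 5 clear, so the add sets
// bit 5 and the xor clears it again: %5 always equals %2.
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t Sel : {0u, 512u}) { // the two possible values of %2
    uint32_t Added = Sel + 32;      // %4 = add i32 %2, 32  (sets bit 5)
    uint32_t Xored = Added ^ 32;    // %5 = xor i32 %4, 32  (clears bit 5)
    assert(Xored == Sel);           // %5 == %2, so the address is %1 + %2
  }
  return 0;
}
```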
Currently, this PR will produce:
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %3 = xor i32 %2, 32
  %4 = getelementptr i8, ptr addrspace(3) %1, i32 %3
  %5 = getelementptr i8, ptr addrspace(3) %4, i32 32
  %6 = load <8 x half>, ptr addrspace(3) %5, align 16
  store <8 x half> %6, ptr addrspace(3) %1, align 16
  ret void
}
```
Assuming %2 is 0, %3 will be 32. Then %4 produces %1 + 32 and %5 produces %4 + 32, so the final address used for the load is %1 + 64 instead of %1 + 0. This is because, when extracting a constant through an xor, we can only extract the disjoint bits.
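A minimal standalone sketch (plain C++, not the pass itself) of that correctness condition: rewriting a ^ c as (a ^ (c & ~d)) + d is only sound when the extracted bits d are known zero in a. Here a is a concrete value standing in for %4, so ~a plays the role of the known-zero mask the pass would have to consult:
```
#include <cstdint>
#include <cstdio>

// Rewrite `A ^ C` as `(A ^ (C & ~D)) + D`, extracting D as an additive
// offset. Sound only if the bits of D are zero in A.
static uint32_t extractedForm(uint32_t A, uint32_t C, uint32_t D) {
  return (A ^ (C & ~D)) + D;
}

int main() {
  uint32_t A = 32, C = 32;                      // %4 = 32, xor constant = 32
  std::printf("%u\n", A ^ C);                   // correct offset: 0
  std::printf("%u\n", extractedForm(A, C, 32)); // unsound: 64 (bit 5 set in A)
  uint32_t D = C & ~A;                          // only the disjoint bits: 0
  std::printf("%u\n", extractedForm(A, C, D));  // sound: 0
  return 0;
}
```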
https://github.com/llvm/llvm-project/pull/150438