[llvm] [SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (PR #150438)

Jeffrey Byrnes via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 7 11:19:01 PDT 2025


================
@@ -780,6 +795,80 @@ Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
   return NewBO;
 }
 
+/// Analyze XOR instruction to extract disjoint constant bits for address
+/// folding
+///
+/// This function identifies bits in an XOR constant operand that are disjoint
+/// from the base operand's known set bits. For these disjoint bits, XOR behaves
+/// identically to addition, allowing us to extract them as constant offsets
+/// that can be folded into addressing modes.
+///
+/// Transformation: `Base ^ Const` becomes `(Base ^ NonDisjointBits) +
+/// DisjointBits` where DisjointBits = Const & KnownZeros(Base)
+///
+/// Example with ptr having known-zero low bit:
+///   Original: `xor %ptr, 3`    ; 3 = 0b11
+///   Analysis: DisjointBits = 3 & KnownZeros(%ptr) = 0b11 & 0b01 = 0b01
+///   Result:   `(xor %ptr, 2) + 1` where 1 can be folded into address mode
+///
+/// \param XorInst The XOR binary operator to analyze
+/// \return APInt containing the disjoint bits that can be extracted as offset,
+///         or zero if no disjoint bits exist
+APInt ConstantOffsetExtractor::extractDisjointBitsFromXor(
+    BinaryOperator *XorInst) {
+  assert(XorInst && XorInst->getOpcode() == Instruction::Xor &&
+         "Expected XOR instruction");
+
+  const unsigned BitWidth = XorInst->getType()->getScalarSizeInBits();
+  Value *BaseOperand;
+  ConstantInt *XorConstant;
+
+  // Match pattern: xor BaseOperand, Constant.
+  if (!match(XorInst, m_Xor(m_Value(BaseOperand), m_ConstantInt(XorConstant))))
+    return APInt::getZero(BitWidth);
+
+  // Compute known bits for the base operand.
+  const SimplifyQuery SQ(DL);
+  const KnownBits BaseKnownBits = computeKnownBits(BaseOperand, SQ);
+  const APInt &ConstantValue = XorConstant->getValue();
+
+  // Identify disjoint bits: constant bits that are known zero in base.
+  const APInt DisjointBits = ConstantValue & BaseKnownBits.Zero;
+
+  // Early exit if no disjoint bits found.
+  if (DisjointBits.isZero())
+    return APInt::getZero(BitWidth);
+
+  // Compute the remaining non-disjoint bits that stay in the XOR.
+  const APInt NonDisjointBits = ConstantValue & ~DisjointBits;
+
+  // Add non-disjoint bits to user chain and return.
+  auto addToUserChainAndReturn = [&]() -> APInt {
+    UserChain.push_back(ConstantInt::get(XorInst->getType(), NonDisjointBits));
+    return DisjointBits;
+  };
+
+  // Handle recursive extraction for binary operators.
+  auto *BO = dyn_cast<BinaryOperator>(BaseOperand);
+  if (!BO)
+    return addToUserChainAndReturn();
+
+  APInt ConstantOffset = find(BO, /*SignExtended=*/false,
+                              /*ZeroExtended=*/false, /*NonNegative=*/false);
+
+  // Add to chain and return if no further constant extraction possible.
+  if (ConstantOffset.isZero())
+    return addToUserChainAndReturn();
+
+  // Check for conflicts between extracted offset and disjoint bits
+  // (A binop B xor C) is not always equivalent with (A xor C binop B)
+  // These cases might already be optimized out by instruction combine
+  if (!(ConstantOffset & DisjointBits).isZero())
+    return APInt::getZero(BitWidth);
+
+  return ConstantOffset;
----------------
jrbyrnes wrote:

I'm seeing some issues with the following test case:

```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %4 = add i32 %2, 34
  %5 = xor i32 %4, 33
  %7 = getelementptr i8, ptr addrspace(3) %1, i32 %5
  %9 = load <8 x half>, ptr addrspace(3) %7, align 16
  store <8 x half> %9, ptr addrspace(3) %1, align 16
  ret void
}
```

The pass looks through the xor, extracts the constant 34 from the add operand, and uses it as the GEP offset:

->

```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %3 = xor i32 %2, 33
  %4 = getelementptr i8, ptr addrspace(3) %1, i32 %3
  %5 = getelementptr i8, ptr addrspace(3) %4, i32 34
  %6 = load <8 x half>, ptr addrspace(3) %5, align 16
  store <8 x half> %6, ptr addrspace(3) %1, align 16
  ret void
}
```

We can't do this extraction because 34 (0b100010) is not disjoint from the xor constant operand 33 (0b100001): both have bit 5 (value 32) set.
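The mismatch can be checked with plain integer arithmetic. The sketch below is not part of the patch; the helper names `originalIndex` and `extractedIndex` are hypothetical and only mirror the two IR snippets above for the two possible values of the select (0 and 512).

```cpp
#include <cstdint>

// Index computed by the original IR: (sel + 34) ^ 33.
constexpr uint32_t originalIndex(uint32_t Sel) { return (Sel + 34u) ^ 33u; }

// Index computed after the bad extraction: (sel ^ 33) + 34.
constexpr uint32_t extractedIndex(uint32_t Sel) { return (Sel ^ 33u) + 34u; }

// 34 = 0b100010 and 33 = 0b100001 share bit 5 (value 32).
static_assert((34u & 33u) == 32u, "constants overlap in bit 5");

// The two forms disagree for both select results.
static_assert(originalIndex(0) == 3u && extractedIndex(0) == 67u,
              "mismatch when the select yields 0");
static_assert(originalIndex(512) == 515u && extractedIndex(512) == 579u,
              "mismatch when the select yields 512");
```

In both cases the transformed index is 64 larger than the original, so the load would read from the wrong address.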



To preserve semantics for this example, we would need to subtract the non-disjoint bits (the shared bit, 32) from both the xor constant operand and the value that is folded into the GEP, giving 33 - 32 = 1 and 34 - 32 = 2.


->

```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %3 = xor i32 %2, 1
  %4 = getelementptr i8, ptr addrspace(3) %1, i32 %3
  %5 = getelementptr i8, ptr addrspace(3) %4, i32 2
  %6 = load <8 x half>, ptr addrspace(3) %5, align 16
  store <8 x half> %6, ptr addrspace(3) %1, align 16
  ret void
}
```
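As a sanity check for this one example only, the adjusted constants can be derived and compared against the original computation. This is a sketch under the assumption that the select yields 0 or 512; the names (`Shared`, `indexBefore`, `indexAfter`) are hypothetical, and nothing here claims the rule generalizes to other base chains.

```cpp
#include <cstdint>

constexpr uint32_t AddConst = 34u;
constexpr uint32_t XorConst = 33u;

// Bits set in both constants (the non-disjoint part): 32.
constexpr uint32_t Shared = AddConst & XorConst;
constexpr uint32_t NewXor = XorConst - Shared;    // 1
constexpr uint32_t NewOffset = AddConst - Shared; // 2

// Original IR: (sel + 34) ^ 33.
constexpr uint32_t indexBefore(uint32_t Sel) { return (Sel + AddConst) ^ XorConst; }

// Adjusted IR: (sel ^ 1) + 2.
constexpr uint32_t indexAfter(uint32_t Sel) { return (Sel ^ NewXor) + NewOffset; }

static_assert(NewXor == 1u && NewOffset == 2u, "constants used in the IR above");
static_assert(indexBefore(0) == indexAfter(0), "select yields 0");
static_assert(indexBefore(512) == indexAfter(512), "select yields 512");
```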

While we can extract some principles for how this should work from this simple example, I am concerned that this may get more complicated with more complex chains in the base operand. I think it's best if we save this for a followup PR.

For this PR, I think we should do a simple extraction: if we find an xor with a constant operand, and some bits of that constant are disjoint from the base (known zero in the base), extract just those disjoint bits from the constant.
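A minimal sketch of that simple extraction on plain 32-bit integers might look like the following. It is an illustration, not the patch: `XorSplit` and `splitXorConstant` are hypothetical names, and the `KnownZero` mask is assumed to come from a known-bits analysis of the base (the role `computeKnownBits` plays in the hunk above).

```cpp
#include <cstdint>

// Hypothetical result type: the part of the constant that stays in the xor
// and the part that can be added as a constant offset.
struct XorSplit {
  uint32_t Remaining; // bits that must stay in the xor
  uint32_t Offset;    // disjoint bits, safe to treat as an addition
};

// Split Const for `Base ^ Const`, given a mask of bits known to be zero in
// Base. For those bits xor cannot borrow or carry, so it behaves like add.
constexpr XorSplit splitXorConstant(uint32_t Const, uint32_t KnownZero) {
  const uint32_t Disjoint = Const & KnownZero;
  return {Const & ~Disjoint, Disjoint};
}

// Rebuild the value from the split: (Base ^ Remaining) + Offset.
constexpr uint32_t applySplit(uint32_t Base, XorSplit S) {
  return (Base ^ S.Remaining) + S.Offset;
}

// Example from the doc comment in the hunk: low bit known zero, constant 3.
static_assert(splitXorConstant(3u, 1u).Offset == 1u, "offset is the low bit");
static_assert(splitXorConstant(3u, 1u).Remaining == 2u, "bit 1 stays in xor");
static_assert(applySplit(6u, splitXorConstant(3u, 1u)) == (6u ^ 3u),
              "split form matches the original xor for an even base");
```

Because both `Base` and `Remaining` are zero in every bit of `Offset`, the final addition never carries, which is why no look-through into the base operand is needed for this restricted form.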

https://github.com/llvm/llvm-project/pull/150438