[llvm] [SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (PR #150438)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 7 11:19:01 PDT 2025
================
@@ -780,6 +795,80 @@ Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
return NewBO;
}
+/// Analyze XOR instruction to extract disjoint constant bits for address
+/// folding
+///
+/// This function identifies bits in an XOR constant operand that are disjoint
+/// from the base operand's known set bits. For these disjoint bits, XOR behaves
+/// identically to addition, allowing us to extract them as constant offsets
+/// that can be folded into addressing modes.
+///
+/// Transformation: `Base ^ Const` becomes `(Base ^ NonDisjointBits) +
+/// DisjointBits` where DisjointBits = Const & KnownZeros(Base)
+///
+/// Example with ptr having known-zero low bit:
+/// Original: `xor %ptr, 3` ; 3 = 0b11
+/// Analysis: DisjointBits = 3 & KnownZeros(%ptr) = 0b11 & 0b01 = 0b01
+/// Result: `(xor %ptr, 2) + 1` where 1 can be folded into address mode
+///
+/// \param XorInst The XOR binary operator to analyze
+/// \return APInt containing the disjoint bits that can be extracted as offset,
+/// or zero if no disjoint bits exist
+APInt ConstantOffsetExtractor::extractDisjointBitsFromXor(
+ BinaryOperator *XorInst) {
+ assert(XorInst && XorInst->getOpcode() == Instruction::Xor &&
+ "Expected XOR instruction");
+
+ const unsigned BitWidth = XorInst->getType()->getScalarSizeInBits();
+ Value *BaseOperand;
+ ConstantInt *XorConstant;
+
+ // Match pattern: xor BaseOperand, Constant.
+ if (!match(XorInst, m_Xor(m_Value(BaseOperand), m_ConstantInt(XorConstant))))
+ return APInt::getZero(BitWidth);
+
+ // Compute known bits for the base operand.
+ const SimplifyQuery SQ(DL);
+ const KnownBits BaseKnownBits = computeKnownBits(BaseOperand, SQ);
+ const APInt &ConstantValue = XorConstant->getValue();
+
+ // Identify disjoint bits: constant bits that are known zero in base.
+ const APInt DisjointBits = ConstantValue & BaseKnownBits.Zero;
+
+ // Early exit if no disjoint bits found.
+ if (DisjointBits.isZero())
+ return APInt::getZero(BitWidth);
+
+ // Compute the remaining non-disjoint bits that stay in the XOR.
+ const APInt NonDisjointBits = ConstantValue & ~DisjointBits;
+
+ // Add non-disjoint bits to user chain and return.
+ auto addToUserChainAndReturn = [&]() -> APInt {
+ UserChain.push_back(ConstantInt::get(XorInst->getType(), NonDisjointBits));
+ return DisjointBits;
+ };
+
+ // Handle recursive extraction for binary operators.
+ auto *BO = dyn_cast<BinaryOperator>(BaseOperand);
+ if (!BO)
+ return addToUserChainAndReturn();
+
+ APInt ConstantOffset = find(BO, /*SignExtended=*/false,
+ /*ZeroExtended=*/false, /*NonNegative=*/false);
+
+ // Add to chain and return if no further constant extraction possible.
+ if (ConstantOffset.isZero())
+ return addToUserChainAndReturn();
+
+ // Check for conflicts between extracted offset and disjoint bits
+ // (A binop B xor C) is not always equivalent with (A xor C binop B)
+ // These cases might already be optimized out by instruction combine
+ if (!(ConstantOffset & DisjointBits).isZero())
+ return APInt::getZero(BitWidth);
+
+ return ConstantOffset;
----------------
jrbyrnes wrote:
I'm seeing some issues with the following test case:
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
%2 = select i1 %0, i32 0, i32 512
%4 = add i32 %2, 34
%5 = xor i32 %4, 33
%7 = getelementptr i8, ptr addrspace(3) %1, i32 %5
%9 = load <8 x half>, ptr addrspace(3) %7, align 16
store <8 x half> %9, ptr addrspace(3) %1, align 16
ret void
}
```
This will extract the constant 34 operand from the add through the xor operand and use this as the GEP offset.
->
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
%2 = select i1 %0, i32 0, i32 512
%3 = xor i32 %2, 33
%4 = getelementptr i8, ptr addrspace(3) %1, i32 %3
%5 = getelementptr i8, ptr addrspace(3) %4, i32 34
%6 = load <8 x half>, ptr addrspace(3) %5, align 16
store <8 x half> %6, ptr addrspace(3) %1, align 16
ret void
}
```
We can't do this extraction because 34 is not disjoint with the xor constant operand 33.
To preserve semantics for this example, we would need to remove the shared (non-disjoint) bit, 34 & 33 = 32, from both the xor constant operand (33 becomes 1) and the value that is folded into the GEP (34 becomes 2).
->
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
%2 = select i1 %0, i32 0, i32 512
%3 = xor i32 %2, 1
%4 = getelementptr i8, ptr addrspace(3) %1, i32 %3
%5 = getelementptr i8, ptr addrspace(3) %4, i32 2
%6 = load <8 x half>, ptr addrspace(3) %5, align 16
store <8 x half> %6, ptr addrspace(3) %1, align 16
ret void
}
```
While we can extract some principles for how this should work from this simple example, I am concerned that this may get more complicated with more complex chains in the base operand. I think it's best if we save this for a follow-up PR.
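As a sanity check on the three versions of `@test6` above, the index arithmetic can be evaluated for the only two values the select can produce (0 and 512). This is a minimal standalone sketch in plain C++, not code from the PR; the function names `originalIndex`, `naiveIndex` and `adjustedIndex` are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Index computed by the original IR: (%2 + 34) ^ 33.
constexpr uint32_t originalIndex(uint32_t Sel) { return (Sel + 34u) ^ 33u; }

// Index computed after the problematic extraction: (%2 ^ 33) + 34.
constexpr uint32_t naiveIndex(uint32_t Sel) { return (Sel ^ 33u) + 34u; }

// Index computed by the adjusted form: the shared bit (34 & 33 == 32) is
// dropped from both constants, giving (%2 ^ 1) + 2.
constexpr uint32_t adjustedIndex(uint32_t Sel) { return (Sel ^ 1u) + 2u; }

// %2 is either 0 or 512, so these two inputs cover every case.
static_assert(originalIndex(0u) == 3u && originalIndex(512u) == 515u,
              "reference values");
static_assert(naiveIndex(0u) == 67u && naiveIndex(512u) == 579u,
              "extracting the full 34 changes the address");
static_assert(adjustedIndex(0u) == 3u && adjustedIndex(512u) == 515u,
              "adjusted constants match the original");
```

The check only covers this one test case; it says nothing about longer chains in the base operand, which is the concern raised above.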
For this PR, I think we should do a simple extract: if we find an xor with a constant operand and there are disjoint bits between the base and the constant, extract the disjoint bits from the constant.
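The simple extract can be illustrated on plain 32-bit integers. This is a rough sketch under stated assumptions, not the PR's implementation: `XorSplit`, `splitXorConstant` and `rewritten` are hypothetical names, and `KnownZero` stands in for the known-zero mask that `computeKnownBits` would report for the base operand.

```cpp
#include <cstdint>

// Hypothetical helper type: how the xor constant is divided.
struct XorSplit {
  uint32_t NonDisjointBits; // remains as the xor constant
  uint32_t DisjointBits;    // becomes the additive (GEP) offset
};

// Bits of Const that are known zero in the base are disjoint from it, so
// xor-ing them in is the same as adding them.
constexpr XorSplit splitXorConstant(uint32_t Const, uint32_t KnownZero) {
  return XorSplit{Const & ~KnownZero, Const & KnownZero};
}

// Rewritten form: (Base ^ NonDisjointBits) + DisjointBits.
constexpr uint32_t rewritten(uint32_t Base, XorSplit S) {
  return (Base ^ S.NonDisjointBits) + S.DisjointBits;
}

// Doc-comment example: low bit of the base known zero, constant 3.
static_assert(splitXorConstant(3u, 1u).NonDisjointBits == 2u, "xor keeps 2");
static_assert(splitXorConstant(3u, 1u).DisjointBits == 1u, "offset is 1");
static_assert(rewritten(4u, splitXorConstant(3u, 1u)) == (4u ^ 3u), "base 4");
static_assert(rewritten(6u, splitXorConstant(3u, 1u)) == (6u ^ 3u), "base 6");
```

Applied to `@test6` without looking through the add, and assuming only bit 0 of `%4` is reported as known zero, this would split 33 into an xor constant of 32 and an offset of 1, which agrees with the original index for both possible values of `%4` (34 and 546).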
https://github.com/llvm/llvm-project/pull/150438