[llvm] [SeparateConstOffsetFromGEP] Decompose constant xor operand if possible (PR #150438)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 1 08:06:50 PDT 2025
================
@@ -780,6 +795,69 @@ Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
   return NewBO;
 }
+/// Analyze XOR instruction to extract disjoint constant bits for address
+/// folding.
+///
+/// This function identifies bits in an XOR constant operand that are disjoint
+/// from the base operand's known set bits. For these disjoint bits, XOR
+/// behaves identically to addition, allowing us to extract them as constant
+/// offsets that can be folded into addressing modes.
+///
+/// Transformation: `Base ^ Const` becomes `(Base ^ NonDisjointBits) +
+/// DisjointBits`, where DisjointBits = Const & KnownZeros(Base).
+///
+/// Example with ptr having a known-zero low bit:
+///   Original: `xor %ptr, 3`  ; 3 = 0b11
+///   Analysis: DisjointBits = 3 & KnownZeros(%ptr) = 0b11 & 0b01 = 0b01
+///   Result:   `(xor %ptr, 2) + 1`, where 1 can be folded into the address
+///             mode
+///
+/// \param XorInst The XOR binary operator to analyze
+/// \return APInt containing the disjoint bits that can be extracted as an
+///         offset, or zero if no disjoint bits exist
+APInt ConstantOffsetExtractor::extractDisjointBitsFromXor(
+    BinaryOperator *XorInst) {
+  assert(XorInst && XorInst->getOpcode() == Instruction::Xor &&
+         "Expected XOR instruction");
+
+  const unsigned BitWidth = XorInst->getType()->getScalarSizeInBits();
+  Value *BaseOperand;
+  ConstantInt *XorConstant;
+
+  // Match pattern: xor BaseOperand, Constant.
+  if (!match(XorInst, m_Xor(m_Value(BaseOperand), m_ConstantInt(XorConstant))))
+    return APInt::getZero(BitWidth);
+
+  // Try to extract a constant offset from the base operand recursively.
+  if (BinaryOperator *BO = dyn_cast<BinaryOperator>(BaseOperand)) {
+    APInt ConstantOffset = find(BO, /*SignExtended=*/false,
+                                /*ZeroExtended=*/false, /*NonNegative=*/false);
+    if (!ConstantOffset.isZero())
+      return ConstantOffset;
----------------
jrbyrnes wrote:
It's not a further optimization; it's a requirement / correctness bug.
Please see and add this test:
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %4 = add i32 %2, 32
  %5 = xor i32 %4, 32
  %7 = getelementptr i8, ptr addrspace(3) %1, i32 %5
  %9 = load <8 x half>, ptr addrspace(3) %7, align 16
  store <8 x half> %9, ptr addrspace(3) %1, align 16
  ret void
}
```
Whatever the value of %2 (0 or 512), %4 will have bit 5 set, from adding 32. The effect of xor 32 on that value is then to unset the bit, so %5 is identical to %2. Assume %2 is 0: then %4 is 32 and %5 is 0, and the gep computes the address %1 + 0.
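A minimal standalone sketch (plain C++ with ordinary integers, not part of the patch) that checks this bit arithmetic for both possible values of %2:
```
// Both possible values of %2 (0 and 512) have bit 5 clear, so the add sets
// bit 5 and the xor clears it again: %5 always equals %2.
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t Sel : {0u, 512u}) { // the two possible values of %2
    uint32_t Added = Sel + 32;      // %4 = add i32 %2, 32  (sets bit 5)
    uint32_t Xored = Added ^ 32;    // %5 = xor i32 %4, 32  (clears bit 5)
    assert(Xored == Sel);           // %5 == %2, so the address is %1 + %2
  }
  return 0;
}
```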
Currently, this PR will produce:
```
define amdgpu_kernel void @test6(i1 %0, ptr addrspace(3) %1) {
entry:
  %2 = select i1 %0, i32 0, i32 512
  %3 = xor i32 %2, 32
  %4 = getelementptr i8, ptr addrspace(3) %1, i32 %3
  %5 = getelementptr i8, ptr addrspace(3) %4, i32 32
  %6 = load <8 x half>, ptr addrspace(3) %5, align 16
  store <8 x half> %6, ptr addrspace(3) %1, align 16
  ret void
}
```
Assuming %2 is 0, %3 will be 32. Then %4 produces %1 + 32 and %5 produces %4 + 32, so the final address used for the load is %1 + 64 instead of %1 + 0. This is because, when extracting a constant through an xor, we can only extract the disjoint bits.
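A minimal standalone sketch (plain C++, not the pass itself) of that correctness condition: rewriting a ^ c as (a ^ (c & ~d)) + d is only sound when the extracted bits d are known zero in a. Here a is a concrete value standing in for %4, so ~a plays the role of the known-zero mask the pass would have to consult:
```
#include <cstdint>
#include <cstdio>

// Rewrite `A ^ C` as `(A ^ (C & ~D)) + D`, extracting D as an additive
// offset. Sound only if the bits of D are zero in A.
static uint32_t extractedForm(uint32_t A, uint32_t C, uint32_t D) {
  return (A ^ (C & ~D)) + D;
}

int main() {
  uint32_t A = 32, C = 32;                      // %4 = 32, xor constant = 32
  std::printf("%u\n", A ^ C);                   // correct offset: 0
  std::printf("%u\n", extractedForm(A, C, 32)); // unsound: 64 (bit 5 set in A)
  uint32_t D = C & ~A;                          // only the disjoint bits: 0
  std::printf("%u\n", extractedForm(A, C, D));  // sound: 0
  return 0;
}
```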
https://github.com/llvm/llvm-project/pull/150438