[llvm] [AMDGPU] Simplify, fix and improve known bits for mbcnt (PR #104768)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 19 06:25:53 PDT 2024
================
@@ -15758,16 +15758,12 @@ void SITargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
case Intrinsic::amdgcn_mbcnt_hi: {
const GCNSubtarget &ST =
DAG.getMachineFunction().getSubtarget<GCNSubtarget>();
- // These return at most the (wavefront size - 1) + src1
- // As long as src1 is an immediate we can calc known bits
- KnownBits Src1Known = DAG.computeKnownBits(Op.getOperand(2), Depth + 1);
- unsigned Src1ValBits = Src1Known.countMaxActiveBits();
- unsigned MaxActiveBits = std::max(Src1ValBits, ST.getWavefrontSizeLog2());
- // Cater for potential carry
- MaxActiveBits += Src1ValBits ? 1 : 0;
- unsigned Size = Op.getValueType().getSizeInBits();
- if (MaxActiveBits < Size)
- Known.Zero.setHighBits(Size - MaxActiveBits);
+ // Wave64 mbcnt_lo returns at most 32 + src1. Otherwise these return at
+ // most 31 + src1.
----------------
jayfoad wrote:
> We fold mbcnt_hi out in wave32 in instcombine, so not sure how much use it is to handle here. But might as well for completeness.
I'm handling:
- mbcnt_lo in wave32 (max value is 31)
- mbcnt_lo in wave64 (max value is 32)
- mbcnt_hi (max value is 31 regardless of wave size)
None of this is redundant.
> Does this actually depend on the contents of exec_hi, or is it just ignored for wave32?
The result of mbcnt_lo does not depend on exec_hi (in either wave size). The thing is, it can only return a value of 32 in the upper 32 lanes, which don't exist in wave32 mode. The lower 32 lanes get a max value of 31.
https://github.com/llvm/llvm-project/pull/104768
More information about the llvm-commits
mailing list