[llvm] [AMDGPU] Simplify, fix and improve known bits for mbcnt (PR #104768)

Mon Aug 19 06:25:53 PDT 2024

================
@@ -15758,16 +15758,12 @@ void SITargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
     case Intrinsic::amdgcn_mbcnt_hi: {
       const GCNSubtarget &ST =
           DAG.getMachineFunction().getSubtarget<GCNSubtarget>();
-      // These return at most the (wavefront size - 1) + src1
-      // As long as src1 is an immediate we can calc known bits
-      KnownBits Src1Known = DAG.computeKnownBits(Op.getOperand(2), Depth + 1);
-      unsigned Src1ValBits = Src1Known.countMaxActiveBits();
-      unsigned MaxActiveBits = std::max(Src1ValBits, ST.getWavefrontSizeLog2());
-      // Cater for potential carry
-      MaxActiveBits += Src1ValBits ? 1 : 0;
-      unsigned Size = Op.getValueType().getSizeInBits();
-      if (MaxActiveBits < Size)
-        Known.Zero.setHighBits(Size - MaxActiveBits);
+      // Wave64 mbcnt_lo returns at most 32 + src1. Otherwise these return at
+      // most 31 + src1.
----------------
jayfoad wrote:

> We fold mbcnt_hi out in wave32 in instcombine, so not sure how much use it is to handle here. But might as well for completeness.

I'm handling:
- mbcnt_lo in wave32 (max value is 31)
- mbcnt_lo in wave64 (max value is 32)
- mbcnt_hi (max value is 31 regardless of wave size)

None of this is redundant.

> Does this actually depend on the contents of exec_hi, or is it just ignored for wave32?

The result of mbcnt_lo does not depend on exec_hi (in either wave size). The thing is, it can only return a value of 32 in the upper 32 lanes, which don't exist in wave32 mode. The lower 32 lanes get a max value of 31.
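
To illustrate the bounds above, here is a minimal host-side sketch of the per-lane behaviour. It is a model of the semantics as described in this comment, not the backend code; the helper names (`popcount32`, `mbcntLo`, `mbcntHi`, `maxMbcntLo`, `maxMbcntHi`) are made up for illustration, and an all-ones mask with src1 = 0 is assumed as the worst case.

```cpp
#include <algorithm>
#include <cstdint>

// Portable population count (hypothetical helper, avoids requiring C++20 <bit>).
unsigned popcount32(uint32_t V) {
  unsigned N = 0;
  for (; V; V &= V - 1)
    ++N;
  return N;
}

// Model of mbcnt_lo for one lane: count the bits of Mask that belong to
// lower-numbered lanes within lanes 0..31, then add Src1.
// Lanes 32..63 see all 32 low lanes as "below" them.
uint32_t mbcntLo(unsigned Lane, uint32_t Mask, uint32_t Src1) {
  uint32_t Below = Lane >= 32 ? 0xFFFFFFFFu : ((1u << Lane) - 1u);
  return popcount32(Mask & Below) + Src1;
}

// Model of mbcnt_hi for one lane: same idea for lanes 32..63.
// Lanes 0..32 have no lower-numbered lanes in the upper half.
uint32_t mbcntHi(unsigned Lane, uint32_t Mask, uint32_t Src1) {
  uint32_t Below = Lane <= 32 ? 0u : ((1u << (Lane - 32)) - 1u);
  return popcount32(Mask & Below) + Src1;
}

// Largest mbcnt_lo result over all lanes that exist for the given wave size,
// assuming an all-ones mask and Src1 == 0.
uint32_t maxMbcntLo(unsigned WaveSize) {
  uint32_t Max = 0;
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    Max = std::max(Max, mbcntLo(Lane, 0xFFFFFFFFu, 0));
  return Max;
}

// Largest mbcnt_hi result over all lanes that exist for the given wave size,
// under the same assumptions.
uint32_t maxMbcntHi(unsigned WaveSize) {
  uint32_t Max = 0;
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    Max = std::max(Max, mbcntHi(Lane, 0xFFFFFFFFu, 0));
  return Max;
}
```

In this model `maxMbcntLo(32)` is 31 and `maxMbcntLo(64)` is 32, because only lanes 32 and above can see all 32 low bits as belonging to lower lanes. `maxMbcntHi(64)` is 31, reached at lane 63.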

https://github.com/llvm/llvm-project/pull/104768