[llvm] [AMDGPU] Fix a potential integer overflow in GCNRegPressure when true16 is enabled (PR #144968)
Shilei Tian via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 19 19:16:29 PDT 2025
================
@@ -66,7 +66,23 @@ void GCNRegPressure::inc(unsigned Reg,
Value[TupleIdx] += Sign * TRI->getRegClassWeight(RC).RegWeight;
}
// Pressure scales with number of new registers covered by the new mask.
- Sign *= SIRegisterInfo::getNumCoveredRegs(~PrevMask & NewMask);
+ // Note that, when true16 is enabled, we can no longer use the following
+ // code to calculate the difference of number of 32-bit registers between
+ // the two masks:
+ //
+ // Sign *= SIRegisterInfo::getNumCoveredRegs(~PrevMask & NewMask);
+ //
+ // The reason is, the new mask `~PrevMask & NewMask` doesn't treat a 16-bit
+ // register use as a whole 32-bit register use.
+ //
+ // Let's take a look at an example. Assume PrevMask = 0b0010, and NewMask =
+ // 0b1111. The difference in this case should be 1, because even though
+ // PrevMask only uses half of a 32-bit register, we still need to count it
+ // as a whole. However, `~PrevMask & NewMask` gives us 0b1101, and then
+ // `getNumCoveredRegs` will return 2 in this case, which can cause integer
+ // overflow if Sign = -1.
+ Sign *= SIRegisterInfo::getNumCoveredRegs(NewMask) -
+ SIRegisterInfo::getNumCoveredRegs(PrevMask);
----------------
shiltian wrote:
Basically, to keep a single `getNumCoveredRegs` call (and therefore a single popcount), we would first need to widen every pair of bits in each mask to `0b11` if it is `0b01` or `0b10`. After that, we can continue to use `~PrevMask & NewMask`. However, I don't know how to do that preprocessing via bit manipulation. Any suggestions?
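One possible way to do the pair widening, sketched as standalone C++ on a plain `uint64_t` rather than on `LaneBitmask`. This is only a sketch under assumptions: it assumes the lo16/hi16 lanes of each 32-bit register occupy adjacent even/odd bits, and the names `pairAny`, `widenPairs`, `numCoveredRegs` and `numNewRegs` are hypothetical helpers for illustration, not existing LLVM functions. `numCoveredRegs` here is a stand-in that counts bit pairs with at least one bit set, which is the behavior the patch comment relies on.

```cpp
#include <bitset>
#include <cstdint>

// Low bit of every 2-bit pair (one pair per 32-bit register, by assumption).
static const uint64_t PairLowBits = 0x5555555555555555ULL;

// Hypothetical helper: sets the low bit of each pair that has any bit set.
static inline uint64_t pairAny(uint64_t Mask) {
  return (Mask | (Mask >> 1)) & PairLowBits;
}

// Hypothetical helper: turns 0b01 and 0b10 pairs into 0b11; 0b00 stays 0b00.
static inline uint64_t widenPairs(uint64_t Mask) {
  uint64_t Any = pairAny(Mask);
  return Any | (Any << 1);
}

// Stand-in for getNumCoveredRegs: counts pairs with at least one bit set.
static inline unsigned numCoveredRegs(uint64_t Mask) {
  return static_cast<unsigned>(std::bitset<64>(pairAny(Mask)).count());
}

// Number of 32-bit registers newly covered by NewMask, with a single count.
static inline unsigned numNewRegs(uint64_t PrevMask, uint64_t NewMask) {
  return numCoveredRegs(~widenPairs(PrevMask) & widenPairs(NewMask));
}
```

With the example from the patch comment (`PrevMask = 0b0010`, `NewMask = 0b1111`), `widenPairs(PrevMask)` is `0b0011`, the combined mask is `0b1100`, and `numNewRegs` returns 1. Since the stand-in count already treats a partially set pair as one register, widening `PrevMask` alone would be enough in this sketch; both masks are widened only to mirror the description above.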
https://github.com/llvm/llvm-project/pull/144968