[llvm] [AMDGPU] Fix a potential integer overflow in GCNRegPressure when true16 is enabled (PR #144968)

Fri Jun 20 05:09:54 PDT 2025

================
@@ -66,7 +66,23 @@ void GCNRegPressure::inc(unsigned Reg,
       Value[TupleIdx] += Sign * TRI->getRegClassWeight(RC).RegWeight;
     }
     // Pressure scales with number of new registers covered by the new mask.
-    Sign *= SIRegisterInfo::getNumCoveredRegs(~PrevMask & NewMask);
+    // Note that, when true16 is enabled, we can no longer use the following
+    // code to calculate the difference of number of 32-bit registers between
+    // the two masks:
+    //
+    // Sign *= SIRegisterInfo::getNumCoveredRegs(~PrevMask & NewMask);
+    //
+    // The reason is, the new mask `~PrevMask & NewMask` doesn't treat a 16-bit
+    // register use as a whole 32-bit register use.
+    //
+    // Let's take a look at an example. Assume PrevMask = 0b0010, and NewMask =
+    // 0b1111. The difference in this case should be 1, because even though
+    // PrevMask only uses half of a 32-bit register, we still need to count it
+    // as a whole. However, `~PrevMask & NewMask` gives us 0b1101, and then
+    // `getNumCoveredRegs` will return 2 in this case, which can cause integer
+    // overflow if Sign = -1.
+    Sign *= SIRegisterInfo::getNumCoveredRegs(NewMask) -
+            SIRegisterInfo::getNumCoveredRegs(PrevMask);
----------------
lucas-rami wrote:

Taking inspiration from `getNumCoveredRegs`, I think

```cpp
PrevMask | ((PrevMask & 0xAAAAAAAAAAAAAAAAULL) >> 1) | ((PrevMask & 0x5555555555555555ULL) << 1)
```

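should work to widen every half-covered pair (`0b01`/`0b10`) in `PrevMask` to `0b11`, so that a partially used 32-bit register is treated as fully used before the mask is inverted.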
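
To check the suggestion against the example in the patch comment, here is a minimal standalone sketch. It is not the in-tree code: it works on plain `uint64_t` values rather than `LaneBitmask`, `numCoveredRegs` is a stand-in written from the description of `getNumCoveredRegs` (lo16/hi16 halves of a 32-bit register assumed to sit in adjacent bits), and `widenToFullRegs`, `numNewRegs`, and `checkExampleFromPatch` are hypothetical names used only for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Stand-in for SIRegisterInfo::getNumCoveredRegs on a raw 64-bit mask.
// Assumption: the two 16-bit halves of each 32-bit register occupy an
// adjacent bit pair, so any set bit in a pair counts as one register.
unsigned numCoveredRegs(uint64_t Mask) {
  // Fold the upper bit of each pair onto the lower bit, then count the
  // lower bits only.
  uint64_t Folded = Mask | ((Mask & 0xAAAAAAAAAAAAAAAAULL) >> 1);
  return static_cast<unsigned>(
      std::bitset<64>(Folded & 0x5555555555555555ULL).count());
}

// Hypothetical helper wrapping the expression suggested above: if either
// bit of a pair is set, set both.
uint64_t widenToFullRegs(uint64_t Mask) {
  return Mask | ((Mask & 0xAAAAAAAAAAAAAAAAULL) >> 1) |
         ((Mask & 0x5555555555555555ULL) << 1);
}

// Hypothetical helper: 32-bit registers covered by NewMask that were not
// already (even partially) covered by PrevMask.
unsigned numNewRegs(uint64_t PrevMask, uint64_t NewMask) {
  return numCoveredRegs(~widenToFullRegs(PrevMask) & NewMask);
}

// Replays the example from the patch comment: PrevMask = 0b0010,
// NewMask = 0b1111.
void checkExampleFromPatch() {
  const uint64_t PrevMask = 0b0010, NewMask = 0b1111;
  // Original expression: 0b1101 covers two registers, which overcounts.
  assert(numCoveredRegs(~PrevMask & NewMask) == 2);
  // Widening PrevMask first yields 0b1100, which covers one register.
  assert(widenToFullRegs(PrevMask) == 0b0011);
  assert(numNewRegs(PrevMask, NewMask) == 1);
  // Same answer as the subtraction used in the patch.
  assert(numCoveredRegs(NewMask) - numCoveredRegs(PrevMask) == 1);
}
```

For this example the widened-mask form and the subtraction in the patch agree. The widened form stays unsigned and cannot go negative, which may be relevant to the overflow concern when `Sign = -1`.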

https://github.com/llvm/llvm-project/pull/144968