[llvm] [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (PR #162077)

Fri Nov 28 02:14:25 PST 2025

================
@@ -733,9 +728,24 @@ class WaitcntBrackets {
     unsigned MyShift;
     unsigned OtherShift;
   };
+
+  void determineWaitForScore(InstCounterType T, unsigned Score,
+                             AMDGPU::Waitcnt &Wait) const;
+
   static bool mergeScore(const MergeInfo &M, unsigned &Score,
                          unsigned OtherScore);
 
+  iterator_range<MCRegUnitIterator> regunits(MCPhysReg Reg) const {
+    assert(Reg != AMDGPU::SCC && "Shouldn't be used on SCC");
+    const TargetRegisterClass *RC = Context->TRI->getPhysRegBaseClass(Reg);
+    unsigned Size = Context->TRI->getRegSizeInBits(*RC);
+    if (!Context->TRI->isInAllocatableClass(Reg))
+      return {{}, {}};
----------------
Pierre-vh wrote:

I ported this check from the old implementation. It seems necessary: if I remove the condition, around 1k tests crash.

I think we cannot use `getRegSizeInBits` on a non-allocatable register; that is where the debugger pointed.
The call came before the condition, though, which is surprising. I suspect the optimizer sank it below the check, which avoided the crash whenever the early return was taken. I moved the check higher just in case.
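
To illustrate the ordering I have in mind, here is a minimal standalone sketch. It does not use the real LLVM classes: `MockRegInfo`, `isAllocatable`, `getSizeInBits` and `trackedSizeInBits` are hypothetical stand-ins, and I am assuming the size query is only valid for allocatable registers.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Hypothetical stand-in for the register info; NOT the real TargetRegisterInfo.
struct MockRegInfo {
  // Assumption: only allocatable registers have a known size.
  std::unordered_map<unsigned, unsigned> SizeInBits;

  bool isAllocatable(unsigned Reg) const { return SizeInBits.count(Reg) != 0; }

  // Precondition: Reg is allocatable. Asserts otherwise.
  unsigned getSizeInBits(unsigned Reg) const {
    auto It = SizeInBits.find(Reg);
    assert(It != SizeInBits.end() && "size query on non-allocatable register");
    return It->second;
  }
};

// Guard first, query second: the size is never requested for a register
// that fails the allocatable check.
std::optional<unsigned> trackedSizeInBits(const MockRegInfo &RI, unsigned Reg) {
  if (!RI.isAllocatable(Reg))
    return std::nullopt; // plays the role of the empty range in the patch
  return RI.getSizeInBits(Reg);
}
```

With the guard first, the behavior no longer depends on whether the compiler happens to sink the size query below the early return.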

https://github.com/llvm/llvm-project/pull/162077