[llvm] [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (PR #162077)
Pierre van Houtryve via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 28 02:14:25 PST 2025
================
@@ -733,9 +728,24 @@ class WaitcntBrackets {
unsigned MyShift;
unsigned OtherShift;
};
+
+ void determineWaitForScore(InstCounterType T, unsigned Score,
+ AMDGPU::Waitcnt &Wait) const;
+
static bool mergeScore(const MergeInfo &M, unsigned &Score,
unsigned OtherScore);
+ iterator_range<MCRegUnitIterator> regunits(MCPhysReg Reg) const {
+ assert(Reg != AMDGPU::SCC && "Shouldn't be used on SCC");
+ const TargetRegisterClass *RC = Context->TRI->getPhysRegBaseClass(Reg);
+ unsigned Size = Context->TRI->getRegSizeInBits(*RC);
+ if (!Context->TRI->isInAllocatableClass(Reg))
+ return {{}, {}};
----------------
Pierre-vh wrote:
I ported this from the old implementation, and it seems necessary because if I remove that condition, 1k tests crash.
I think we cannot use `getRegSizeInBits` on a non-allocatable register; the debugger pointed to that.
The call came before the condition though, which is surprising. I suspect the optimizer sank the call below the check, which avoided the crash whenever the early return was taken. I moved the check higher just in case.
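As a rough illustration of the ordering described above, here is a minimal, self-contained sketch of the guard-before-query pattern. It does not use LLVM's API: `MockReg`, `mockRegSizeInBits`, and `mockRegUnits` are hypothetical stand-ins, and the one-unit-per-32-bits mapping is an assumption made only to keep the example small.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a physical register; not an LLVM type.
struct MockReg {
  bool Allocatable;    // plays the role of the isInAllocatableClass(Reg) result
  unsigned SizeInBits; // only meaningful when Allocatable is true
  unsigned FirstUnit;  // index of the first tracked unit (assumed layout)
};

// Hypothetical size query that is only valid for allocatable registers,
// standing in for the query the comment reports as crashing.
unsigned mockRegSizeInBits(const MockReg &R) {
  assert(R.Allocatable && "size query on a non-allocatable register");
  return R.SizeInBits;
}

// Returns a half-open [begin, end) range of unit indices.
// Non-allocatable registers yield an empty range.
std::pair<unsigned, unsigned> mockRegUnits(const MockReg &R) {
  // Guard first, so the size query below is never reached for a
  // register it cannot handle, regardless of what the optimizer does.
  if (!R.Allocatable)
    return {0u, 0u};

  unsigned Size = mockRegSizeInBits(R);
  // Assumption for this sketch only: one unit per 32 bits, rounded up.
  unsigned NumUnits = (Size + 31) / 32;
  return {R.FirstUnit, R.FirstUnit + NumUnits};
}
```

With this ordering, the early return no longer depends on the compiler happening to sink the size query below the check.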
https://github.com/llvm/llvm-project/pull/162077