[llvm] (reland) [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077) (PR #171779)

Pierre van Houtryve via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 11 02:16:13 PST 2025


Pierre-vh wrote:

> I do not understand why this changes anything. The old code should have worked, because accessing VMem[RegID].Scores[T] implicitly created a default entry for VMem[RegID] if none existed before. I checked this when I reviewed the original patch!

Yes, but we only iterated over the keys from the "Other" map. If the map in the current object had keys that "Other" lacked, we never visited those entries, so `mergeScore` was never called on them.
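
To illustrate the point, here is a minimal standalone sketch; it is not the actual SIInsertWaitCnts code. `ScoreMap`, `mergeScoreToy`, `mergeOtherKeysOnly`, and `mergeUnionOfKeys` are hypothetical names, and the assumption is that merging rebases a non-zero score by a shift before taking the maximum, so an entry that is never visited keeps a stale, un-rebased value.

```cpp
#include <algorithm>
#include <map>
#include <set>

// Hypothetical stand-in for a per-register score table (key = register ID).
using ScoreMap = std::map<unsigned, unsigned>;

// Toy merge (assumption): rebase each non-zero score by its shift, keep the max.
unsigned mergeScoreToy(unsigned Mine, unsigned Theirs, unsigned MyShift,
                       unsigned OtherShift) {
  unsigned MyShifted = Mine == 0 ? 0 : Mine + MyShift;
  unsigned OtherShifted = Theirs == 0 ? 0 : Theirs + OtherShift;
  return std::max(MyShifted, OtherShifted);
}

// Returns the score for Key, or 0 if the map has no entry for it.
unsigned lookup(const ScoreMap &M, unsigned Key) {
  auto It = M.find(Key);
  return It == M.end() ? 0 : It->second;
}

// Problematic pattern: only keys present in Other are visited.
// Mine[Key] default-creates missing entries, but keys that exist only in
// Mine are never merged, so they are never rebased.
void mergeOtherKeysOnly(ScoreMap &Mine, const ScoreMap &Other,
                        unsigned MyShift, unsigned OtherShift) {
  for (const auto &KV : Other) {
    unsigned &Score = Mine[KV.first];
    Score = mergeScoreToy(Score, KV.second, MyShift, OtherShift);
  }
}

// One possible alternative: visit the union of both key sets.
void mergeUnionOfKeys(ScoreMap &Mine, const ScoreMap &Other,
                      unsigned MyShift, unsigned OtherShift) {
  std::set<unsigned> Keys;
  for (const auto &KV : Mine)
    Keys.insert(KV.first);
  for (const auto &KV : Other)
    Keys.insert(KV.first);
  for (unsigned Key : Keys) {
    unsigned &Score = Mine[Key];
    Score = mergeScoreToy(Score, lookup(Other, Key), MyShift, OtherShift);
  }
}
```

With `Mine = {1: 5, 2: 3}`, `Other = {2: 4}`, and a shift of 10 applied to `Mine`, the first function leaves key 1 untouched, while the second rebases it along with key 2.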

> Do you even have an unreduced reproducer, where this fix makes any difference at all to the behaviour?

Yes; I wouldn't have claimed to have a fix otherwise :)
When building the Blender Cycles source code from Blender 4.1 on gfx90a, many waitcnts are missing or different when comparing the output before and after the patch. For example, in some places we have `s_waitcnt vmcnt(6)` instead of `vmcnt(0)`. This is how I debugged it.



https://github.com/llvm/llvm-project/pull/171779
