[clang] [llvm] [AMDGPU] Implement Waitcnt Expansion for Profiling (PR #169345)

Pankaj Dwivedi via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 24 10:46:59 PST 2025


PankajDwivedi-25 wrote:

> > > Why would you restrict this to "non-zero counter values"?
> > 
> > 
> > When a waitcnt already has a zero counter value, expanding it would just generate another waitcnt(0), which provides no additional profiling granularity. If you believe there's a use case for expanding waitcnt(0), I'd be happy to discuss it.
> 
> The requirement is that instead of emitting e.g. `s_waitcnt vmcnt(2)` you should emit e.g.:
> 
> ```
> s_waitcnt vmcnt(4)
> s_waitcnt vmcnt(3)
> s_waitcnt vmcnt(2)
> ```
> 
> The starting value "4" here assumes that SIInsertWaitcnts already knows that the upper bound on this counter's value is 5, so 4 is the highest value you can wait for that will have any effect.
> 
> Similarly instead of `s_waitcnt vmcnt(0)` you should emit:
> 
> ```
> s_waitcnt vmcnt(4)
> s_waitcnt vmcnt(3)
> s_waitcnt vmcnt(2)
> s_waitcnt vmcnt(1)
> s_waitcnt vmcnt(0)
> ```

It looks like the outstanding value is always 0 in these cases.
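
For reference, the expansion requested in the quoted comment can be modelled with a small sketch. This is not the code in the PR or in SIInsertWaitcnts: the function name `expand_vmcnt_wait` and its parameters `target` and `upper_bound` are hypothetical, and it assumes the pass already knows an upper bound on the counter's value. It also assumes the original wait is always emitted last, even when the upper bound leaves nothing to expand.

```python
def expand_vmcnt_wait(target: int, upper_bound: int) -> list[str]:
    """Model the expansion of a single `s_waitcnt vmcnt(target)`.

    `upper_bound` is the assumed known upper bound on the counter's value,
    so `upper_bound - 1` is the highest wait that can still have an effect.
    Both names are hypothetical and used only for illustration.
    """
    start = upper_bound - 1
    # Count down from the highest useful value to just above the target.
    waits = [f"s_waitcnt vmcnt({n})" for n in range(start, target, -1)]
    # Assumption: the original wait is always kept as the final instruction.
    waits.append(f"s_waitcnt vmcnt({target})")
    return waits


if __name__ == "__main__":
    # Upper bound 5, as in the quoted example.
    print("\n".join(expand_vmcnt_wait(target=2, upper_bound=5)))
    print("\n".join(expand_vmcnt_wait(target=0, upper_bound=5)))
```

With an upper bound of 5, a target of 2 gives the three-instruction sequence quoted above and a target of 0 gives the five-instruction one. If the known upper bound is 0, this model produces only the original wait, so nothing is expanded.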

https://github.com/llvm/llvm-project/pull/169345

