[llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 16 04:02:31 PDT 2023


jayfoad wrote:

> A feature request asks for this because it would make debugging memory problems easier.

OK, I guess I can see that it might be useful to put the s_waitcnt immediately after the thing that "generates" the wait. `-amdgpu-waitcnt-forcezero` is different because it still puts the s_waitcnt before the thing that "consumes" the wait, but it forces the waitcnt value to be 0 instead of something higher.
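
To make the distinction concrete, here is a minimal toy sketch of the three placements, following the description above rather than the actual pass. It assumes a single counter with in-order completion; `Inst`, `Placement` and `placeWaits` are hypothetical names, and the "load"/"use" strings are placeholders, not real ISA.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model only: one counter, loads complete in issue order.
enum class Kind { Load, Use };

struct Inst {
  Kind kind;
  std::string text;
  int loadId; // Load: id of this load. Use: id of the load it reads.
};

enum class Placement { BeforeConsumer, ForceZero, AfterProducer };

std::vector<std::string> placeWaits(const std::vector<Inst> &block,
                                    Placement mode) {
  std::vector<std::string> out;
  std::vector<int> pending; // ids of loads still in flight, oldest first
  for (const Inst &inst : block) {
    assert(inst.loadId >= 0);
    if (inst.kind == Kind::Use) {
      auto it = std::find(pending.begin(), pending.end(), inst.loadId);
      if (it != pending.end()) {
        // Loads issued after the one we need may stay in flight,
        // unless the mode forces the count down to zero.
        auto younger = pending.end() - (it + 1);
        auto allowed = (mode == Placement::ForceZero) ? 0 : younger;
        out.push_back("s_waitcnt vmcnt(" + std::to_string(allowed) + ")");
        pending.erase(pending.begin(), pending.end() - allowed);
      }
      out.push_back(inst.text);
    } else {
      out.push_back(inst.text);
      pending.push_back(inst.loadId);
      if (mode == Placement::AfterProducer) {
        // Wait right after the instruction that generates the event.
        out.push_back("s_waitcnt vmcnt(0)");
        pending.clear();
      }
    }
  }
  return out;
}
```

For the sequence load a, load b, use a, use b, this toy model puts `vmcnt(1)` before the first use and `vmcnt(0)` before the second in the default mode, a single `vmcnt(0)` before the first use in the force-zero mode, and a `vmcnt(0)` directly after each load in the after-producer mode.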

I would prefer it if your implementation were a bit more integrated with the rest of the pass, instead of just adding a separate pass over all instructions. You might be able to do this by adding a call to `generateWaitcnt` near the end of `insertWaitcntInBlock`, after the call to `updateEventWaitcntAfter`. One advantage of doing it this way is that it would work for all wait counters, including ones like EXP_CNT that apply to non-memory instructions.
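
A rough sketch of the shape this could take, as a standalone toy model rather than the real pass. The function names are borrowed from the paragraph above, but their signatures, the `ToyInst` and `Pending` types, and the `waitAfterEachEvent` flag are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Toy counters; the real pass tracks scores, not simple pending counts.
enum Counter { VM_CNT, LGKM_CNT, EXP_CNT, NUM_COUNTERS };

struct ToyInst {
  std::string text;
  int bumps; // counter this instruction increments, or -1 for none
};

using Pending = std::array<int, NUM_COUNTERS>;

// Hypothetical stand-in: record the event the instruction generates.
void updateEventWaitcntAfter(const ToyInst &inst, Pending &pending) {
  assert(inst.bumps < NUM_COUNTERS);
  if (inst.bumps >= 0)
    ++pending[inst.bumps];
}

// Hypothetical stand-in: emit one wait draining every nonzero counter.
bool generateWaitcnt(Pending &pending, std::vector<std::string> &out) {
  static const char *const names[NUM_COUNTERS] = {"vmcnt", "lgkmcnt",
                                                  "expcnt"};
  std::string wait = "s_waitcnt";
  bool any = false;
  for (int c = 0; c < NUM_COUNTERS; ++c) {
    if (pending[c] == 0)
      continue;
    wait += std::string(" ") + names[c] + "(0)";
    pending[c] = 0;
    any = true;
  }
  if (any)
    out.push_back(wait);
  return any;
}

std::vector<std::string> insertWaitcntInBlock(const std::vector<ToyInst> &block,
                                              bool waitAfterEachEvent) {
  std::vector<std::string> out;
  Pending pending{};
  for (const ToyInst &inst : block) {
    // Consumer-side waits would be inserted here; omitted in this sketch.
    out.push_back(inst.text);
    updateEventWaitcntAfter(inst, pending);
    // The suggested hook: wait right after the counters are updated.
    if (waitAfterEachEvent)
      generateWaitcnt(pending, out);
  }
  return out;
}
```

Because the wait is driven by the counter state rather than by the instruction kind, the export in this toy model gets its wait through the same path as the load, with no separate scan over instructions.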

https://github.com/llvm/llvm-project/pull/68932

