[llvm] [AMDGPU] always emit a soft wait even if it is trivially ~0 (PR #147257)

Thu Jul 10 01:17:16 PDT 2025

================
@@ -669,6 +679,7 @@ define amdgpu_kernel void @global_volatile_store_1(
 ; GFX12-WGP-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
 ; GFX12-WGP-NEXT:    global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT:    s_wait_loadcnt 0x3f
----------------
ssahasra wrote:

> Longer term we should really just have a single waitcnt pseudo for the MemoryLegalizer that is target-independent, it'd fix issues like these if we had special sentinel values for different things.

Isn't that exactly what this change does? S_WAITCNT_soft is the only pseudo we need, and ~0 is the sentinel that tells the wait count inserter to compute its own value.
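To make the sentinel idea concrete, here is a minimal C++ sketch. It is a hypothetical model, not LLVM's actual S_WAITCNT_soft handling. The names `kSoftSentinel` and `resolveSoftWait` are invented, and so is the rule that a wait already satisfied by the outstanding count gets dropped; both are assumptions for illustration. A count of N is taken to mean "wait until at most N operations are outstanding".

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sentinel: "the legalizer imposes no specific count;
// the wait count inserter should compute its own value".
constexpr uint32_t kSoftSentinel = ~0u;

// Hypothetical resolution of one counter field of a soft wait.
//   requested   - count emitted by the memory legalizer (or kSoftSentinel)
//   computed    - count the inserter derives from its own tracking
//                 (kSoftSentinel if it sees nothing that needs waiting on)
//   outstanding - operations the inserter knows are still in flight
// Returns the count to emit, or std::nullopt if the wait is trivially
// satisfied and can be dropped.
std::optional<uint32_t> resolveSoftWait(uint32_t requested, uint32_t computed,
                                        uint32_t outstanding) {
  // The sentinel defers the choice of count to the inserter.
  uint32_t target = (requested == kSoftSentinel) ? computed : requested;
  // Waiting until at most `target` ops remain is a no-op when no more
  // than `target` are outstanding.
  if (target >= outstanding) return std::nullopt;
  return target;
}
```

In this model, a soft wait whose requested and computed counts are both the sentinel always resolves to "drop", so emitting it unconditionally from the legalizer costs nothing in the final code.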

The separation of concerns here is simple. The memory legalizer should always emit wait counts to implement the memory model. The current implementation applies an optimization that skips the wait count at workgroup scope. This change instead moves that decision to the wait count inserter, which has far more information about the program than the legalizer does. The change is not limited to direct loads to LDS, although those are indeed the current motivation for shifting this work from one pass to the other.
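The division of responsibility described in the previous paragraph can be sketched roughly as follows. This is a hypothetical C++ model, not the MemoryLegalizer or SIInsertWaitcnts code. `Scope`, `TrackedState`, `legalizerEmitsWait`, and `inserterKeepsSoftWait` are invented names, and the rule that only a pending direct load to LDS forces the wait at workgroup scope is an assumption chosen to mirror the motivating case.

```cpp
#include <cstdint>

enum class Scope { Workgroup, Agent, System };

// Hypothetical summary of what the wait count inserter tracks at a program point.
struct TrackedState {
  uint32_t outstandingLoads;  // loads whose completion is still pending
  bool pendingDirectLdsLoad;  // whether a direct load to LDS is among them
};

// Old split: the legalizer itself skips the wait at workgroup scope,
// without any view of TrackedState.
bool legalizerEmitsWait(Scope scope) { return scope != Scope::Workgroup; }

// New split: the legalizer always emits a soft wait, and the inserter keeps
// it only when its tracked state says there is something to wait for.
// Assumed rule: at workgroup scope, only a pending direct LDS load forces it.
bool inserterKeepsSoftWait(Scope scope, const TrackedState &st) {
  if (st.outstandingLoads == 0) return false;  // trivially satisfied
  if (scope == Scope::Workgroup) return st.pendingDirectLdsLoad;
  return true;
}
```

The design point is that `inserterKeepsSoftWait` can consult program state such as pending LDS loads, which `legalizerEmitsWait` never sees.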

https://github.com/llvm/llvm-project/pull/147257