[llvm] [Doc][AMDGPU] Document the waitcnts required before SCOPE_SYS stores on GFX12 (PR #156424)

Pierre van Houtryve via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 4 01:01:28 PDT 2025


================
@@ -14510,6 +14510,14 @@ For GFX12:
 * A memory attached last level (MALL) cache exists for GPU memory.
   The MALL cache is fully coherent with GPU memory and has no impact on system
   coherence. All agents (GPU and CPU) access GPU memory through the MALL cache.
+* The wait instructions below must be added before any ``SCOPE_SYS`` store in
+  order for the store to remain in order with previous memory operations.
+
+  * ``s_wait_loadcnt 0x0``
+  * ``s_wait_storecnt 0x0``
+  * ``s_wait_kmcnt 0x0``
+  * ``s_wait_samplecnt 0x0``
+  * ``s_wait_bvhcnt 0x0``
----------------
Pierre-vh wrote:

DScnt is indeed the LDS counter.
AFAIK, the reordering here can only occur between two system-scope operations, and it happens somewhere after L2.
So we need to wait on any operation that could be in flight at that level. LDS ops aren't among them because they can't leave the workgroup under any scenario.
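
To make the rule concrete, here is a minimal standalone sketch of the selection logic. It is not the actual AMDGPU backend code: the `CounterInfo` table, the `CanReachBeyondL2` flag, and `waitsBeforeSysScopeStore` are hypothetical names invented for illustration, and the flag values simply encode the reasoning above (only LDS traffic is assumed to stay inside the workgroup).

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical model of the GFX12 wait counters discussed in this thread.
// CanReachBeyondL2 is an illustrative flag, not a real backend property:
// it marks counters whose operations may still be in flight past L2,
// where the system-scope reordering is believed to happen.
struct CounterInfo {
  const char *WaitMnemonic;
  bool CanReachBeyondL2;
};

static const std::array<CounterInfo, 6> Counters = {{
    {"s_wait_loadcnt", true},
    {"s_wait_storecnt", true},
    {"s_wait_kmcnt", true},
    {"s_wait_samplecnt", true},
    {"s_wait_bvhcnt", true},
    {"s_wait_dscnt", false}, // LDS ops never leave the workgroup.
}};

// Builds the wait sequence to place before a SCOPE_SYS store,
// skipping counters that cannot have operations outstanding past L2.
std::string waitsBeforeSysScopeStore() {
  std::string Out;
  for (const CounterInfo &Info : Counters) {
    if (!Info.CanReachBeyondL2)
      continue;
    Out += Info.WaitMnemonic;
    Out += " 0x0\n";
  }
  assert(!Out.empty() && "expected at least one counter to wait on");
  return Out;
}
```

Under these assumptions the function yields the five waits listed in the diff and omits `s_wait_dscnt`.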

https://github.com/llvm/llvm-project/pull/156424

