[llvm] [AMDGPU] Insert waitcnt for non-global fence release in GFX12 (PR #159282)

Fabian Ritter via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 22 01:59:52 PDT 2025


================
@@ -2522,8 +2522,7 @@ bool SIGfx12CacheControl::insertRelease(MachineBasicBlock::iterator &MI,
   // sequentially consistent, and no other thread can access scratch
   // memory.
 
-  // Other address spaces do not have a cache.
-  if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) == SIAtomicAddrSpace::NONE)
+  if (AddrSpace == SIAtomicAddrSpace::SCRATCH)
     return false;
 
----------------
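The hunk narrows the early exit in `SIGfx12CacheControl::insertRelease`. Before the change, any release whose address spaces did not include GLOBAL returned `false` before reaching the rest of the function. An LDS-only fence is one example. After the change, only a scratch-only release exits there, which matches the PR title's goal of inserting a waitcnt for non-global fence releases. The C++ sketch below compares the two conditions. It uses a hypothetical bitmask enum that mirrors `SIAtomicAddrSpace`. The enum name, its bit values, and the helper names are assumptions for illustration, not the LLVM definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for SIAtomicAddrSpace; names and bit values are
// assumed for illustration only.
enum class AddrSpace : std::uint32_t {
  NONE = 0u,
  GLOBAL = 1u << 0,
  LDS = 1u << 1,
  SCRATCH = 1u << 2,
};

constexpr AddrSpace operator|(AddrSpace A, AddrSpace B) {
  return static_cast<AddrSpace>(static_cast<std::uint32_t>(A) |
                                static_cast<std::uint32_t>(B));
}

constexpr AddrSpace operator&(AddrSpace A, AddrSpace B) {
  return static_cast<AddrSpace>(static_cast<std::uint32_t>(A) &
                                static_cast<std::uint32_t>(B));
}

// Old early exit: bail out unless GLOBAL is one of the fenced address spaces.
constexpr bool oldReturnsEarly(AddrSpace AS) {
  return (AS & AddrSpace::GLOBAL) == AddrSpace::NONE;
}

// New early exit: bail out only when the fence covers scratch and nothing else.
constexpr bool newReturnsEarly(AddrSpace AS) {
  return AS == AddrSpace::SCRATCH;
}

// An LDS-only release used to return early; now it falls through.
static_assert(oldReturnsEarly(AddrSpace::LDS), "");
static_assert(!newReturnsEarly(AddrSpace::LDS), "");
// A scratch-only release returns early in both versions.
static_assert(oldReturnsEarly(AddrSpace::SCRATCH) &&
                  newReturnsEarly(AddrSpace::SCRATCH), "");
// A release that includes GLOBAL proceeds in both versions.
static_assert(!oldReturnsEarly(AddrSpace::GLOBAL | AddrSpace::LDS), "");
static_assert(!newReturnsEarly(AddrSpace::GLOBAL | AddrSpace::LDS), "");
```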
ritter-x2a wrote:

> And maybe we can also have insertWB(), so that it's clear what is happening at a release?

Feel free to overrule me here, but I don't think we gain much from introducing `insertWB`. That logic is used in only three `insertRelease` implementations (gfx90a, gfx940, and gfx12), and it does something slightly different in each, so there isn't much duplication to get rid of.
We could turn the `insertRelease` implementations for gfx90a, gfx940, and gfx12 into `insertWB(...); insertWait(...);` (for the others it seems to be just `insertWait(...);`), but that would only move the WB code from `SIGfx*CacheControl::insertRelease` to the new `SIGfx*CacheControl::insertWB`.
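Here is a minimal C++ sketch of the `insertWB(...); insertWait(...);` split discussed above, heavily simplified to make the trade-off concrete. The class names, the string log, and the `releaseSequence` helper are hypothetical. The real methods also take the instruction iterator, scope, and address-space arguments, and those are omitted here. Each write-back target overrides only `insertWB`, and the body of that override is the write-back code that would otherwise sit in its `insertRelease`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of the SI*CacheControl hierarchy.
// The real methods take a MachineInstr iterator, scope, address space, etc.
// and return whether they changed anything; here each step only appends its
// name to a log so the emitted sequence is visible.
class CacheControlSketch {
public:
  virtual ~CacheControlSketch() = default;

  // The split suggested in review: a release is a write-back, then a wait.
  void insertRelease(std::vector<std::string> &Log) const {
    insertWB(Log);
    insertWait(Log);
  }

protected:
  // Default: no write-back at a release (targets other than gfx90a/gfx940/gfx12).
  virtual void insertWB(std::vector<std::string> &Log) const { (void)Log; }

  virtual void insertWait(std::vector<std::string> &Log) const {
    Log.push_back("wait");
  }
};

// A gfx90a/gfx940/gfx12-style target would override only insertWB. Its body
// is the write-back code that currently lives in that target's insertRelease,
// so the split moves that code rather than sharing it.
class WritebackTargetSketch : public CacheControlSketch {
protected:
  void insertWB(std::vector<std::string> &Log) const override {
    Log.push_back("writeback");
  }
};

// Hypothetical helper: the sequence a given target would emit for a release.
std::vector<std::string> releaseSequence(const CacheControlSketch &CC) {
  std::vector<std::string> Log;
  CC.insertRelease(Log);
  return Log;
}
```

The per-target write-back logic remains per-target in either layout. The split gives the step a name but shares none of its code, which is why the refactoring buys little here.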

https://github.com/llvm/llvm-project/pull/159282


More information about the llvm-commits mailing list