[llvm] [AMDGPU] Insert waitcnt for non-global fence release in GFX12 (PR #159282)
Fabian Ritter via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 22 00:11:33 PDT 2025
================
@@ -2522,8 +2522,7 @@ bool SIGfx12CacheControl::insertRelease(MachineBasicBlock::iterator &MI,
// sequentially consistent, and no other thread can access scratch
// memory.
- // Other address spaces do not have a cache.
- if ((AddrSpace & SIAtomicAddrSpace::GLOBAL) == SIAtomicAddrSpace::NONE)
+ if (AddrSpace == SIAtomicAddrSpace::SCRATCH)
return false;
----------------
ritter-x2a wrote:
Here, in `insertRelease`, the waitcnts are inserted after this `global_wb` insertion code, so the early return also (wrongly) skipped the waitcnt insertion.
In `insertAcquire`, the early return only guards the `global_inv` insertion; the necessary waitcnts are inserted outside of `insertAcquire`, before it is called.
We could introduce a `bool canAffectGlobalAS(SIAtomicAddrSpace)` helper that encapsulates this condition, which would make it easier to recognize that one of the cases is negated.
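
For illustration, a minimal standalone sketch of what such a helper and the two guards could look like. Only the name and signature `bool canAffectGlobalAS(SIAtomicAddrSpace)` come from the comment above; the helper body (a test of the `GLOBAL` bit), the reduced `SIAtomicAddrSpace` enum with its values, and the `reachesWaitcntOld`/`reachesWaitcntNew` functions are assumptions made for this example and are not code from the PR. The two toy functions only model whether control flow gets past the early return to the code that follows it, which in `insertRelease` includes the waitcnt insertion.

```cpp
#include <cassert>
#include <cstdint>

// Reduced stand-in for the address-space bitmask (assumption: the enumerator
// values here are illustrative and not copied from the LLVM sources).
enum class SIAtomicAddrSpace : uint32_t {
  NONE = 0u,
  GLOBAL = 1u << 0,
  LDS = 1u << 1,
  SCRATCH = 1u << 2,
};

constexpr SIAtomicAddrSpace operator&(SIAtomicAddrSpace A, SIAtomicAddrSpace B) {
  return static_cast<SIAtomicAddrSpace>(static_cast<uint32_t>(A) &
                                        static_cast<uint32_t>(B));
}

constexpr SIAtomicAddrSpace operator|(SIAtomicAddrSpace A, SIAtomicAddrSpace B) {
  return static_cast<SIAtomicAddrSpace>(static_cast<uint32_t>(A) |
                                        static_cast<uint32_t>(B));
}

// Hypothetical helper: true if the mask includes the global address space.
// The body is an assumption; only the signature appears in the review.
constexpr bool canAffectGlobalAS(SIAtomicAddrSpace AS) {
  return (AS & SIAtomicAddrSpace::GLOBAL) != SIAtomicAddrSpace::NONE;
}

// Toy model of the guard removed by the diff: any mask without the GLOBAL
// bit returns early, so the code after the guard is never reached.
constexpr bool reachesWaitcntOld(SIAtomicAddrSpace AS) {
  return canAffectGlobalAS(AS);
}

// Toy model of the guard added by the diff: only a scratch-only mask
// returns early.
constexpr bool reachesWaitcntNew(SIAtomicAddrSpace AS) {
  return AS != SIAtomicAddrSpace::SCRATCH;
}

// An LDS-only release is cut off by the old guard but not by the new one.
static_assert(!reachesWaitcntOld(SIAtomicAddrSpace::LDS), "old guard returns early");
static_assert(reachesWaitcntNew(SIAtomicAddrSpace::LDS), "new guard falls through");
```

In this model, an LDS-only mask is the case that differs between the two guards, while a scratch-only mask returns early under both.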
https://github.com/llvm/llvm-project/pull/159282