[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
Jun Wang via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 26 17:15:50 PDT 2024
================
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
#endif
+ if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ AMDGPU::Waitcnt Wait;
+ if (ST->hasExtendedWaitCounts())
+ Wait = AMDGPU::Waitcnt(0, 0, 0, 0, 0, 0, 0);
+ else
+ Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
+
+ if (!Inst.mayStore())
+ Wait.StoreCnt = ~0u;
----------------
jwanggit86 wrote:
Code updated as suggested. Testfile includes case for both atomic-with-ret and atomic-no-ret. However, for the following case, even though `ds_add_u32` is atomic-no-ret, the Waitcnt for StoreCnt is set to ~0u after the call of `ScoreBrackets.simplifyWaitcnt(Wait)`. Therefore, no s_waitcnt for the StoreCnt is generated after the `ds_add_u32`.
```
define amdgpu_kernel void @atomic_add_local(ptr addrspace(3) %local) {
%unused = atomicrmw volatile add ptr addrspace(3) %local, i32 5 seq_cst
ret void
}
```
The code for GFX1100 is:
```
; GFX11: ds_add_u32 v0, v1
; GFX11-NEXT: s_waitcnt lgkmcnt(0)
; GFX11-NEXT: buffer_gl0_inv
```
Pls let me know if this looks correct.
https://github.com/llvm/llvm-project/pull/79236
More information about the llvm-commits
mailing list