[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

Pierre van Houtryve via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 1 03:08:40 PST 2024


================
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
     }
 #endif
 
+    if (ST->isPreciseMemoryEnabled()) {
+      AMDGPU::Waitcnt Wait;
+      if (WCG == &WCGPreGFX12)
+        Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
----------------
Pierre-vh wrote:

I was looking at https://github.com/ROCm/ROCm-CompilerSupport/issues/66 and it made me wonder, why do we have to emit all zeroes instead of just emitting what's in `ScoreBrackets`? Is there an advantage?

I'm wondering if this should just emit what's in `ScoreBrackets`, so that `+precise-memory` and `-amdgpu-waitcnt-forcezero` would need to be used together to achieve the behavior we have here?
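
To make the two options concrete, here is a toy, standalone sketch. It does not use the real LLVM classes: `ToyWaitcnt`, `ToyBrackets`, `waitForceZero`, and `waitFromBrackets` are hypothetical names, and treating `~0u` as "no wait on this counter" is an assumption of the model. How the pass would actually derive a wait from `ScoreBrackets` is not spelled out above, so read this only as an illustration of the question.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical stand-ins, not the LLVM ones.
constexpr unsigned NoWait = ~0u; // assumption: ~0u means "do not wait on this counter"

struct ToyWaitcnt {
  unsigned VmCnt = NoWait;
  unsigned ExpCnt = NoWait;
  unsigned LgkmCnt = NoWait;
  unsigned VsCnt = NoWait;
};

// Hypothetical stand-in for the pending-operation state tracked per counter.
struct ToyBrackets {
  unsigned PendingVm = 0;
  unsigned PendingExp = 0;
  unsigned PendingLgkm = 0;
  unsigned PendingVs = 0;
};

// Option A (what the patch does pre-GFX12): wait for zero on every counter.
ToyWaitcnt waitForceZero() {
  ToyWaitcnt W;
  W.VmCnt = W.ExpCnt = W.LgkmCnt = W.VsCnt = 0;
  return W;
}

// Option B (the idea raised in the comment): only wait on counters that
// actually have something outstanding according to the tracked state.
ToyWaitcnt waitFromBrackets(const ToyBrackets &B) {
  ToyWaitcnt W;
  if (B.PendingVm)   W.VmCnt = 0;
  if (B.PendingExp)  W.ExpCnt = 0;
  if (B.PendingLgkm) W.LgkmCnt = 0;
  if (B.PendingVs)   W.VsCnt = 0;
  return W;
}

// Small self-check: after a single outstanding vector-memory load, option B
// waits on VmCnt only, while option A waits on all four counters.
void demo() {
  ToyBrackets B;
  B.PendingVm = 1;
  ToyWaitcnt FromState = waitFromBrackets(B);
  ToyWaitcnt Forced = waitForceZero();
  assert(FromState.VmCnt == 0 && FromState.LgkmCnt == NoWait);
  assert(Forced.VmCnt == 0 && Forced.LgkmCnt == 0);
}
```

In this model both options drain the counter that the memory instruction actually touched; they differ only in whether untouched counters also get an explicit zero wait.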


https://github.com/llvm/llvm-project/pull/79236

