[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
Pierre van Houtryve via cfe-commits
cfe-commits at lists.llvm.org
Fri Mar 1 03:08:40 PST 2024
================
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
#endif
+ if (ST->isPreciseMemoryEnabled()) {
+   AMDGPU::Waitcnt Wait;
+   if (WCG == &WCGPreGFX12)
+     Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
----------------
Pierre-vh wrote:
I was looking at https://github.com/ROCm/ROCm-CompilerSupport/issues/66 and it made me wonder: why do we have to emit all zeroes instead of just emitting what's in `ScoreBrackets`? Is there an advantage?
I'm wondering whether this should just emit what's in `ScoreBrackets`, so that `+precise-memory` and `-amdgpu-waitcnt-forcezero` would need to be used together to achieve the behavior we have here?
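
To make the question concrete, here is a rough toy model of the two options. It is not the pass's code: `ToyWaitcnt`, `ToyPending`, `forceZero` and `fromPending` are hypothetical names, and I'm assuming the four arguments of `AMDGPU::Waitcnt(0, 0, 0, 0)` map to the vmcnt, expcnt, lgkmcnt and vscnt counters, with an all-ones value meaning "no wait" for that counter.

```cpp
// Toy model only: these types are hypothetical and are NOT the LLVM
// AMDGPU::Waitcnt or WaitcntBrackets classes.

// Sentinel meaning "do not wait on this counter" (assumed convention).
constexpr unsigned kNoWait = ~0u;

// Per-counter wait: the maximum number of operations that may stay in flight.
struct ToyWaitcnt {
  unsigned VmCnt;
  unsigned ExpCnt;
  unsigned LgkmCnt;
  unsigned VsCnt;
};

// Number of operations currently outstanding on each counter. This stands in
// for the information the score brackets track.
struct ToyPending {
  unsigned Vm;
  unsigned Exp;
  unsigned Lgkm;
  unsigned Vs;
};

// Option 1: what the patch does pre-GFX12, i.e. wait for everything.
constexpr ToyWaitcnt forceZero() { return ToyWaitcnt{0, 0, 0, 0}; }

// Wait to zero only if something is actually outstanding on the counter.
constexpr unsigned waitIfPending(unsigned Pending) {
  return Pending != 0 ? 0u : kNoWait;
}

// Option 2: derive the wait from the tracked state, so counters with nothing
// outstanding are left alone.
constexpr ToyWaitcnt fromPending(const ToyPending &P) {
  return ToyWaitcnt{waitIfPending(P.Vm), waitIfPending(P.Exp),
                    waitIfPending(P.Lgkm), waitIfPending(P.Vs)};
}

// How many counters a given wait actually touches.
constexpr unsigned countersWaitedOn(const ToyWaitcnt &W) {
  return (W.VmCnt != kNoWait ? 1u : 0u) + (W.ExpCnt != kNoWait ? 1u : 0u) +
         (W.LgkmCnt != kNoWait ? 1u : 0u) + (W.VsCnt != kNoWait ? 1u : 0u);
}
```

In this model, after a single vector memory load (`ToyPending{1, 0, 0, 0}`), `forceZero()` waits on all four counters while `fromPending(...)` waits on vmcnt only. Both drain the outstanding operation. The difference is only in how many counters the emitted wait mentions.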
https://github.com/llvm/llvm-project/pull/79236