[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)
Jun Wang via cfe-commits
cfe-commits at lists.llvm.org
Wed Nov 22 15:54:49 PST 2023
================
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
}
++Iter;
+ if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+ auto Builder =
+ BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+ .addImm(0);
+ if (IsGFX10Plus) {
----------------
jwanggit86 wrote:
My understanding is that the feature request asks for an `s_waitcnt 0` to be *blindly* inserted after each and every memory instruction. The feature is disabled by default and is enabled at the user's discretion via a clang command-line option. Its purpose is to help debug memory problems on GPUs that do not support precise memory (although someone, Tony I think, mentioned it could be useful beyond debugging). I'll send you the link to the feature request.
Based on that, the implementation does not check the GPU model, contains no model-dependent code (apart from the newly added GFX10+ code), and does not differentiate loads from stores. I'll work with the requester to clarify the requirements.
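To make the intended "blind" policy concrete, here is a minimal, self-contained sketch. It is not the actual `SIInsertWaitcnts` code and does not use the LLVM `MachineInstr` API. `Inst`, `insertPreciseMemoryWaits`, and the opcode strings are hypothetical stand-ins. Because the quoted diff is cut off at the GFX10+ branch, the extra `S_WAITCNT_VSCNT 0` emitted on GFX10+ is an assumption: it reflects that GFX10+ tracks stores with a separate counter (vscnt), which a plain `s_waitcnt 0` does not cover.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model (NOT llvm::MachineInstr).
struct Inst {
  std::string Opcode;
  bool MayLoadOrStore; // stands in for MachineInstr::mayLoadOrStore()
};

// Blind policy: when precise memory is enabled, append a wait-for-zero after
// every memory instruction. The GPU model is not checked beyond the GFX10+
// flag, and loads are not distinguished from stores.
// Assumption: on GFX10+ an extra vscnt wait is emitted because stores are
// tracked by a separate counter there.
std::vector<Inst> insertPreciseMemoryWaits(const std::vector<Inst> &Block,
                                           bool PreciseMemoryEnabled,
                                           bool IsGFX10Plus) {
  std::vector<Inst> Out;
  Out.reserve(Block.size() * 3);
  for (const Inst &I : Block) {
    Out.push_back(I);
    if (!PreciseMemoryEnabled || !I.MayLoadOrStore)
      continue; // feature off, or not a memory instruction: no wait
    Out.push_back({"S_WAITCNT 0", false});
    if (IsGFX10Plus)
      Out.push_back({"S_WAITCNT_VSCNT 0", false}); // hypothetical opcode text
  }
  return Out;
}
```

With the feature off (the default) the block is returned unchanged. With it on, a load, an ALU op, and a store become five instructions before GFX10 and seven on GFX10+, since every memory instruction gets its own wait regardless of whether a later wait would have covered it.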
https://github.com/llvm/llvm-project/pull/68932