[llvm] [clang] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

Wed Nov 22 15:54:49 PST 2023

================
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
     }
 
     ++Iter;
+    if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+      auto Builder =
+          BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+              .addImm(0);
+      if (IsGFX10Plus) {
----------------
jwanggit86 wrote:

My understanding is that the feature request asks for an "s_waitcnt 0" to be *blindly* inserted after each and every memory instruction. The feature is disabled by default, and the user enables it at their discretion via a clang command-line option. Its purpose is to help debug memory problems on GPUs that do not support precise memory (although someone, Tony I think, mentioned it could go beyond debugging). I'll send you the link for the feature request.

Based on that, the implementation doesn't check the GPU model, has no model-dependent code (except the newly added code for GFX10+), and doesn't differentiate loads from stores. I'll work with the requester to get the requirements straightened out.
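
To make the intended behavior concrete, here is a rough standalone sketch of the "blind" insertion as I understand the request. It is a toy model, not the patch itself: `ToyInst` and `insertWaitAfterMemOps` are hypothetical names, and the `PreciseMemory` flag only stands in for the check on `ST->isPreciseMemoryEnabled()` in the hunk above. It doesn't use any LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical; not llvm::MachineInstr).
struct ToyInst {
  std::string Opcode;
  bool MayLoadOrStore;
};

// Returns a copy of Block with "s_waitcnt 0" placed after every instruction
// that may load or store. Deliberately "blind": no GPU-model check and no
// distinction between loads and stores.
std::vector<ToyInst> insertWaitAfterMemOps(const std::vector<ToyInst> &Block,
                                           bool PreciseMemory) {
  std::vector<ToyInst> Out;
  for (const ToyInst &I : Block) {
    Out.push_back(I);
    if (PreciseMemory && I.MayLoadOrStore)
      Out.push_back({"s_waitcnt 0", false});
  }
  // Sanity check: this transformation only ever adds instructions.
  assert(Out.size() >= Block.size());
  return Out;
}
```

With the flag off, the block comes back unchanged. With it on, a load, an ALU op, and a store become five instructions, with a wait right after the load and another right after the store.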

https://github.com/llvm/llvm-project/pull/68932