[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)
    Jay Foad via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Fri Mar 29 13:18:40 PDT 2024
    
    
  
================
@@ -2326,6 +2326,20 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
     }
 #endif
 
+    if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+      AMDGPU::Waitcnt Wait;
+      if (ST->hasExtendedWaitCounts())
+        Wait = AMDGPU::Waitcnt(0, 0, 0, 0, 0, 0, 0);
+      else
+        Wait = AMDGPU::Waitcnt(0, 0, 0, 0);
+
+      if (!Inst.mayStore())
+        Wait.StoreCnt = ~0u;
----------------
jayfoad wrote:
GFX10 introduced a separate counter for **VMEM** stores, named VScnt. GFX12 just renamed it to STOREcnt. No architecture has a separate store counter for DS or SMEM. So `ds_add_u32 v0, v1` followed by `s_waitcnt lgkmcnt(0)` (pre-GFX12) or `s_wait_dscnt 0` (GFX12) is fine: the DS operation is tracked by that counter.
https://github.com/llvm/llvm-project/pull/79236
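
To make the counter rules above concrete, here is a rough standalone sketch. It is not code from the PR or from LLVM: the `Gen` and `MemKind` enums and both helper functions are hypothetical names introduced only for illustration, and the model encodes only what the comment states.

```cpp
#include <string>

// Hypothetical model, not LLVM code. GFX10Plus stands for GFX10 and later
// generations before GFX12.
enum class Gen { PreGFX10, GFX10Plus, GFX12 };
enum class MemKind { VMEM, DS, SMEM };

// True only where stores are tracked by a counter separate from loads:
// VMEM on GFX10 and later (VScnt, renamed STOREcnt on GFX12).
bool hasSeparateStoreCounter(Gen G, MemKind K) {
  return K == MemKind::VMEM && G != Gen::PreGFX10;
}

// The wait that covers a DS operation such as `ds_add_u32 v0, v1`.
std::string dsWait(Gen G) {
  return G == Gen::GFX12 ? "s_wait_dscnt 0" : "s_waitcnt lgkmcnt(0)";
}
```

Under this model, `hasSeparateStoreCounter` is false for every DS and SMEM query, which is why clearing `Wait.StoreCnt` does not lose a wait for a DS store.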
    
    