[llvm] [AMDGPU] Optimize LDS DMA soft waitcnt (PR #138802)

Tue Jun 17 20:43:27 PDT 2025

================
@@ -1278,6 +1278,23 @@ bool WaitcntGeneratorPreGFX12::applyPreexistingWaitcnt(
     if (Opcode == AMDGPU::S_WAITCNT) {
       unsigned IEnc = II.getOperand(0).getImm();
       AMDGPU::Waitcnt OldWait = AMDGPU::decodeWaitcnt(IV, IEnc);
+
+      // These pseudo waitcnt instructions are only needed to synchronize DS
+      // operations with direct LDS loads that use vmcnt. We can safely relax
+      // them when no outstanding direct LDS loads exist, even if other vmcnt
+      // events are pending.
+      if (II.getOpcode() == AMDGPU::S_WAITCNT_DIRECT_LDS_LOAD_soft &&
----------------
kerbowa wrote:

In the waitcnt pass we are not really looking at fences and atomics directly. The idea is to rely on the "MemoryLegalizer" pass to insert soft waitcnt which can be optimized. If they are just regular waitcnt how can I differentiate from other "soft" vmcnt(0) that are added for reasons besides direct load to LDS?

https://github.com/llvm/llvm-project/pull/138802