[llvm] [AMDGPU] Optimize LDS DMA soft waitcnt (PR #138802)

Wed Jun 18 01:33:43 PDT 2025

================
@@ -1608,6 +1608,12 @@ let OtherPredicates = [HasImageInsts] in {
   def S_WAIT_DSCNT_soft : SOPP_Pseudo <"s_soft_wait_dscnt", (ins s16imm:$simm16), "$simm16">;
   def S_WAIT_KMCNT_soft : SOPP_Pseudo <"s_soft_wait_kmcnt", (ins s16imm:$simm16), "$simm16">;
 }
+// Soft waitcnt for direct loads to LDS from global memory. These waits may be
+// relaxed or removed entirely based on current in-flight memory operations
+// and their relation to these direct LDS loads. For example, if global loads
+// to LDS are mixed with global loads not writing to LDS, a wait may only be
+// necessary for the LDS-writing loads to synchronize with other LDS operations.
+def S_WAITCNT_DIRECT_LDS_LOAD_soft : SOPP_Pseudo <"s_soft_waitcnt" , (ins SWaitCnt:$simm16), "$simm16">;
----------------
Pierre-vh wrote:

This instruction is named `S_WAITCNT_DIRECT_LDS_LOAD_soft`, but there is no corresponding `S_WAITCNT_DIRECT_LDS_LOAD`. I don't think it's right to call this a "soft" waitcnt; it should have a different name, IMO.
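
For context, the relaxation described in the comment above the new def can be sketched roughly as follows. This is only an illustrative model, not the code in this PR: `InFlightLoad` and `relaxedWaitCount` are hypothetical names, and the sketch assumes loads retire in issue order, so a counter value of N means "all but the N youngest loads have completed".

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of one in-flight vector-memory load; not an LLVM type.
struct InFlightLoad {
  bool WritesLDS; // true for a direct global-to-LDS load
};

// Pending is ordered oldest to youngest.
// Returns the largest counter value that still guarantees every LDS-writing
// load has completed, or std::nullopt if no such load is in flight and the
// wait can be dropped entirely.
// Assumption: loads retire in issue order.
std::optional<unsigned>
relaxedWaitCount(const std::vector<InFlightLoad> &Pending) {
  // Scan from the youngest load back to the oldest.
  for (std::size_t I = Pending.size(); I-- > 0;) {
    if (Pending[I].WritesLDS) {
      // Only the loads issued after this one may remain outstanding.
      return static_cast<unsigned>(Pending.size() - 1 - I);
    }
  }
  return std::nullopt;
}
```

With one LDS-writing load followed by two ordinary global loads, the model returns 2 rather than 0, which is the "relaxed" case the comment describes; with no LDS-writing load pending it returns no wait at all.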

https://github.com/llvm/llvm-project/pull/138802