[llvm] [AMDGPU] Add DS loop waitcnt optimization for GFX12+ (PR #172728)

Tue Dec 23 20:10:26 PST 2025

================
@@ -2715,51 +2732,105 @@ bool SIInsertWaitcnts::isVMEMOrFlatVMEM(const MachineInstr &MI) const {
   return SIInstrInfo::isVMEM(MI);
 }
 
-// Return true if it is better to flush the vmcnt counter in the preheader of
-// the given loop. We currently decide to flush in two situations:
+bool SIInsertWaitcnts::isDSRead(const MachineInstr &MI) const {
+  return SIInstrInfo::isDS(MI) && MI.mayLoad() && !MI.mayStore();
+}
+
+// Check if instruction may store to LDS (including DS stores, atomics,
+// FLAT instructions that may access LDS, and LDS DMA).
+bool SIInsertWaitcnts::mayStoreLDS(const MachineInstr &MI) const {
----------------
ssahasra wrote:

I think the intended meaning here is "MI is an instruction that increments DS_CNT". That is only a subset of the instructions that actually write to LDS. For example, LDS DMA operations write to LDS but do not increment DS_CNT; they are ordered using LOAD_CNT on pre-GFX12 targets and using ASYNC_CNT on GFX12+.
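The distinction above can be sketched as two separate predicates. This is a hypothetical, simplified model: the `InstKind` enum and the functions `mayWriteLDS` and `isDSCntTrackedLDSWrite` are invented for illustration and are not LLVM's `MachineInstr`/`SIInstrInfo` API. It encodes only the point made in the review, that LDS DMA writes LDS without incrementing DS_CNT. FLAT instructions that may access LDS are left out for brevity.

```cpp
// Hypothetical, simplified instruction classes. This is NOT LLVM's API,
// just a model of the distinction discussed in the review.
enum class InstKind { DSLoad, DSStore, DSAtomic, LDSDMA, Other };

// Broad question: can this instruction write LDS at all?
// (FLAT instructions that may access LDS are omitted for brevity.)
constexpr bool mayWriteLDS(InstKind K) {
  switch (K) {
  case InstKind::DSStore:
  case InstKind::DSAtomic:
  case InstKind::LDSDMA:
    return true;
  default:
    return false;
  }
}

// Narrower question the reviewer suggests the helper actually answers:
// is this an LDS write whose completion is tracked by DS_CNT?
// Per the review, LDS DMA writes LDS but is ordered by LOAD_CNT (pre-GFX12)
// or ASYNC_CNT (GFX12+), so a DS_CNT wait does not cover it.
constexpr bool isDSCntTrackedLDSWrite(InstKind K) {
  return mayWriteLDS(K) && K != InstKind::LDSDMA;
}

// Compile-time usage checks for the model above.
static_assert(mayWriteLDS(InstKind::LDSDMA), "LDS DMA writes LDS");
static_assert(!isDSCntTrackedLDSWrite(InstKind::LDSDMA),
              "but LDS DMA is not tracked by DS_CNT");
```

Under this model, a helper whose name and comment describe the narrower predicate would keep the documentation and the counter it actually reasons about in agreement.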

https://github.com/llvm/llvm-project/pull/172728