[llvm] [AMDGPU] Add DS loop waitcnt optimization for GFX12+ (PR #172728)

Mon Dec 29 18:26:48 PST 2025

================
@@ -2715,51 +2732,105 @@ bool SIInsertWaitcnts::isVMEMOrFlatVMEM(const MachineInstr &MI) const {
   return SIInstrInfo::isVMEM(MI);
 }
 
-// Return true if it is better to flush the vmcnt counter in the preheader of
-// the given loop. We currently decide to flush in two situations:
+bool SIInsertWaitcnts::isDSRead(const MachineInstr &MI) const {
+  return SIInstrInfo::isDS(MI) && MI.mayLoad() && !MI.mayStore();
+}
+
+// Check if instruction may store to LDS (including DS stores, atomics,
+// FLAT instructions that may access LDS, and LDS DMA).
+bool SIInsertWaitcnts::mayStoreLDS(const MachineInstr &MI) const {
----------------
ssahasra wrote:

From what I understand, the end result of this patch is to control whether we flush DS_CNT to zero. But for this, covering "LDS stores of any kind" is too wide. There are LDS stores that have nothing to do with DS_CNT. If a block contains only an async load to LDS, then there is no point flushing DS_CNT because that instruction does not increment it. That's why I am saying, if your change is about DS_CNT, then check for DS_CNT ... it is not always the same as "store to LDS".
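
To illustrate the distinction, here is a minimal sketch using a hypothetical toy model. `ToyInstr`, `blockStoresToLDS`, `blockUsesDSCnt` and the two sample instructions are invented for illustration and are not part of `SIInsertWaitcnts` or any LLVM API. The flag values simply encode the claim above that an async load to LDS writes LDS without incrementing DS_CNT.

```cpp
#include <vector>

// Hypothetical toy model, NOT the LLVM MachineInstr API.
struct ToyInstr {
  bool StoresToLDS;     // writes LDS memory by any path
  bool IncrementsDSCnt; // is tracked by the DS_CNT counter
};

// Wide predicate: true if anything in the block stores to LDS.
bool blockStoresToLDS(const std::vector<ToyInstr> &Block) {
  for (const ToyInstr &I : Block)
    if (I.StoresToLDS)
      return true;
  return false;
}

// Narrow predicate: true only if something in the block increments DS_CNT.
bool blockUsesDSCnt(const std::vector<ToyInstr> &Block) {
  for (const ToyInstr &I : Block)
    if (I.IncrementsDSCnt)
      return true;
  return false;
}

// Assumed encodings, for illustration only.
const ToyInstr AsyncLoadToLDS{/*StoresToLDS=*/true, /*IncrementsDSCnt=*/false};
const ToyInstr DSWrite{/*StoresToLDS=*/true, /*IncrementsDSCnt=*/true};
```

For a block containing only `AsyncLoadToLDS`, the wide predicate fires while the narrow one does not, so a flush decision keyed on the wide predicate would wait on a counter that nothing in the block touched.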

https://github.com/llvm/llvm-project/pull/172728