[llvm] [AMDGPU] Add DS loop wait optimization infrastructure (1/4) (PR #171942)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 02:37:02 PST 2025
================
@@ -2643,6 +2670,85 @@ bool SIInsertWaitcnts::isVMEMOrFlatVMEM(const MachineInstr &MI) const {
   return SIInstrInfo::isVMEM(MI);
 }
+//===----------------------------------------------------------------------===//
+// DS Loop Wait Optimization (GFX12+)
+//
+// This optimization relaxes DS wait counts in single-block loops that have
+// many DS loads and WMMA/MFMA instructions (typical GEMM kernels with software
+// pipelining). Instead of waiting for almost all DS loads to complete before
+// each WMMA, we analyze which specific loads feed each WMMA and wait only for
+// those to complete, allowing more overlap between memory and compute.
+//
+// Opportunity arises when the load ordering in the preheader block and
+// the load ordering at the end of the loop body, feeding the loaded data
+// to the next iteration, are not matched well (since their orderings are
+// not co-optimized)
+//===----------------------------------------------------------------------===//
+
+bool SIInsertWaitcnts::isEligibleForDSLoopOpt(MachineLoop *ML,
+                                              LoopDSWaitOptInfo &Info) const {
+  if (!OptimizeDSLoopWaitcnt)
+    return false;
+
+  // Only for GFX12+ where we have a separate counter for LDS.
+  if (!ST->hasExtendedWaitCounts())
+    return false;
----------------
arsenm wrote:
```suggestion
  // Only for GFX12+ where we have a separate counter for LDS.
  assert(ST->hasExtendedWaitCounts());
```
The caller already checked this, so the early return here is redundant.
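
A minimal sketch of the pattern behind this suggestion, assuming the caller guards on the subtarget feature before calling in. `SubtargetStub`, `runOnLoopSketch`, and `isEligibleForDSLoopOptSketch` are hypothetical stand-ins for illustration only; they are not the real `GCNSubtarget` or `SIInsertWaitcnts` code, and the actual caller is not shown in this hunk.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget query (not the real GCNSubtarget).
struct SubtargetStub {
  bool ExtendedWaitCounts;
  bool hasExtendedWaitCounts() const { return ExtendedWaitCounts; }
};

// Callee: the subtarget precondition is stated as an assert instead of being
// re-checked with an early return.
bool isEligibleForDSLoopOptSketch(const SubtargetStub &ST, bool OptEnabled) {
  if (!OptEnabled)
    return false;

  // Only for GFX12+ where we have a separate counter for LDS.
  assert(ST.hasExtendedWaitCounts() && "caller must check the subtarget");
  return true;
}

// Caller (assumed shape): checks the subtarget once, then delegates.
bool runOnLoopSketch(const SubtargetStub &ST, bool OptEnabled) {
  if (!ST.hasExtendedWaitCounts())
    return false;
  return isEligibleForDSLoopOptSketch(ST, OptEnabled);
}
```

With this shape the runtime check lives in one place, and the assert documents the invariant for anyone who adds a new call site later.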
https://github.com/llvm/llvm-project/pull/171942