[llvm-branch-commits] [llvm] [AMDGPU] Add DS loop preheader flush (3/4) (PR #171948)

Matt Arsenault via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Fri Dec 12 10:07:38 PST 2025


================
@@ -2904,6 +2905,68 @@ bool SIInsertWaitcnts::applyDSLoopWaitOpt(MachineInstr &MI,
   return true;
 }
 
+// Insert DS_CNT flush in preheaders of loops where DS wait relaxation was
+// applied. This is necessary because the relaxed wait counts inside the loop
+// are computed based on the DS loads issued at the end of the previous
+// iteration (via backedge), but the first iteration enters via the preheader.
+// We must ensure all DS loads from the preheader are complete before entering
+// the loop.
+bool SIInsertWaitcnts::insertDSPreheaderFlushes(MachineFunction &MF) {
+  bool Modified = false;
+
+  for (auto &[LoopHeader, Info] : LoopDSWaitOptCache) {
+    if (!Info.Valid || !Info.RelaxationApplied)
+      continue;
+
+    MachineLoop *ML = MLI->getLoopFor(LoopHeader);
+    if (!ML)
+      continue;
+
+    MachineBasicBlock *Preheader = ML->getLoopPreheader();
+    if (!Preheader)
+      continue;
+
+    // Insert s_wait_dscnt 0 at the end of the preheader (before the terminator)
+    MachineBasicBlock::iterator InsertPos = Preheader->getFirstTerminator();
+    if (InsertPos == Preheader->end() && !Preheader->empty())
+      InsertPos = std::prev(Preheader->end());
+
+    // Check if there's already a DS wait at this position
+    bool NeedInsert = true;
+    if (InsertPos != Preheader->end() && InsertPos != Preheader->begin()) {
+      auto CheckPos = std::prev(InsertPos);
+      if (CheckPos->getOpcode() == AMDGPU::S_WAIT_DSCNT_soft ||
+          CheckPos->getOpcode() == AMDGPU::S_WAIT_DSCNT) {
+        if (CheckPos->getOperand(0).getImm() == 0)
+          NeedInsert = false;
+        else {
+          // Change existing wait to 0
+          CheckPos->getOperand(0).setImm(0);
+          NeedInsert = false;
----------------
arsenm wrote:

NeedInsert is set to false on both paths here. I'd expect this to be inverted anyway, so that it detects when it actually needs to do something?
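Below is a minimal sketch of the inverted structure the comment suggests. It uses a toy instruction model instead of `MachineInstr`. The names `Inst`, `Opcode`, `isDSWait`, and `ensureDSFlushBefore` are hypothetical and are not part of the patch or of LLVM. Each path returns whether it changed the block, so the `NeedInsert` flag and its redundant assignments go away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for AMDGPU::S_WAIT_DSCNT / S_WAIT_DSCNT_soft; a real
// pass would inspect MachineInstr opcodes and operand 0 instead.
enum class Opcode { Other, WaitDSCnt, WaitDSCntSoft };

struct Inst {
  Opcode Op;
  int64_t Imm; // Only meaningful for the DS wait opcodes.
};

static bool isDSWait(const Inst &I) {
  return I.Op == Opcode::WaitDSCnt || I.Op == Opcode::WaitDSCntSoft;
}

// Ensure a "DS count == 0" wait is in effect immediately before InsertPos.
// Returns true only if the block was changed, so the caller can fold the
// result straight into its Modified flag.
bool ensureDSFlushBefore(std::vector<Inst> &Block, std::size_t InsertPos) {
  assert(InsertPos <= Block.size() && "insert position out of range");

  if (InsertPos > 0) {
    Inst &Prev = Block[InsertPos - 1];
    if (isDSWait(Prev)) {
      if (Prev.Imm == 0)
        return false; // Already a full flush: nothing to do.
      Prev.Imm = 0;   // Tighten the existing wait instead of adding another.
      return true;
    }
  }

  // No DS wait directly before InsertPos: insert a fresh "wait dscnt 0".
  Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(InsertPos),
               Inst{Opcode::WaitDSCnt, 0});
  return true;
}
```

In the loop over `LoopDSWaitOptCache`, the call site would then reduce to something like `Modified |= ensureDSFlushBefore(...)`. The early return on the "already zero" case is the only path that reports no change.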

https://github.com/llvm/llvm-project/pull/171948

