[llvm] [AMDGPU] Extend DS loop wait optimization with flush point tracking (PR #175658)

Sun Feb 22 18:46:45 PST 2026

================
@@ -3442,22 +3463,41 @@ SIInsertWaitcnts::getPreheaderFlushFlags(MachineLoop *ML,
             VgprDefVMEM.insert(RU);
           }
         }
-        // Early exit if both optimizations are invalidated
-        if (VMemInvalidated && DSInvalidated)
+        // Early exit if all optimizations are invalidated
+        if (VMemInvalidated && DSInvalidated && FlushPointTrackingInvalid)
           return Flags;
       }
 
+      bool IsDSRead = isDSRead(MI);
----------------
ssahasra wrote:

> Aligning with MIR reaching here, i.e., DS read.

That's a relief. Where a choice is available, I would strongly recommend using "read": "read" and "write" are important events in the memory model, and this pass is definitely part of the memory model.

https://github.com/llvm/llvm-project/pull/175658