[llvm] [AMDGPU] Avoid unneeded waitcounts before spill stores (PR #108303)

Thu Sep 12 12:53:12 PDT 2024

================
@@ -901,7 +901,7 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
     }
   } else /* LGKM_CNT || EXP_CNT || VS_CNT || NUM_INST_CNTS */ {
     // Match the score to the destination registers.
-    for (unsigned I = 0, E = Inst.getNumOperands(); I != E; ++I) {
+    for (unsigned I = 0, E = Inst.getNumExplicitOperands(); I != E; ++I) {
----------------
rampitec wrote:

I have added a test with a vcc_lo load and an implicit use of vcc; it has the wait, as expected.

The change is correct because only memory instructions reach the modified loop; VALU instructions are processed in a different place. Memory instructions may implicitly use m0 and flat_scr, but these registers are not loadable.
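
To illustrate the argument, here is a minimal toy model, not the actual SIInsertWaitcnts code. `ToyOperand`, `ToyInst`, `scoredDefs` and the two `make...` helpers are hypothetical names invented for this sketch. The only property it borrows from MachineInstr is that explicit operands come first and implicit ones follow. It shows that a load whose destination is an explicit vcc_lo operand is still picked up when the loop stops at the explicit operands, while a def that appears only as an implicit operand is not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT LLVM's MachineOperand
// or MachineInstr classes.
struct ToyOperand {
  std::string Reg;
  bool IsDef;
  bool IsImplicit;
};

struct ToyInst {
  // Explicit operands first, implicit operands after them.
  std::vector<ToyOperand> Ops;

  std::size_t numOperands() const { return Ops.size(); }

  std::size_t numExplicitOperands() const {
    std::size_t N = 0;
    while (N < Ops.size() && !Ops[N].IsImplicit)
      ++N;
    return N;
  }
};

// Collect the destination registers found among the first Limit operands,
// mimicking a "match the score to the destination registers" loop.
std::vector<std::string> scoredDefs(const ToyInst &I, std::size_t Limit) {
  assert(Limit <= I.Ops.size() && "Limit must not exceed the operand count");
  std::vector<std::string> Defs;
  for (std::size_t K = 0; K != Limit; ++K)
    if (I.Ops[K].IsDef)
      Defs.push_back(I.Ops[K].Reg);
  return Defs;
}

// A load into vcc_lo: explicit def, explicit address, implicit use of m0.
ToyInst makeVccLoLoad() {
  return ToyInst{{{"vcc_lo", true, false},
                  {"s[0:1]", false, false},
                  {"m0", false, true}}};
}

// A hypothetical memory instruction carrying an extra implicit def
// (register name chosen arbitrarily for the sketch).
ToyInst makeInstWithImplicitDef() {
  return ToyInst{{{"v0", false, false},
                  {"s[0:3]", false, false},
                  {"v[0:1]", true, true}}};
}
```

With these helpers, `scoredDefs(Inst, Inst.numExplicitOperands())` still reports vcc_lo for the load, so a later implicit use of vcc would see a pending score. For the second instruction, the explicit-only scan reports nothing, whereas scanning all operands would also score the implicit def.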

https://github.com/llvm/llvm-project/pull/108303