[llvm] [AMDGPU] Avoid unneeded waitcounts before spill stores (PR #108303)

Thu Sep 12 01:17:03 PDT 2024

================
@@ -901,7 +901,7 @@ void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
     }
   } else /* LGKM_CNT || EXP_CNT || VS_CNT || NUM_INST_CNTS */ {
     // Match the score to the destination registers.
-    for (unsigned I = 0, E = Inst.getNumOperands(); I != E; ++I) {
+    for (unsigned I = 0, E = Inst.getNumExplicitOperands(); I != E; ++I) {
----------------
rampitec wrote:

> I can never remember how all of these counts work with the 2 notions of "implicit". Does this catch the implicit-def present in the instruction definition? e.g. implicit-def $vcc on v_cmp_*

For non-variadic instructions, it returns getNumOperands() from the instruction's MCInstrDesc, i.e. it should include the implicit defs and uses from the instruction description. Besides, there is no way to load vcc, so there is no need to wait for such a load. I have also checked that there is no way to load M0, a common real implicit use, because scalar loads use the XM0 class for sdst. I cannot think of any other implicit operands that might matter.
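
To make the counting rule concrete, here is a rough standalone sketch of how I understand the explicit-operand count to be computed. ToyOperand, ToyDesc, ToyInstr and numExplicitOperands are hypothetical stand-ins invented for this illustration, not LLVM's real classes, and the sketch takes the descriptor's declared count as given rather than saying what that count contains.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not LLVM's real classes.
struct ToyOperand {
  bool IsReg;      // register operand (as opposed to an immediate, etc.)
  bool IsImplicit; // implicit register def/use
};

struct ToyDesc {
  unsigned NumDeclared; // operand count recorded in the descriptor
  bool IsVariadic;      // may carry extra explicit operands past NumDeclared
};

struct ToyInstr {
  ToyDesc Desc;
  std::vector<ToyOperand> Ops; // explicit operands first, implicit ones last
};

// Number of operands a loop bounded by the "explicit" count would visit.
unsigned numExplicitOperands(const ToyInstr &MI) {
  assert(MI.Desc.NumDeclared <= MI.Ops.size() && "descriptor count too large");
  unsigned N = MI.Desc.NumDeclared;
  if (!MI.Desc.IsVariadic)
    return N; // non-variadic: the descriptor's count is the answer

  // Variadic: also count trailing operands until the first implicit register.
  for (std::size_t I = N, E = MI.Ops.size(); I != E; ++I) {
    if (MI.Ops[I].IsReg && MI.Ops[I].IsImplicit)
      break;
    ++N;
  }
  return N;
}
```

In this toy model, a loop bounded by numExplicitOperands() never visits an implicit register operand appended after the counted ones, whereas a loop bounded by the full operand count would.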

https://github.com/llvm/llvm-project/pull/108303