[llvm] [AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (PR #109049)

Fri Sep 20 01:50:24 PDT 2024

================
@@ -1752,6 +1752,15 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
         const bool IsVGPR = TRI->isVectorRegister(*MRI, Op.getReg());
         for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo) {
           if (IsVGPR) {
+            // Implicit VGPR defs and uses are never a part of the memory
+            // instructions description and usually present to account for
+            // super-register liveness. Tied implicit sources on loads though
+            // are real uses.
+            // TODO: Most of the other instructions also have implicit uses
+            // for the liveness accounting only.
+            if (Op.isImplicit() && MI.mayLoadOrStore() && !Op.isTied())
----------------
rampitec wrote:

> Because in GFX12, vmem loads do _not_ write their vgpr results in order. See #105549. (When I said "usually it is not necessary" I meant pre-GFX12.)

So you mean that if there is a SCRATCH_LOAD_USHORT merging vdst on gfx12, it will be handled anyway?

https://github.com/llvm/llvm-project/pull/109049