[llvm] [AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (PR #109049)

Fri Sep 20 02:01:06 PDT 2024

================
@@ -1752,6 +1752,15 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
         const bool IsVGPR = TRI->isVectorRegister(*MRI, Op.getReg());
         for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo) {
           if (IsVGPR) {
+            // Implicit VGPR defs and uses are never a part of the memory
+            // instructions description and usually present to account for
+            // super-register liveness. Tied implicit sources on loads though
+            // are real uses.
+            // TODO: Most of the other instructions also have implicit uses
+            // for the liveness accounting only.
+            if (Op.isImplicit() && MI.mayLoadOrStore() && !Op.isTied())
----------------
jayfoad wrote:

SCRATCH_LOAD_USHORT writes all 32 bits, but if you mean a case like this:
```
scratch_load_dword v0, ...
scratch_load_short_d16 v0, ... // merge into low 16 bits of v0
```
... then yes, I think we will insert a wait on GFX12 to avoid the WAW hazard. That is done only by looking at the defs; it does not depend on any tied uses. Pre-GFX12, I don't think any wait is required for this case.
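
To make the defs-versus-tied-uses point concrete, here is a toy C++ model. It is not LLVM code: `ToyOperand`, `ToyInstr`, `skippedByPatch` and `needsWawWait` are hypothetical names, and the real pass tracks pending events in score brackets rather than a plain list of registers. Only the predicate mirrors the condition in the hunk above; everything else is a simplifying assumption.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::MachineOperand (illustration only).
struct ToyOperand {
  int Reg;         // register number, e.g. 0 for v0
  bool IsDef;      // true if the operand is written by the instruction
  bool IsImplicit; // true if the operand is not part of the encoding
  bool IsTied;     // true if the operand is tied to another operand
};

// Hypothetical stand-in for llvm::MachineInstr (illustration only).
struct ToyInstr {
  bool MayLoadOrStore;
  std::vector<ToyOperand> Ops;
};

// Same shape as the condition in the hunk:
//   Op.isImplicit() && MI.mayLoadOrStore() && !Op.isTied()
bool skippedByPatch(const ToyInstr &MI, const ToyOperand &Op) {
  return Op.IsImplicit && MI.MayLoadOrStore && !Op.IsTied;
}

// Simplified WAW check (an assumption about the behaviour described for
// GFX12): a wait is needed if MI defines a register that an earlier,
// still-outstanding load is going to write. Uses, tied or not, are ignored.
bool needsWawWait(const ToyInstr &MI, const std::vector<int> &PendingLoadDefs) {
  for (const ToyOperand &Op : MI.Ops) {
    if (!Op.IsDef || skippedByPatch(MI, Op))
      continue;
    for (int Reg : PendingLoadDefs)
      if (Reg == Op.Reg)
        return true;
  }
  return false;
}
```

In this model, a `scratch_load_short_d16 v0` with an explicit def of `v0` and a tied implicit use of `v0` needs the wait while an earlier load into `v0` is outstanding, and the answer is the same if the tied use is removed from the operand list.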

https://github.com/llvm/llvm-project/pull/109049