[llvm] [AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (PR #109049)
Stanislav Mekhanoshin via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 19 11:49:37 PDT 2024
================
@@ -1752,6 +1752,15 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
const bool IsVGPR = TRI->isVectorRegister(*MRI, Op.getReg());
for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo) {
if (IsVGPR) {
+ // Implicit VGPR defs and uses are never a part of the memory
+ // instructions description and usually present to account for
+ // super-register liveness. Tied implicit sources on loads though
+ // are real uses.
+ // TODO: Most of the other instructions also have implicit uses
+ // for the liveness accounting only.
+ if (Op.isImplicit() && MI.mayLoadOrStore() && !Op.isTied())
----------------
rampitec wrote:
The failing test was image-waterfall-loop-O0.ll; the `s_waitcnt vmcnt(0)` below was missing:
```
v_mov_b32_e32 v3, s4
; kill: killed $vgpr4
s_xor_saveexec_b32 s4, -1
s_waitcnt vmcnt(0)
buffer_load_dword v0, off, s[0:3], s32 offset:80 ; 4-byte Folded Reload
buffer_load_dword v2, off, s[0:3], s32 offset:84 ; 4-byte Folded Reload
```
This is how the pass's debug log looks if I remove the tied check:
```
VM_CNT(2): 1:v0 0:v4
LGKM_CNT(0):
EXP_CNT(0):
VS_CNT(86):
$vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 80, 0, 0, implicit $exec, implicit $vgpr0(tied-def 0) :: (load (s32) from %stack.16, addrspace 5)
```
So it reads v0 and merges the load back. This may not be needed for a dword load, but what if we read 16 bits and preserve the other half? The pattern will be the same: a tied def.
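To make the role of the tied check concrete, here is a minimal standalone sketch of the condition from the hunk. It does not use LLVM: `OperandModel` and `skipsImplicitVGPR` are hypothetical names, and it assumes the body of the `if` (not shown in the quoted hunk) skips the register when the condition holds.

```cpp
// Hypothetical stand-in for llvm::MachineOperand, reduced to the two
// flags that the condition in the hunk reads.
struct OperandModel {
  bool IsImplicit; // models Op.isImplicit()
  bool IsTied;     // models Op.isTied()
};

// Models the condition from the hunk:
//   Op.isImplicit() && MI.mayLoadOrStore() && !Op.isTied()
// Returns true when the VGPR operand would be left out of the wait
// computation (assumption: the `if` body skips the register).
constexpr bool skipsImplicitVGPR(OperandModel Op, bool MayLoadOrStore) {
  return Op.IsImplicit && MayLoadOrStore && !Op.IsTied;
}
```

With this model, the `implicit $vgpr0(tied-def 0)` operand on the folded reload above is not skipped, so the pending write to v0 is still waited on. Dropping the `!Op.IsTied` term would skip it, which matches the missing wait.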
https://github.com/llvm/llvm-project/pull/109049