[llvm] [AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (PR #109049)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 20 02:01:06 PDT 2024
================
@@ -1752,6 +1752,15 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
const bool IsVGPR = TRI->isVectorRegister(*MRI, Op.getReg());
for (int RegNo = Interval.first; RegNo < Interval.second; ++RegNo) {
if (IsVGPR) {
+ // Implicit VGPR defs and uses are never part of the memory
+ // instruction's description and are usually present to account for
+ // super-register liveness. Tied implicit sources on loads, though,
+ // are real uses.
+ // TODO: Most of the other instructions also have implicit uses
+ // for the liveness accounting only.
+ if (Op.isImplicit() && MI.mayLoadOrStore() && !Op.isTied())
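A minimal C++ sketch of the operand filter this hunk adds. It uses hypothetical stand-in types (`Operand`, `Instr`, `ignoreForWaitcnt`, `countCheckedVGPRs`) instead of LLVM's `MachineOperand` and `MachineInstr`. The hunk is cut off after the `if`, so the sketch assumes that a matching operand is simply not checked for pending waits.

```cpp
#include <cassert>  // assert(), used by the usage checks
#include <vector>

// Hypothetical, simplified stand-ins for llvm::MachineOperand / MachineInstr.
struct Operand {
  bool IsVGPR;
  bool IsImplicit;
  bool IsTied;
};

struct Instr {
  bool MayLoadOrStore;
  std::vector<Operand> Operands;
};

// Mirrors the condition in the patch: an implicit VGPR operand on a memory
// instruction is ignored unless it is tied (e.g. a real merge source on a load).
bool ignoreForWaitcnt(const Instr &MI, const Operand &Op) {
  return Op.IsVGPR && Op.IsImplicit && MI.MayLoadOrStore && !Op.IsTied;
}

// Counts the VGPR operands that would still be checked for pending waits
// (assumption: ignored operands are skipped entirely).
int countCheckedVGPRs(const Instr &MI) {
  int N = 0;
  for (const Operand &Op : MI.Operands)
    if (Op.IsVGPR && !ignoreForWaitcnt(MI, Op))
      ++N;
  return N;
}
```

For a load with one explicit VGPR, one untied implicit VGPR (liveness only), and one tied implicit VGPR, only the untied implicit operand is dropped. Non-memory instructions are unaffected, which matches the TODO in the hunk.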
----------------
jayfoad wrote:
SCRATCH_LOAD_USHORT writes all 32 bits, but if you mean a case like this:
```
scratch_load_dword v0, ...
scratch_load_short_d16 v0, ... ; merge into low 16 bits of v0
```
... then yes, I think we will insert a wait on GFX12 to avoid the WAW hazard. This is done only by looking at the defs; it does not depend on any tied uses. Pre-GFX12, I don't think any wait is required for this case.
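A toy C++ model of the WAW reasoning in the comment above, as a sketch only. `MemInstr`, `Scoreboard`, and the `TargetNeedsWAWWait` flag are hypothetical. The flag stands in for whatever target property makes the wait necessary on GFX12 but not before. It is not an actual `SIInsertWaitcnts` or subtarget API. The point it illustrates is that the decision consults only the defs, so a tied use on the D16 load has no effect.

```cpp
#include <cassert>  // assert(), used by the usage checks
#include <set>
#include <vector>

// Hypothetical model of a VMEM load for WAW purposes.
struct MemInstr {
  std::vector<int> DefRegs;   // VGPRs written by the load
  std::vector<int> TiedUses;  // tied sources (e.g. the D16 merge input); never consulted
};

// Hypothetical per-function scoreboard of VGPRs with outstanding load writes.
struct Scoreboard {
  bool TargetNeedsWAWWait;     // assumption: true for GFX12, false before it
  std::set<int> PendingLoadDefs;

  // True if a wait must be inserted before MI. Only MI's defs are checked.
  bool needsWaitBefore(const MemInstr &MI) const {
    if (!TargetNeedsWAWWait)
      return false;
    for (int R : MI.DefRegs)
      if (PendingLoadDefs.count(R))
        return true;
    return false;
  }

  // Records MI's defs as outstanding writes.
  void issue(const MemInstr &MI) {
    PendingLoadDefs.insert(MI.DefRegs.begin(), MI.DefRegs.end());
  }
};
```

With register 0 standing in for `v0`, issuing the dword load and then asking about the D16 load reports a wait only when `TargetNeedsWAWWait` is set. A load whose defs do not overlap any pending write needs no wait, even if its tied use names a pending register.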
https://github.com/llvm/llvm-project/pull/109049