[PATCH] D130313: [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
Baptiste Saleil via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 21 16:11:39 PDT 2022
bsaleil created this revision.
bsaleil added reviewers: foad, nhaehnle, AMDGPU.
bsaleil added projects: LLVM, AMDGPU.
Herald added subscribers: kosarev, jsilvanus, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, jvesely, kzhuravl, arsenm.
Herald added a project: All.
bsaleil requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
One of the conditions for flushing the vmcnt counter in a loop preheader is that the loop contains a use of a vgpr that is defined outside the loop.
The code currently decides whether that use needs a waitcnt by checking only that the vgpr has a nonzero score in the score brackets. This is not sufficient: a nonzero score does not mean a wait is still pending, so an unnecessary vmcnt flush may be generated. This patch fixes that case by checking whether a wait is actually needed for the vgpr.
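To illustrate the difference between the two checks, here is a minimal standalone sketch. It is a toy model and not the LLVM implementation: `ToyBracket`, `toyDetermineWait`, `oldCheck` and `newCheck` are hypothetical names, and the model assumes a wait is needed only when a register's score lies strictly above the bracket's lower bound and at or below its upper bound. The real `determineWait` handles further cases that are left out here.

```cpp
#include <cassert>

// Toy model of the score bracket for a single counter (hypothetical type,
// not the LLVM WaitcntBrackets class).
struct ToyBracket {
  unsigned LB; // score of the latest event already known to have completed
  unsigned UB; // score of the latest event issued so far
};

// "No wait needed" sentinel, mirroring the ~0u value that hasWaitVmCnt()
// compares against in the patch.
constexpr unsigned NoWait = ~0u;

// Returns the counter value to wait for, or NoWait when the event that
// wrote the register has already completed (assumed simplification).
unsigned toyDetermineWait(const ToyBracket &B, unsigned Score) {
  assert(B.LB <= B.UB && "lower bound must not exceed upper bound");
  if (Score > B.LB && Score <= B.UB)
    return B.UB - Score; // number of younger events allowed to stay pending
  return NoWait;
}

// Check before the patch: any nonzero score counts as "loaded outside".
bool oldCheck(unsigned Score) { return Score > 0; }

// Check after the patch: only a score that still requires a wait counts.
bool newCheck(const ToyBracket &B, unsigned Score) {
  return toyDetermineWait(B, Score) != NoWait;
}
```

With a bracket of `{LB = 1, UB = 2}`, a register with score 1 was written by a load that has already been waited on: the old check still reports it, while the new one does not. A register with score 2 is still pending and is reported by both.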
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D130313
Files:
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
Index: llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
===================================================================
--- llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -535,3 +535,40 @@
S_ENDPGM 0
...
+---
+
+# This test case checks that we flush the vmcnt counter only if necessary
+# (i.e. if a waitcnt is needed for the vgpr use we find in the loop)
+
+# GFX10-LABEL: waitcnt_vm_necessary
+# GFX10-LABEL: bb.0:
+# GFX10: S_WAITCNT 16240
+# GFX10: renamable $vgpr4
+# GFX10-NOT: S_WAITCNT 16240
+# GFX10-LABEL: bb.1:
+# GFX10-NOT: S_WAITCNT 16240
+
+# GFX9-LABEL: waitcnt_vm_necessary
+# GFX9-LABEL: bb.0:
+# GFX9: S_WAITCNT 3952
+# GFX9: renamable $vgpr4
+# GFX9-NOT: S_WAITCNT 3952
+# GFX9-LABEL: bb.1:
+# GFX9-NOT: S_WAITCNT 3952
+
+name: waitcnt_vm_necessary
+body: |
+ bb.0:
+ successors: %bb.1(0x80000000)
+
+ renamable $vgpr0_vgpr1_vgpr2_vgpr3 = GLOBAL_LOAD_DWORDX4 killed renamable $vgpr0_vgpr1, 0, 0, implicit $exec
+ renamable $vgpr4 = BUFFER_LOAD_DWORD_OFFEN undef renamable $vgpr0, undef renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
+
+ bb.1:
+ successors: %bb.1(0x40000000)
+
+ renamable $vgpr5 = BUFFER_LOAD_DWORD_OFFEN undef renamable $vgpr0, renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
+ S_CBRANCH_SCC1 %bb.1, implicit killed $scc
+ S_ENDPGM 0
+
+...
Index: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
===================================================================
--- llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -569,6 +569,10 @@
return VsCnt != ~0u;
}
+ bool hasWaitVmCnt() const {
+ return VmCnt != ~0u;
+ }
+
bool dominates(const Waitcnt &Other) const {
return VmCnt <= Other.VmCnt && ExpCnt <= Other.ExpCnt &&
LgkmCnt <= Other.LgkmCnt && VsCnt <= Other.VsCnt;
Index: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1739,7 +1739,10 @@
VgprUse.insert(RegNo);
// If at least one of Op's registers is in the score brackets, the
// value is likely loaded outside of the loop.
- if (Brackets.getRegScore(RegNo, VM_CNT) > 0) {
+ unsigned Score = Brackets.getRegScore(RegNo, VM_CNT);
+ AMDGPU::Waitcnt Wait;
+ Brackets.determineWait(VM_CNT, Score, Wait);
+ if (Wait.hasWaitVmCnt()) {
UsesVgprLoadedOutside = true;
break;
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D130313.446654.patch
Type: text/x-patch
Size: 2695 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220721/2918dfd4/attachment.bin>