[llvm] b556726 - [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 28 10:06:25 PDT 2022
Author: Baptiste
Date: 2022-09-28T13:05:50-04:00
New Revision: b556726ccc5670637e84f1b26ef7e998c94f1d42
URL: https://github.com/llvm/llvm-project/commit/b556726ccc5670637e84f1b26ef7e998c94f1d42
DIFF: https://github.com/llvm/llvm-project/commit/b556726ccc5670637e84f1b26ef7e998c94f1d42.diff
LOG: [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
One of the conditions to flush the vmcnt counter in loop preheaders is: The loop
contains a use of a vgpr that is defined out of the loop. The code currently
checks if a waitcnt is needed by looking at the score of that vgpr in the score
brackets. This is not enough and may cause the generation of an unnecessary
vmcnt flush. This patch fixes that case.
Differential Revision: https://reviews.llvm.org/D130313
Added:
Modified:
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
Removed:
################################################################################
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 240d6a5723d56..0f9f1aee8996f 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1737,7 +1737,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
VgprUse.insert(RegNo);
// If at least one of Op's registers is in the score brackets, the
// value is likely loaded outside of the loop.
- if (Brackets.getRegScore(RegNo, VM_CNT) > 0) {
+ if (Brackets.getRegScore(RegNo, VM_CNT) > Brackets.getScoreLB(VM_CNT)) {
UsesVgprLoadedOutside = true;
break;
}
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index 8b952b49432fb..66e710d585325 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -535,3 +535,40 @@ body: |
S_ENDPGM 0
...
+---
+
+# This test case checks that we flush the vmcnt counter only if necessary
+# (i.e. if a waitcnt is needed for the vgpr use we find in the loop)
+
+# GFX10-LABEL: waitcnt_vm_necessary
+# GFX10-LABEL: bb.0:
+# GFX10: S_WAITCNT 16240
+# GFX10: $vgpr4
+# GFX10-NOT: S_WAITCNT
+# GFX10-LABEL: bb.1:
+# GFX10-NOT: S_WAITCNT
+
+# GFX9-LABEL: waitcnt_vm_necessary
+# GFX9-LABEL: bb.0:
+# GFX9: S_WAITCNT 3952
+# GFX9: $vgpr4
+# GFX9-NOT: S_WAITCNT
+# GFX9-LABEL: bb.1:
+# GFX9-NOT: S_WAITCNT
+
+name: waitcnt_vm_necessary
+body: |
+ bb.0:
+ successors: %bb.1(0x80000000)
+
+ $vgpr0_vgpr1_vgpr2_vgpr3 = GLOBAL_LOAD_DWORDX4 killed $vgpr0_vgpr1, 0, 0, implicit $exec
+ $vgpr4 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
+
+ bb.1:
+ successors: %bb.1(0x40000000)
+
+ $vgpr5 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
+ S_CBRANCH_SCC1 %bb.1, implicit killed $scc
+ S_ENDPGM 0
+
+...
More information about the llvm-commits
mailing list