[llvm] [AMDGPU] Skip VGPR deallocation for waveslot limited kernels (PR #112765)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 18 02:16:49 PDT 2024
================
@@ -2606,15 +2606,24 @@ bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
   // Insert DEALLOC_VGPR messages before previously identified S_ENDPGM
   // instructions.
-  for (MachineInstr *MI : ReleaseVGPRInsts) {
-    if (ST->requiresNopBeforeDeallocVGPRs()) {
-      BuildMI(*MI->getParent(), MI, MI->getDebugLoc(), TII->get(AMDGPU::S_NOP))
-          .addImm(0);
+  // Skip deallocation if kernel is waveslot limited vs VGPR limited. A short
+  // waveslot limited kernel runs slower with the deallocation.
+  if (!ReleaseVGPRInsts.empty() &&
+      (MF.getFrameInfo().hasCalls() ||
+       AMDGPU::IsaInfo::getTotalNumVGPRs(ST) /
+               TRI->getNumUsedPhysRegs(*MRI, AMDGPU::VGPR_32RegClass) <
----------------
jayfoad wrote:
Can this use `getNumWavesPerEUWithNumVGPRs`? That might be slightly more correct since it accounts for details like the VGPR allocation granule.
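To illustrate why the allocation granule matters, here is a minimal C++ sketch under stated assumptions. It is not LLVM's implementation of `getNumWavesPerEUWithNumVGPRs`: `SubtargetModel`, `naiveWaves`, and `granuleAwareWaves` are hypothetical names, and the numbers (512 VGPRs, a granule of 8, at most 16 waves) are made-up parameters, not values for any real subtarget. It contrasts dividing the register file by the raw VGPR count, which is the shape of the expression in the patch, with first rounding the count up to a multiple of the granule.

```cpp
#include <algorithm>

// Hypothetical parameters for illustration only; real values come from the
// subtarget and are not modeled here.
struct SubtargetModel {
  unsigned TotalNumVGPRs; // VGPRs shared between resident waves
  unsigned AllocGranule;  // VGPRs are allocated in blocks of this size (assumed nonzero)
  unsigned MaxWavesPerEU; // wave-slot limit (assumed to be at least 1)
};

// Naive estimate, in the spirit of the patch: divide the register file by the
// raw VGPR count. The zero check is added so the sketch never divides by zero.
constexpr unsigned naiveWaves(const SubtargetModel &ST, unsigned NumVGPRs) {
  if (NumVGPRs == 0)
    return ST.MaxWavesPerEU;
  return ST.TotalNumVGPRs / NumVGPRs;
}

// Granule-aware estimate: round the VGPR count up to a multiple of the
// allocation granule, divide, then clamp the result to [1, MaxWavesPerEU].
constexpr unsigned granuleAwareWaves(const SubtargetModel &ST,
                                     unsigned NumVGPRs) {
  if (NumVGPRs == 0)
    return ST.MaxWavesPerEU;
  unsigned Rounded =
      (NumVGPRs + ST.AllocGranule - 1) / ST.AllocGranule * ST.AllocGranule;
  return std::clamp(ST.TotalNumVGPRs / Rounded, 1u, ST.MaxWavesPerEU);
}

// Made-up configuration: 512 VGPRs, a granule of 8, at most 16 waves.
constexpr SubtargetModel Example{512, 8, 16};
static_assert(naiveWaves(Example, 33) == 15);        // 512 / 33
static_assert(granuleAwareWaves(Example, 33) == 12); // 33 rounds up to 40; 512 / 40
static_assert(granuleAwareWaves(Example, 4) == 16);  // 512 / 8 = 64, clamped to 16
```

With 33 VGPRs the naive division suggests 15 waves, while rounding up to 40 VGPRs gives 12, so the two estimates can disagree about how close a kernel is to the wave-slot limit.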
https://github.com/llvm/llvm-project/pull/112765