[llvm] [AMDGPU] Add getRegPressureLimit(TargetOccupancy) (PR #84311)

Thu Mar 7 03:49:17 PST 2024

https://github.com/rovka created https://github.com/llvm/llvm-project/pull/84311

Add a new overload* for `getRegPressureLimit` that computes the limit using a given target occupancy, rather than the one based on the LDS size. Use it to set the critical VGPR limit in `GCNSchedStrategy` (because I'm touching `GCNSchedStrategy` downstream and it seems like a better idea to use the concept of the register pressure limit than to invent a new confusingly named helper that's trying to do mostly the same thing).

This is NFC in our existing tests, but there might be examples in the wild where this could change the scheduling. In particular it might set a more aggressive `VGPRCriticalLimit` if
```
getMaxNumVGPRs(MFI.getWavesPerEU() < getMaxNumVGPRs(TargetOccupancy)
```
I'm not sure how often we'd hit that case, and even if we do hit it, using fewer VGPRs might be a good thing anyway.

(*) I'm adding an overload rather than a simple optional parameter because the original getRegPressureLimit is virtual.

>From 3bcff635fe9957809b07e1d26b8d04a7d59f785a Mon Sep 17 00:00:00 2001
From: Diana Picus <Diana-Magda.Picus at amd.com>
Date: Tue, 27 Feb 2024 12:08:35 +0100
Subject: [PATCH] [AMDGPU] Add getRegPressureLimit(TargetOccupancy)

Add a new overload* for `getRegPressureLimit` that computes the limit
using a given target occupancy, rather than the one based on the LDS
size. Use it to set the critical VGPR limit in `GCNSchedStrategy`.

This is NFC in our existing tests, but there might be examples in the
wild where this could change the scheduling. In particular it might set
a more aggressive `VGPRCriticalLimit` if
```
getMaxNumVGPRs(MFI.getWavesPerEU() < getMaxNumVGPRs(TargetOccupancy)
```
I'm not sure how often we'd hit that case, and even if we do hit it,
using fewer VGPRs might be a good thing anyway.

I'm going to touch `GCNSchedStrategy` downstream and it seems like
a better idea to use the concept of the register pressure limit than to
invent a new confusingly named helper that's trying to do mostly the
same thing.

* I'm adding an overload rather than a simple optional parameter because
  the original getRegPressureLimit is virtual.
---
 llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp |  5 ++++-
 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp   | 15 +++++++++++----
 llvm/lib/Target/AMDGPU/SIRegisterInfo.h     |  5 +++++
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 9f419a7fbf6834..39e148ecd21681 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -88,8 +88,11 @@ void GCNSchedStrategy::initialize(ScheduleDAGMI *DAG) {
       std::min(ST.getMaxNumSGPRs(TargetOccupancy, true), SGPRExcessLimit);
 
   if (!KnownExcessRP) {
+    const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo *>(TRI);
     VGPRCriticalLimit =
-        std::min(ST.getMaxNumVGPRs(TargetOccupancy), VGPRExcessLimit);
+        std::min(SRI->getRegPressureLimit(&AMDGPU::VGPR_32RegClass, *MF,
+                                          TargetOccupancy),
+                 VGPRExcessLimit);
   } else {
     // This is similar to ST.getMaxNumVGPRs(TargetOccupancy) result except
     // returns a reasonably small number for targets with lots of VGPRs, such
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 3664535b325997..fe9e7750984522 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -3050,17 +3050,24 @@ bool SIRegisterInfo::shouldCoalesce(MachineInstr *MI,
 unsigned SIRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
                                              MachineFunction &MF) const {
   const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  unsigned Occupancy =
+      ST.getOccupancyWithLocalMemSize(MFI->getLDSSize(), MF.getFunction());
 
-  unsigned Occupancy = ST.getOccupancyWithLocalMemSize(MFI->getLDSSize(),
-                                                       MF.getFunction());
+  return getRegPressureLimit(RC, MF, Occupancy);
+}
+
+unsigned SIRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
+                                             MachineFunction &MF,
+                                             unsigned TargetOccupancy) const {
   switch (RC->getID()) {
   default:
     return AMDGPUGenRegisterInfo::getRegPressureLimit(RC, MF);
   case AMDGPU::VGPR_32RegClassID:
-    return std::min(ST.getMaxNumVGPRs(Occupancy), ST.getMaxNumVGPRs(MF));
+    return std::min(ST.getMaxNumVGPRs(TargetOccupancy), ST.getMaxNumVGPRs(MF));
   case AMDGPU::SGPR_32RegClassID:
   case AMDGPU::SGPR_LO16RegClassID:
-    return std::min(ST.getMaxNumSGPRs(Occupancy, true), ST.getMaxNumSGPRs(MF));
+    return std::min(ST.getMaxNumSGPRs(TargetOccupancy, true),
+                    ST.getMaxNumSGPRs(MF));
   }
 }
 
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 88d5686720985e..41a5104c3b7e71 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -323,6 +323,11 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
   unsigned getRegPressureLimit(const TargetRegisterClass *RC,
                                MachineFunction &MF) const override;
 
+  // Compute the register pressure limit for a given \p TargetOccupancy.
+  unsigned getRegPressureLimit(const TargetRegisterClass *RC,
+                               MachineFunction &MF,
+                               unsigned TargetOccupancy) const;
+
   unsigned getRegPressureSetLimit(const MachineFunction &MF,
                                   unsigned Idx) const override;