[llvm] [AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (PR #85860)

Jeffrey Byrnes via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 21 11:30:28 PDT 2024


================
@@ -1155,12 +1155,16 @@ unsigned getMinNumVGPRs(const MCSubtargetInfo *STI, unsigned WavesPerEU) {
   return std::min(MinNumVGPRs, AddrsableNumVGPRs);
 }
 
-unsigned getMaxNumVGPRs(const MCSubtargetInfo *STI, unsigned WavesPerEU) {
+unsigned getMaxNumVGPRs(const MCSubtargetInfo *STI, unsigned WavesPerEU,
+                        bool WholeRegisterFile) {
   assert(WavesPerEU != 0);
 
-  unsigned MaxNumVGPRs = alignDown(getTotalNumVGPRs(STI) / WavesPerEU,
-                                   getVGPRAllocGranule(STI));
-  unsigned AddressableNumVGPRs = getAddressableNumVGPRs(STI);
+  unsigned MaxNumVGPRs =
+      alignDown(getTotalNumVGPRs(STI, WholeRegisterFile) / WavesPerEU,
----------------
jrbyrnes wrote:

I see that getTotalNumVGPRs should always return the total number of VGPRs.

But I think we should still adjust the halved value by the number of waves. Otherwise we may end up in our current situation, where a wavesPerEU of 3 is reported as allowing 168 arch VGPRs and 168 acc VGPRs (which is actually occupancy 1). The latest version pre-clamps getTotalNumVGPRs to getTotalNumArchVGPRs when WholeRegisterFile is not requested.
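
To make the arithmetic concrete, here is a minimal compile-time sketch. It assumes a 512-entry unified register file and an allocation granule of 8, values picked because they reproduce the 168 figure above. The helper names are hypothetical and are not the functions in the patch. The occupancy helper is also a simplification that ignores any hardware cap on waves.

```cpp
// Assumed values for a target with a unified register file (not read from
// any subtarget; chosen so that the 168 figure in the comment falls out).
constexpr unsigned TotalUnifiedVGPRs = 512; // arch + acc VGPRs share this pool
constexpr unsigned AllocGranule = 8;

// Round Value down to a multiple of Align.
constexpr unsigned alignDownTo(unsigned Value, unsigned Align) {
  return Value / Align * Align;
}

// Per-wave budget when the whole unified file is divided among the waves.
constexpr unsigned maxCombinedPerWave(unsigned WavesPerEU) {
  return alignDownTo(TotalUnifiedVGPRs / WavesPerEU, AllocGranule);
}

// Per-wave budget for one register class: halve the file first, then divide
// by the number of waves (the adjustment argued for above).
constexpr unsigned maxPerClassPerWave(unsigned WavesPerEU) {
  return alignDownTo((TotalUnifiedVGPRs / 2) / WavesPerEU, AllocGranule);
}

// Simplified occupancy for a wave that uses CombinedVGPRs registers in total.
constexpr unsigned occupancyFor(unsigned CombinedVGPRs) {
  return TotalUnifiedVGPRs / CombinedVGPRs;
}

// 512 / 3 = 170, aligned down to 168: the combined budget at 3 waves.
static_assert(maxCombinedPerWave(3) == 168, "combined budget at 3 waves");
// Granting 168 to each class separately means 336 combined: occupancy 1.
static_assert(occupancyFor(168 + 168) == 1, "separate 168 budgets lose occupancy");
// Halving first gives 256 / 3 = 85, aligned down to 80 per class.
static_assert(maxPerClassPerWave(3) == 80, "per-class budget at 3 waves");
// 80 + 80 = 160 combined keeps the requested 3 waves.
static_assert(occupancyFor(80 + 80) == 3, "halved budgets keep occupancy");
```

Under these assumptions, applying the combined 168 limit to each class independently lets a wave claim 336 registers, which is why the halved value still has to be divided by the number of waves.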

https://github.com/llvm/llvm-project/pull/85860

