[llvm] [AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (PR #85860)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 21 11:30:28 PDT 2024
================
@@ -1155,12 +1155,16 @@ unsigned getMinNumVGPRs(const MCSubtargetInfo *STI, unsigned WavesPerEU) {
   return std::min(MinNumVGPRs, AddrsableNumVGPRs);
 }
-unsigned getMaxNumVGPRs(const MCSubtargetInfo *STI, unsigned WavesPerEU) {
+unsigned getMaxNumVGPRs(const MCSubtargetInfo *STI, unsigned WavesPerEU,
+                        bool WholeRegisterFile) {
   assert(WavesPerEU != 0);
-  unsigned MaxNumVGPRs = alignDown(getTotalNumVGPRs(STI) / WavesPerEU,
-                                   getVGPRAllocGranule(STI));
-  unsigned AddressableNumVGPRs = getAddressableNumVGPRs(STI);
+  unsigned MaxNumVGPRs =
+      alignDown(getTotalNumVGPRs(STI, WholeRegisterFile) / WavesPerEU,
----------------
jrbyrnes wrote:
I see that getTotalNumVGPRs should always return the total number of VGPRs.
But I think we should still adjust the halved value against the number of waves. Otherwise we may end up in our current situation, where a wavesPerEU of 3 is allowed 168 arch VGPRs and 168 acc VGPRs (which is actually occupancy 1). The latest version pre-clamps getTotalNumVGPRs to getTotalNumArchVGPRs when WholeRegisterFile is not requested.
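To make the arithmetic concrete, here is a rough standalone sketch, not the LLVM code. It assumes a unified file of 512 VGPRs and an allocation granule of 8, values picked only because they reproduce the 168 figure above. The helper names (alignDownTo, perWaveBudget, achievedWaves) are hypothetical.

```cpp
#include <cassert>

// Assumed values for illustration only; they are not read from any subtarget.
constexpr unsigned TotalVGPRs = 512; // assumed unified file size (arch + acc)
constexpr unsigned Granule = 8;      // assumed VGPR allocation granule

// Round Value down to a multiple of Align (hypothetical stand-in for alignDown).
constexpr unsigned alignDownTo(unsigned Value, unsigned Align) {
  return Value / Align * Align;
}

// Per-wave register budget when Budget registers are split across WavesPerEU.
constexpr unsigned perWaveBudget(unsigned Budget, unsigned WavesPerEU) {
  return alignDownTo(Budget / WavesPerEU, Granule);
}

// Waves that actually fit if every wave takes ArchVGPRs + AccVGPRs registers
// out of the unified file.
constexpr unsigned achievedWaves(unsigned ArchVGPRs, unsigned AccVGPRs) {
  return TotalVGPRs / (ArchVGPRs + AccVGPRs);
}

// Dividing the whole file by 3 waves gives 168 per wave. Applying that limit
// to arch and acc VGPRs separately lets a wave take 336, so only 1 wave fits.
static_assert(perWaveBudget(TotalVGPRs, 3) == 168, "unclamped budget");
static_assert(achievedWaves(168, 168) == 1, "occupancy drops to 1");

// Pre-clamping to half the file first gives 80 per wave, so 80 arch + 80 acc
// still leaves room for the 3 requested waves.
static_assert(perWaveBudget(TotalVGPRs / 2, 3) == 80, "clamped budget");
static_assert(achievedWaves(80, 80) == 3, "occupancy stays at 3");
```

With these assumed numbers, halving before dividing by the wave count is what keeps the per-wave limit consistent with the requested occupancy.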
https://github.com/llvm/llvm-project/pull/85860