[all-commits] [llvm/llvm-project] b76113: [AMDGPU] Use correct VGPR threshold for flagging E...
Jeffrey Byrnes via All-commits
all-commits at lists.llvm.org
Mon Mar 25 13:12:20 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: b7611370491873722e08e4ce9374312d0c936af1
https://github.com/llvm/llvm-project/commit/b7611370491873722e08e4ce9374312d0c936af1
Author: Jeffrey Byrnes <jeffrey.byrnes at amd.com>
Date: 2024-03-25 (Mon, 25 Mar 2024)
Changed paths:
M llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2b.mir
Log Message:
-----------
[AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (#85860)
`ST.getMaxNumVGPRs(MF)` lowers to `AMDGPUBaseInfo.cpp:getTotalNumVGPRs`
which returns 512 for gfx90a. This is subsequently limited by
`AMDGPUBaseInfo:getAddressableNumVGPRs()`, which also returns 512 for
gfx90a. The ISA states we can have a total of 512 registers, but a
maximum of only 256 of each of AGPR and VGPR (gfx90a 3.6.4).
Therefore, in the unified register file case, `ST.getMaxNumVGPRs(MF)`
calculates the maximum number of combined VGPR + AGPR. However, it is
currently used both as the limit for accvgpr and as the limit for archvgpr.
This patch uses it as the combined limit, and accounts for the maximum
addressable arch/acc VGPRs when calculating the per-RegClass limits.
It is not unreasonable to think other clients of getTotalNumVGPRs are
using it in the wrong way.
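
The distinction between the combined limit and the per-class limits can be illustrated with a small standalone sketch. This is not the code from the patch: `RegLimits`, `Pressure`, and `isExcessRP` are hypothetical names introduced here, the 512/256 values are the gfx90a figures quoted above, and the sketch ignores allocation granularity and occupancy-based limits.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical limits for a target with a unified register file.
struct RegLimits {
  unsigned MaxCombined;    // combined ArchVGPR + AGPR budget (512 on gfx90a)
  unsigned MaxAddressable; // addressable registers per class (256 on gfx90a)
};

// Hypothetical register pressure of a scheduling region.
struct Pressure {
  unsigned ArchVGPRs;
  unsigned AGPRs;
};

// Returns true if the region exceeds either a per-class limit or the
// combined limit. Using MaxCombined alone as the per-class limit would
// miss regions that need more than MaxAddressable registers of one class.
bool isExcessRP(const Pressure &P, const RegLimits &L) {
  // A single class can never use more than the combined budget.
  unsigned PerClassLimit = std::min(L.MaxCombined, L.MaxAddressable);
  return P.ArchVGPRs > PerClassLimit || P.AGPRs > PerClassLimit ||
         P.ArchVGPRs + P.AGPRs > L.MaxCombined;
}
```

With limits `{512, 256}`, a region needing 300 ArchVGPRs and no AGPRs is flagged by the per-class check even though 300 is below the combined budget of 512, which is the case a check against the combined limit alone would miss.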