[all-commits] [llvm/llvm-project] b76113: [AMDGPU] Use correct VGPR threshold for flagging E...

Jeffrey Byrnes via All-commits all-commits at lists.llvm.org
Mon Mar 25 13:12:20 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: b7611370491873722e08e4ce9374312d0c936af1
      https://github.com/llvm/llvm-project/commit/b7611370491873722e08e4ce9374312d0c936af1
  Author: Jeffrey Byrnes <jeffrey.byrnes at amd.com>
  Date:   2024-03-25 (Mon, 25 Mar 2024)

  Changed paths:
    M llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
    M llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2b.mir

  Log Message:
  -----------
  [AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (#85860)

`ST.getMaxNumVGPRs(MF)` lowers to `AMDGPUBaseInfo.cpp:getTotalNumVGPRs`,
which returns 512 for gfx90a. This is subsequently limited by
`AMDGPUBaseInfo.cpp:getAddressableNumVGPRs`, which also returns 512 for
gfx90a. The ISA states that we can have a total of 512 registers, but at
most 256 each of AGPRs and VGPRs (gfx90a 3.6.4).

Therefore, in the unified register file case, `ST.getMaxNumVGPRs(MF)`
calculates the maximum combined number of VGPRs and AGPRs. However, it is
currently used as the limit for accvgpr and, separately, as the limit for
archvgpr.

This patch uses it as the combined limit and accounts for the maximum
number of addressable arch/acc VGPRs when calculating the per-RegClass
limits.
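
The distinction between the combined budget and the per-class limits can be sketched as follows. This is an illustration only, not the code in `GCNSchedStrategy.cpp`: `RegFileLimits`, `ExcessRP`, and `flagExcessRP` are hypothetical names, and the only values taken from the message above are the gfx90a figures of 512 combined and 256 addressable per class.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the limits discussed above; not an LLVM type.
struct RegFileLimits {
  unsigned CombinedBudget;      // ArchVGPR + AccVGPR budget (512 on gfx90a)
  unsigned AddressablePerClass; // max addressable per class (256 on gfx90a)
};

// Which kinds of excess register pressure a region exhibits.
struct ExcessRP {
  bool ArchVGPR;
  bool AccVGPR;
  bool Combined;
};

// Assumed logic: each class is capped by the smaller of the combined budget
// and the addressable count; the sum is checked against the combined budget.
inline ExcessRP flagExcessRP(const RegFileLimits &L, unsigned ArchPressure,
                             unsigned AccPressure) {
  assert(L.AddressablePerClass > 0 && "per-class limit must be non-zero");
  const unsigned PerClassLimit =
      std::min(L.CombinedBudget, L.AddressablePerClass);
  ExcessRP R;
  R.ArchVGPR = ArchPressure > PerClassLimit;
  R.AccVGPR = AccPressure > PerClassLimit;
  R.Combined = ArchPressure + AccPressure > L.CombinedBudget;
  return R;
}
```

Under these assumptions, a region using 300 ArchVGPRs and no AccVGPRs is flagged for ArchVGPR excess. Comparing each class against the 512 combined value instead would miss it.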

It is plausible that other clients of `getTotalNumVGPRs` are also using
it incorrectly.


