[all-commits] [llvm/llvm-project] d3ceb4: [AMDGPU] Update target helpers & GCNSchedStrategy ...
Diana Picus via All-commits
all-commits at lists.llvm.org
Thu Mar 6 02:31:25 PST 2025
Branch: refs/heads/users/rovka/dvgpr-6
Home: https://github.com/llvm/llvm-project
Commit: d3ceb4ebd008183980b97d0887f9b2bdb30b7c6f
https://github.com/llvm/llvm-project/commit/d3ceb4ebd008183980b97d0887f9b2bdb30b7c6f
Author: Diana Picus <Diana-Magda.Picus at amd.com>
Date: 2025-03-06 (Thu, 06 Mar 2025)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPU.td
M llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
M llvm/lib/Target/AMDGPU/GCNSubtarget.h
M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
M llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
M llvm/unittests/Target/AMDGPU/CMakeLists.txt
Log Message:
-----------
[AMDGPU] Update target helpers & GCNSchedStrategy for dynamic VGPRs
In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting which we can model with a Subtarget
feature). Update some of the subtarget helpers to reflect this.
In particular:
- getVGPRAllocGranule now returns the block size
- getAddressableNumVGPRs limits itself to 8 * the block size (sketched below)
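As a rough model of the arithmetic above (the names mirror the
AMDGPUBaseInfo helpers, but this is an illustrative sketch, not the
patch itself):

    // Chip-wide setting: dynamic VGPR blocks of either 16 or 32 registers.
    unsigned getVGPRAllocGranule(bool DynamicVGPRBlockSize32) {
      return DynamicVGPRBlockSize32 ? 32 : 16;
    }

    // At most 8 blocks can be allocated dynamically.
    unsigned getAddressableNumVGPRs(bool DynamicVGPRBlockSize32) {
      return 8 * getVGPRAllocGranule(DynamicVGPRBlockSize32);
    }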
We also try to be more careful about how many VGPR blocks we allocate:
when deciding whether to revert scheduling after a given stage, we
check that the stage hasn't increased the number of VGPR blocks that
need to be allocated.
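For illustration, the block-count comparison could look like this
(hypothetical helper names; the real logic lives in
GCNSchedStrategy.cpp):

    // Round a VGPR count up to whole allocation blocks.
    unsigned getNumVGPRBlocks(unsigned NumVGPRs, unsigned Granule) {
      return (NumVGPRs + Granule - 1) / Granule;
    }

    // Revert the stage if it increased the number of VGPR blocks.
    bool shouldRevertStage(unsigned VGPRsBefore, unsigned VGPRsAfter,
                           unsigned Granule) {
      return getNumVGPRBlocks(VGPRsAfter, Granule) >
             getNumVGPRBlocks(VGPRsBefore, Granule);
    }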
Commit: 3e20edfc6f3b1bfa60f5d778ce98c1fb984b1aee
https://github.com/llvm/llvm-project/commit/3e20edfc6f3b1bfa60f5d778ce98c1fb984b1aee
Author: Diana Picus <Diana-Magda.Picus at amd.com>
Date: 2025-03-06 (Thu, 06 Mar 2025)
Changed paths:
M llvm/docs/AMDGPUUsage.rst
M llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
M llvm/lib/Target/AMDGPU/SIDefines.h
M llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
M llvm/lib/Target/AMDGPU/SIFrameLowering.h
M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
A llvm/test/CodeGen/AMDGPU/dynamic-vgpr-reserve-stack-for-cwsr.ll
M llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll
Log Message:
-----------
[AMDGPU] Allocate scratch space for dVGPRs for CWSR
The CWSR trap handler needs to save and restore the VGPRs. When dynamic
VGPRs are in use, the fixed-function hardware will only allocate enough
space for one VGPR block; the rest has to be stored in scratch, at
offset 0.
This patch allocates the necessary space by:
- generating a prologue that checks at runtime whether we're on a
compute queue (CWSR only works on compute queues); for this we check
the ME_ID bits of the ID_HW_ID2 register - if they are non-zero, we
can assume we're on a compute queue and initialize the SP and FP with
enough room for the dynamic VGPRs (modeled in the sketch below)
- forcing all compute entry functions to use an FP so they can access
their locals/spills correctly (this isn't ideal, but it's the quickest
to implement)
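A C++ model of the prologue's decision (the actual patch emits machine
IR in SIFrameLowering; the names and the extracted ME_ID value are
assumptions for illustration):

    // Hypothetical model: a non-zero ME_ID read from ID_HW_ID2 means we
    // are on a compute queue, so reserve the dVGPR save area at offset 0.
    void initStackPointers(unsigned MeId, unsigned DVGPRSaveAreaBytes,
                           unsigned &SP, unsigned &FP) {
      if (MeId != 0) {
        SP = DVGPRSaveAreaBytes; // locals/spills start above the save area
        FP = SP;
      } else {
        SP = 0; // graphics queue: no CWSR, nothing to reserve
        FP = 0;
      }
    }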
Note that at the moment we allocate enough space for the theoretical
maximum number of VGPRs that can be allocated dynamically (for blocks
of 16 registers this is 128, of which we subtract the first 16, which
are already allocated by the fixed-function hardware). Future patches
may reserve less if they can prove that the shader never allocates
that many blocks.
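A worked example of that sizing, assuming 4 bytes per VGPR per lane and
wave32 (both assumptions, not taken from the patch):

    constexpr unsigned BlockSize = 16;               // dVGPR block size
    constexpr unsigned MaxDynVGPRs = 8 * BlockSize;  // 128
    constexpr unsigned InScratch = MaxDynVGPRs - BlockSize; // 112; HW covers block 0
    constexpr unsigned WaveSize = 32;
    constexpr unsigned BytesPerWave = InScratch * 4 * WaveSize; // 14336 bytes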
Also note that this should not affect any reported stack sizes (e.g.
the PAL backend_stack_size).
Compare: https://github.com/llvm/llvm-project/compare/d3ceb4ebd008%5E...3e20edfc6f3b