[all-commits] [llvm/llvm-project] 4a5807: [AMDGPU] Support block load/store for CSR (#130013)
Diana Picus via All-commits
all-commits at lists.llvm.org
Wed Apr 23 01:33:58 PDT 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 4a58071d87265dfccba72134b25cf4d1595d98c5
https://github.com/llvm/llvm-project/commit/4a58071d87265dfccba72134b25cf4d1595d98c5
Author: Diana Picus <Diana-Magda.Picus at amd.com>
Date: 2025-04-23 (Wed, 23 Apr 2025)
Changed paths:
M llvm/include/llvm/CodeGen/MachineFrameInfo.h
M llvm/include/llvm/CodeGen/TargetFrameLowering.h
M llvm/lib/CodeGen/PrologEpilogInserter.cpp
M llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp
M llvm/lib/Target/AMDGPU/AMDGPU.td
M llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
M llvm/lib/Target/AMDGPU/GCNSubtarget.h
M llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
M llvm/lib/Target/AMDGPU/SIFrameLowering.h
M llvm/lib/Target/AMDGPU/SIInstrInfo.h
M llvm/lib/Target/AMDGPU/SIInstructions.td
M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
M llvm/lib/Target/AMDGPU/SIRegisterInfo.h
A llvm/test/CodeGen/AMDGPU/pei-vgpr-block-spill-csr.mir
A llvm/test/CodeGen/AMDGPU/spill-vgpr-block.ll
A llvm/test/CodeGen/AMDGPU/vgpr-blocks-funcinfo.mir
M llvm/unittests/Target/AMDGPU/CMakeLists.txt
A llvm/unittests/Target/AMDGPU/LiveRegUnits.cpp
Log Message:
-----------
[AMDGPU] Support block load/store for CSR (#130013)
Add support for using the existing `SCRATCH_STORE_BLOCK` and
`SCRATCH_LOAD_BLOCK` instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, `block-vgpr-csr`.
It does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.
Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.
In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in `SIFrameLowering`. We also add new pseudos,
`SI_BLOCK_SPILL_V1024_SAVE/RESTORE`.
One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list