[all-commits] [llvm/llvm-project] 5bad5d: Reland [AMDGPU] Support block load/store for CSR #...

Diana Picus via All-commits all-commits at lists.llvm.org
Fri Apr 25 02:29:48 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 5bad5d84a15a4f36b8ed5b33dde924683bcc9ea1
      https://github.com/llvm/llvm-project/commit/5bad5d84a15a4f36b8ed5b33dde924683bcc9ea1
  Author: Diana Picus <Diana-Magda.Picus at amd.com>
  Date:   2025-04-25 (Fri, 25 Apr 2025)

  Changed paths:
    M llvm/include/llvm/CodeGen/MachineFrameInfo.h
    M llvm/include/llvm/CodeGen/TargetFrameLowering.h
    M llvm/lib/CodeGen/PrologEpilogInserter.cpp
    M llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp
    M llvm/lib/Target/AMDGPU/AMDGPU.td
    M llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
    M llvm/lib/Target/AMDGPU/GCNSubtarget.h
    M llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
    M llvm/lib/Target/AMDGPU/SIFrameLowering.h
    M llvm/lib/Target/AMDGPU/SIInstrInfo.h
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    M llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
    M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.h
    A llvm/test/CodeGen/AMDGPU/pei-vgpr-block-spill-csr.mir
    A llvm/test/CodeGen/AMDGPU/spill-vgpr-block.ll
    A llvm/test/CodeGen/AMDGPU/vgpr-blocks-funcinfo.mir
    M llvm/unittests/Target/AMDGPU/CMakeLists.txt
    A llvm/unittests/Target/AMDGPU/LiveRegUnits.cpp

  Log Message:
  -----------
  Reland [AMDGPU] Support block load/store for CSR #130013 (#137169)

Add support for using the existing SCRATCH_STORE_BLOCK and
SCRATCH_LOAD_BLOCK instructions for saving and restoring callee-saved
VGPRs. This is controlled by a new subtarget feature, block-vgpr-csr. It
does not include WWM registers - those will be saved and restored
individually, just like before. This patch does not change the ABI.

Use of this feature may lead to slightly increased stack usage, because
the memory is not compacted if certain registers don't have to be
transferred (this will happen in practice for calling conventions where
the callee and caller saved registers are interleaved in groups of 8).
However, if the registers at the end of the block of 32 don't have to be
transferred, we don't need to use a whole 128-byte stack slot - we can
trim some space off the end of the range.

In order to implement this feature, we need to rely less on the
target-independent code in the PrologEpilogInserter, so we override
several new methods in SIFrameLowering. We also add new pseudos,
SI_BLOCK_SPILL_V1024_SAVE/RESTORE.

One peculiarity is that both the SI_BLOCK_V1024_RESTORE pseudo and the
SCRATCH_LOAD_BLOCK instructions will have all the registers that are not
transferred added as implicit uses. This is done in order to inform
LiveRegUnits that those registers are not available before the restore
(since we're not really restoring them - so we can't afford to scavenge
them). Unfortunately, this trick doesn't work with the save, so before
the save all the registers in the block will be unavailable (see the
unit test).

This was reverted due to failures in the builds with expensive checks
on, now fixed by always updating LiveIntervals and SlotIndexes in
SILowerSGPRSpills.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list