[all-commits] [llvm/llvm-project] d3ceb4: [AMDGPU] Update target helpers & GCNSchedStrategy ...

Diana Picus via All-commits all-commits at lists.llvm.org
Thu Mar 6 02:31:25 PST 2025


  Branch: refs/heads/users/rovka/dvgpr-6
  Home:   https://github.com/llvm/llvm-project
  Commit: d3ceb4ebd008183980b97d0887f9b2bdb30b7c6f
      https://github.com/llvm/llvm-project/commit/d3ceb4ebd008183980b97d0887f9b2bdb30b7c6f
  Author: Diana Picus <Diana-Magda.Picus at amd.com>
  Date:   2025-03-06 (Thu, 06 Mar 2025)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPU.td
    M llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
    M llvm/lib/Target/AMDGPU/GCNSubtarget.h
    M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
    M llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
    M llvm/unittests/Target/AMDGPU/CMakeLists.txt

  Log Message:
  -----------
  [AMDGPU] Update target helpers & GCNSchedStrategy for dynamic VGPRs

In dynamic VGPR mode, we can allocate up to 8 blocks of either 16 or 32
VGPRs (based on a chip-wide setting, which we can model with a subtarget
feature). Update some of the subtarget helpers to reflect this.

In particular:
- getVGPRAllocGranule now returns the block size
- getAddressableNumVGPRs now limits itself to 8 * block size (see the
  sketch below)
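
A minimal sketch of what this means for the helpers, assuming the block
size is modeled as described above (the signatures here are illustrative,
not the actual ones in AMDGPUBaseInfo.cpp):

  // Illustrative only: the 16-vs-32 block size comes from the chip-wide
  // dynamic-VGPR setting, modeled as a subtarget feature.
  unsigned getDynamicVGPRBlockSize(bool Use32RegBlocks) {
    return Use32RegBlocks ? 32u : 16u;
  }

  // In dynamic VGPR mode, the allocation granule equals the block size.
  unsigned getVGPRAllocGranule(unsigned BlockSize) { return BlockSize; }

  // At most 8 blocks can be allocated per wave.
  unsigned getAddressableNumVGPRs(unsigned BlockSize) {
    return 8 * BlockSize; // 128 or 256 addressable VGPRs
  }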

We also try to be more careful about how many VGPR blocks we allocate.
Therefore, when deciding whether to revert scheduling after a given
stage, we check that we haven't increased the number of VGPR blocks that
need to be allocated.
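
A rough sketch of that check, assuming register pressure is rounded up to
whole blocks (the names are illustrative, not the exact code in
GCNSchedStrategy.cpp):

  // Number of dynamic VGPR blocks needed to cover a given VGPR count.
  unsigned getNumVGPRBlocks(unsigned NumVGPRs, unsigned BlockSize) {
    return (NumVGPRs + BlockSize - 1) / BlockSize; // round up to whole blocks
  }

  // Revert the stage only if the new schedule needs more blocks than before.
  bool shouldRevertForVGPRBlocks(unsigned VGPRsBefore, unsigned VGPRsAfter,
                                 unsigned BlockSize) {
    return getNumVGPRBlocks(VGPRsAfter, BlockSize) >
           getNumVGPRBlocks(VGPRsBefore, BlockSize);
  }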


  Commit: 3e20edfc6f3b1bfa60f5d778ce98c1fb984b1aee
      https://github.com/llvm/llvm-project/commit/3e20edfc6f3b1bfa60f5d778ce98c1fb984b1aee
  Author: Diana Picus <Diana-Magda.Picus at amd.com>
  Date:   2025-03-06 (Thu, 06 Mar 2025)

  Changed paths:
    M llvm/docs/AMDGPUUsage.rst
    M llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
    M llvm/lib/Target/AMDGPU/SIDefines.h
    M llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
    M llvm/lib/Target/AMDGPU/SIFrameLowering.h
    M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
    A llvm/test/CodeGen/AMDGPU/dynamic-vgpr-reserve-stack-for-cwsr.ll
    M llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll

  Log Message:
  -----------
  [AMDGPU] Allocate scratch space for dVGPRs for CWSR

The CWSR trap handler needs to save and restore the VGPRs. When dynamic
VGPRs are in use, the fixed-function hardware will only allocate enough
space for one VGPR block. The rest has to be stored in scratch, at
offset 0.

This patch allocates the necessary space by:
- generating a prologue that checks at runtime whether we're on a compute
  queue (since CWSR only works on compute queues); for this we check the
  ME_ID bits of the ID_HW_ID2 register - if they are non-zero, we can
  assume we're on a compute queue and initialize the SP and FP with
  enough room for the dynamic VGPRs (see the sketch after this list)
- forcing all compute entry functions to use an FP so they can access
  their locals/spills correctly (this isn't ideal, but it's the quickest
  approach to implement)
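
As a sketch, the emitted prologue behaves roughly like the pseudocode
below. The register read itself can't be modeled in C++, so the ME_ID
bits are taken as a parameter, and the function name is hypothetical;
the real code emits machine instructions in SIFrameLowering.cpp.

  #include <cstdint>

  // Pseudocode for the runtime check performed by the prologue.
  void initStackForDynamicVGPRs(uint32_t MeIdBits, uint32_t DynVGPRScratchBytes,
                                uint32_t &SP, uint32_t &FP) {
    if (MeIdBits != 0) {
      // Compute queue: reserve scratch at offset 0 for the dynamic VGPRs
      // and place the stack above that reservation.
      SP = FP = DynVGPRScratchBytes;
    } else {
      // Not a compute queue: no extra reservation needed.
      SP = FP = 0;
    }
  }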

Note that at the moment we allocate enough space for the theoretical
maximum number of VGPRs that can be allocated dynamically (for blocks of
16 registers this is 128 VGPRs, minus the first block of 16, which the
fixed-function hardware already allocates). Future patches may decide to
allocate less if they can prove the shader never allocates that many
blocks.
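
In numbers, the worst-case reservation described above looks like this
(the per-lane/per-wave split and the 4-byte VGPR lane size are
assumptions about how the reservation is sized, not taken from the
patch):

  unsigned BlockSize = 16;                       // registers per block
  unsigned MaxDynamicVGPRs = 8 * BlockSize;      // 128
  // The first block is already backed by the fixed-function hardware.
  unsigned VGPRsNeedingScratch = MaxDynamicVGPRs - BlockSize;   // 112
  unsigned WavefrontSize = 32;                   // or 64, per subtarget
  unsigned BytesPerLane = VGPRsNeedingScratch * 4;              // 32-bit VGPRs
  unsigned BytesPerWave = BytesPerLane * WavefrontSize;         // 14336 at wave32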

Also note that this should not affect any reported stack sizes (e.g. the
PAL backend_stack_size metadata).


Compare: https://github.com/llvm/llvm-project/compare/d3ceb4ebd008%5E...3e20edfc6f3b
