[all-commits] [llvm/llvm-project] 72c3c3: [AMDGPU] Allocate scratch space for dVGPRs for CWS...

Wed Mar 19 05:49:40 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 72c3c30452d340d1a90cd3e7efe685ee188003c7
      https://github.com/llvm/llvm-project/commit/72c3c30452d340d1a90cd3e7efe685ee188003c7
  Author: Diana Picus <Diana-Magda.Picus at amd.com>
  Date:   2025-03-19 (Wed, 19 Mar 2025)

  Changed paths:
    M llvm/docs/AMDGPUUsage.rst
    M llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
    M llvm/lib/Target/AMDGPU/SIDefines.h
    M llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
    M llvm/lib/Target/AMDGPU/SIFrameLowering.h
    M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
    M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
    M llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
    A llvm/test/CodeGen/AMDGPU/dynamic-vgpr-reserve-stack-for-cwsr.ll
    A llvm/test/CodeGen/AMDGPU/machine-function-info-cwsr.ll
    M llvm/test/CodeGen/AMDGPU/pal-metadata-3.0.ll
    M llvm/test/CodeGen/MIR/AMDGPU/long-branch-reg-all-sgpr-used.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-after-pei.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg-debug.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

  Log Message:
  -----------
  [AMDGPU] Allocate scratch space for dVGPRs for CWSR (#130055)

The CWSR trap handler needs to save and restore the VGPRs. When dynamic
VGPRs are in use, the fixed function hardware will only allocate enough
space for one VGPR block. The rest will have to be stored in scratch, at
offset 0.

This patch allocates the necessary space by:
- generating a prologue that checks at runtime if we're on a compute
queue (since CWSR only works on compute queues); for this we will have
to check the ME_ID bits of the ID_HW_ID2 register - if that is non-zero,
we can assume we're on a compute queue and initialize the SP and FP with
enough room for the dynamic VGPRs
- forcing all compute entry functions to use a FP so they can access
their locals/spills correctly (this isn't ideal but it's the quickest to
implement)

Note that at the moment we allocate enough space for the theoretical
maximum number of VGPRs that can be allocated dynamically (for blocks of
16 registers, this will be 128, of which we subtract the first 16, which
are already allocated by the fixed function hardware). Future patches
may decide to allocate less if they can prove the shader never allocates
that many blocks.

Also note that this should not affect any reported stack sizes (e.g. PAL
backend_stack_size etc).

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications