[all-commits] [llvm/llvm-project] aff1e1: [AMDGPU] Dynamic VGPR support for llvm.amdgcn.cs.c...

Thu Mar 6 04:44:17 PST 2025

  Branch: refs/heads/users/rovka/dvgpr-4
  Home:   https://github.com/llvm/llvm-project
  Commit: aff1e132263dba730999eb017b7548a5d2f46b6f
      https://github.com/llvm/llvm-project/commit/aff1e132263dba730999eb017b7548a5d2f46b6f
  Author: Diana Picus <Diana-Magda.Picus at amd.com>
  Date:   2025-03-06 (Thu, 06 Mar 2025)

  Changed paths:
    M llvm/include/llvm/CodeGen/SelectionDAGISel.h
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    M llvm/lib/Target/AMDGPU/SILateBranchLowering.cpp
    A llvm/test/CodeGen/AMDGPU/amdgcn-cs-chain-intrinsic-dyn-vgpr-w32.ll
    M llvm/test/CodeGen/AMDGPU/isel-amdgcn-cs-chain-intrinsic-w32.ll
    M llvm/test/CodeGen/AMDGPU/isel-amdgcn-cs-chain-intrinsic-w64.ll
    A llvm/test/CodeGen/AMDGPU/isel-amdgpu-cs-chain-intrinsic-dyn-vgpr-w32.ll
    A llvm/test/CodeGen/AMDGPU/remove-register-flags.mir

  Log Message:
  -----------
  [AMDGPU] Dynamic VGPR support for llvm.amdgcn.cs.chain

The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may
indicate that we want to reallocate the VGPRs before performing the
call.

A call with the following arguments:
```
llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args,
  /*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee
```
is supposed to do the following:
- copy the SGPR and VGPR args into their respective registers
- try to change the VGPR allocation
- if the allocation has succeeded, set EXEC to %exec and jump to
  %callee, otherwise set EXEC to %fallback_exec and jump to
  %fallback_callee
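
The steps above can be sketched as a small behavioural model. This is
only an illustration, not code from the patch: `ChainCall`,
`ChainResult`, `resolveChain` and the `tryAlloc` callback are
hypothetical names, and treating bit 0 of the flags as the dynamic VGPR
bit is an assumption based on the 0x1 used in the example call.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical model of the operands named in the commit message.
struct ChainCall {
  uint64_t callee;
  uint32_t exec;           // wave32 EXEC mask
  uint32_t flags;          // assumption: bit 0 requests dynamic VGPRs
  uint32_t numVGPRs;
  uint32_t fallbackExec;
  uint64_t fallbackCallee;
};

// Where the wave ends up: the jump target and the EXEC mask it runs with.
struct ChainResult {
  uint64_t target;
  uint32_t exec;
};

// tryAlloc stands in for the attempt to change the VGPR allocation; it
// returns true on success. Copying the SGPR/VGPR arguments is not modelled.
ChainResult resolveChain(const ChainCall &c,
                         const std::function<bool(uint32_t)> &tryAlloc) {
  const bool dynamicVGPR = (c.flags & 0x1u) != 0;
  if (!dynamicVGPR)
    return {c.callee, c.exec};              // flags == 0: plain chain call
  if (tryAlloc(c.numVGPRs))
    return {c.callee, c.exec};              // allocation succeeded
  return {c.fallbackCallee, c.fallbackExec}; // allocation failed
}
```

Passing a callback that always returns false exercises the fallback
path without needing any model of the hardware allocator.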

This patch implements the dynamic VGPR behaviour by generating an
S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and
callee. The rest of the call sequence is left undisturbed (i.e.
identical to the case where the flags are 0 and we don't use dynamic
VGPRs). We achieve this by introducing new pseudos
(SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering
pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason
for expanding this late is that we don't risk other passes (particularly
the PostRA scheduler) introducing instructions between the S_ALLOC_VGPR
and the jump. Such instructions might end up using VGPRs that have been
deallocated, or the wrong EXEC mask. Once the whole backend treats
S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use
VGPRs, we could in principle move the expansion earlier (but in the
absence of a good reason for that, my personal preference is to keep
it late in order to make debugging easier).
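
As a rough illustration of the expanded sequence, the toy model below
mimics an S_ALLOC_VGPR followed by selects for the callee and EXEC.
It is a sketch under stated assumptions, not the pass's code: it assumes
S_ALLOC_VGPR reports success through SCC (which S_CSELECT then reads),
and `WaveState`, `sAllocVGPR`, `sCSelect`, `expandChainDVGPR` and the
`availableVGPRs` parameter are all hypothetical.

```cpp
#include <cstdint>

// Toy scalar state: only what the expansion described above touches.
struct WaveState {
  bool scc = false;
  uint32_t exec = 0;
  uint64_t pc = 0;
  uint32_t allocatedVGPRs = 0;
};

// Assumption: the allocation attempt sets SCC on success. availableVGPRs
// is a made-up stand-in for whatever the hardware can currently grant.
void sAllocVGPR(WaveState &w, uint32_t requested, uint32_t availableVGPRs) {
  w.scc = requested <= availableVGPRs;
  if (w.scc)
    w.allocatedVGPRs = requested;
}

// S_CSELECT-style select: first operand if SCC is set, otherwise the second.
template <typename T> T sCSelect(const WaveState &w, T ifSet, T ifClear) {
  return w.scc ? ifSet : ifClear;
}

// Nothing sits between the allocation, the selects and the jump, which is
// the property the late expansion is meant to protect.
void expandChainDVGPR(WaveState &w, uint32_t numVGPRs,
                      uint32_t availableVGPRs, uint64_t callee, uint32_t exec,
                      uint64_t fallbackCallee, uint32_t fallbackExec) {
  sAllocVGPR(w, numVGPRs, availableVGPRs);
  const uint64_t target = sCSelect(w, callee, fallbackCallee);
  w.exec = sCSelect(w, exec, fallbackExec);
  w.pc = target; // the jump
}
```

Both selects read the same SCC value, so EXEC and the jump target always
come from the same (primary or fallback) pair.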

Since the expansion happens after register allocation, we're
careful to select constants as immediate operands instead of letting
ISel generate S_MOVs, which could interfere with register allocation
(i.e. make it look like we need more registers than we actually do).

For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out
during ISel in wave64 mode. However, we can define the pseudos for
wave64 too so it's easy to handle if future generations support it.

Co-authored-by: Ana Mihajlovic <Ana.Mihajlovic at amd.com>
