[all-commits] [llvm/llvm-project] 44556e: [amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic...

Tue Sep 10 04:25:14 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 44556e64f21c1e21dd0e8329bc0e54e1887261c5
      https://github.com/llvm/llvm-project/commit/44556e64f21c1e21dd0e8329bc0e54e1887261c5
  Author: Diana Picus <Diana-Magda.Picus at amd.com>
  Date:   2024-09-10 (Tue, 10 Sep 2024)

  Changed paths:
    M llvm/include/llvm/IR/IntrinsicsAMDGPU.td
    M llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
    M llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
    M llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
    M llvm/lib/Target/AMDGPU/SIInstructions.td
    M llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
    M llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
    A llvm/test/CodeGen/AMDGPU/llvm.amdgcn.init.whole.wave-w32.ll
    A llvm/test/CodeGen/AMDGPU/llvm.amdgcn.init.whole.wave-w64.ll
    M llvm/test/CodeGen/AMDGPU/pei-amdgpu-cs-chain.mir
    A llvm/test/CodeGen/AMDGPU/si-init-whole-wave.mir
    M llvm/test/CodeGen/MIR/AMDGPU/long-branch-reg-all-sgpr-used.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-after-pei.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg-debug.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir
    M llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

  Log Message:
  -----------
  [amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
  %func_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %func_exec, label %func, label %tail

func:
  ; ... stuff that should run with the actual EXEC mask
  br label %tail

tail:
  ; ... stuff that runs with all the lanes enabled;
  ; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications