[all-commits] [llvm/llvm-project] 13e49d: [amdgpu] Implement lower function LDS pass

Jon Chesterfield via All-commits all-commits at lists.llvm.org
Mon Mar 15 08:24:45 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 13e49dcee48f7bffec17df48b87e3237aebd5b1d
      https://github.com/llvm/llvm-project/commit/13e49dcee48f7bffec17df48b87e3237aebd5b1d
  Author: Jon Chesterfield <jonathanchesterfield at gmail.com>
  Date:   2021-03-15 (Mon, 15 Mar 2021)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPU.h
    M llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
    A llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
    M llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/lib/Target/AMDGPU/CMakeLists.txt
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/test/CodeGen/AMDGPU/GlobalISel/lds-global-non-entry-func.ll
    M llvm/test/CodeGen/AMDGPU/addrspacecast-initializer-unsupported.ll
    M llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll
    A llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
    A llvm/test/CodeGen/AMDGPU/lower-module-lds-inactive.ll
    A llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect.ll
    A llvm/test/CodeGen/AMDGPU/lower-module-lds-used-list.ll
    A llvm/test/CodeGen/AMDGPU/lower-module-lds.ll
    M llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-constantexpr-use.ll

  Log Message:
  -----------
  [amdgpu] Implement lower function LDS pass

Variables in local memory (LDS, the local data share) are allocated at kernel
launch. This pass collects the LDS global variables that are used from
non-kernel functions, moves them into a new struct type, and allocates an
instance of that type in every kernel. Uses are then replaced with a
constantexpr offset into that instance.
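
As an illustration of the struct-packing step described above, the following
is a minimal standalone sketch and not the code in AMDGPULowerModuleLDSPass.cpp.
It assumes each variable is known only by its size and alignment, and it
computes the fixed byte offset each one would get inside a combined struct.
The names LdsVariable, FieldOffset, StructLayout, alignTo and layoutModuleLds
are hypothetical and do not come from LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one LDS global variable (not an LLVM type).
struct LdsVariable {
  std::string name;
  std::uint64_t size;      // size in bytes
  std::uint64_t alignment; // required alignment in bytes, a power of two
};

// Where one variable ends up inside the combined struct.
struct FieldOffset {
  std::string name;
  std::uint64_t offset; // byte offset from the start of the struct
};

// Result of packing all variables into a single struct.
struct StructLayout {
  std::vector<FieldOffset> fields;
  std::uint64_t size = 0;      // total size in bytes, including padding
  std::uint64_t alignment = 1; // largest alignment of any field
};

// Round value up to the next multiple of alignment (a power of two).
inline std::uint64_t alignTo(std::uint64_t value, std::uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Place the variables in the order given, inserting padding where needed.
// Each use of a variable can then be rewritten as "struct base + offset",
// where the offset is a compile-time constant.
inline StructLayout layoutModuleLds(const std::vector<LdsVariable> &vars) {
  StructLayout layout;
  for (const LdsVariable &v : vars) {
    const std::uint64_t offset = alignTo(layout.size, v.alignment);
    layout.fields.push_back({v.name, offset});
    layout.size = offset + v.size;
    if (v.alignment > layout.alignment)
      layout.alignment = v.alignment;
  }
  // Pad the tail so the struct size is a multiple of its alignment.
  layout.size = alignTo(layout.size, layout.alignment);
  return layout;
}
```

In this sketch, three variables with (size, alignment) of (4, 4), (1, 1) and
(8, 8) are placed at offsets 0, 4 and 8. Because the offsets are fixed once the
struct type is chosen, a non-kernel function can address a variable without
knowing which kernel called it, provided every kernel allocates the struct.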

Prior to this pass, LDS accesses from a non-kernel function are compiled to a
trap. With this pass, most such accesses are removed before reaching codegen.
The trap logic is left unchanged by this pass. It is still reachable for the
cases this pass misses, notably the extern shared construct from HIP and
variables marked constant which survive the optimizer.

This is of interest to the OpenMP project because the deviceRTL runtime library
uses CUDA shared variables from functions that cannot be inlined. Trunk LLVM
therefore cannot compile some OpenMP kernels for AMDGPU. In addition to the
unit tests attached, this patch, applied to ROCm LLVM with fixed-abi enabled
and the function pointer hashing scheme deleted, passes the OpenMP suite.

This lowering will use more LDS than strictly necessary. It is intended to be
a functionally correct fallback for cases that are difficult to target from
future optimisation passes.
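
One likely source of the extra LDS, inferred from the description above rather
than stated in this commit, is that every kernel receives the whole struct even
if it reaches only some of the variables. The sketch below estimates that
overhead for a set of kernels. KernelUse and extraLdsBytes are hypothetical
names, and the reachable byte counts are inputs the caller must supply.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-kernel summary: how many bytes of the combined struct
// the kernel (and the functions it calls) can actually reach.
struct KernelUse {
  std::uint64_t bytesReachable;
};

// Sum, over all kernels, of the struct bytes allocated but never reachable,
// assuming each kernel allocates one full instance of the struct.
inline std::uint64_t extraLdsBytes(std::uint64_t structSize,
                                   const std::vector<KernelUse> &kernels) {
  std::uint64_t extra = 0;
  for (const KernelUse &k : kernels) {
    assert(k.bytesReachable <= structSize);
    extra += structSize - k.bytesReachable;
  }
  return extra;
}
```

For a 16-byte struct and three kernels that reach 16, 4 and 0 bytes of it, the
sketch reports 28 bytes of LDS allocated without being needed.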

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94648



