[all-commits] [llvm/llvm-project] 13e49d: [amdgpu] Implement lower function LDS pass
Jon Chesterfield via All-commits
all-commits at lists.llvm.org
Mon Mar 15 08:24:45 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 13e49dcee48f7bffec17df48b87e3237aebd5b1d
https://github.com/llvm/llvm-project/commit/13e49dcee48f7bffec17df48b87e3237aebd5b1d
Author: Jon Chesterfield <jonathanchesterfield at gmail.com>
Date: 2021-03-15 (Mon, 15 Mar 2021)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPU.h
M llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
A llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
M llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
M llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
M llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
M llvm/lib/Target/AMDGPU/CMakeLists.txt
M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
M llvm/test/CodeGen/AMDGPU/GlobalISel/lds-global-non-entry-func.ll
M llvm/test/CodeGen/AMDGPU/addrspacecast-initializer-unsupported.ll
M llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll
A llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
A llvm/test/CodeGen/AMDGPU/lower-module-lds-inactive.ll
A llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect.ll
A llvm/test/CodeGen/AMDGPU/lower-module-lds-used-list.ll
A llvm/test/CodeGen/AMDGPU/lower-module-lds.ll
M llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-constantexpr-use.ll
Log Message:
-----------
[amdgpu] Implement lower function LDS pass
Local (LDS) variables are allocated at kernel launch. This pass collects the LDS
global variables that are used from non-kernel functions, moves them into a new
struct type, and allocates an instance of that type in every kernel. Each use is
then replaced with a constantexpr offset into that instance.
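The collect-and-pack step described above can be pictured with a small stand-alone C++ model. This is a sketch, not the pass itself. The real pass rewrites LLVM IR through LLVM's APIs, and the names `LdsVar`, `ModuleLdsLayout`, and `buildModuleLdsLayout` are hypothetical. The model assumes that variables are packed in the order given, with padding inserted for alignment. The actual field ordering used by the pass may differ.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of one LDS global variable.
struct LdsVar {
  std::string name;
  uint64_t size;           // bytes
  uint64_t align;          // bytes, power of two
  bool usedFromNonKernel;  // referenced from some non-kernel function
};

// Hypothetical result of packing the selected variables into one struct.
struct ModuleLdsLayout {
  std::map<std::string, uint64_t> offsets;  // variable name -> byte offset in the struct
  uint64_t size = 0;                        // total struct size, including tail padding
  uint64_t align = 1;                       // alignment of the struct as a whole
};

// Round value up to the next multiple of align.
static uint64_t alignTo(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

// Select the variables used from non-kernel functions and lay them out
// sequentially, padding so that each field meets its own alignment.
// Assumption: declaration order is kept; the real pass may reorder fields.
ModuleLdsLayout buildModuleLdsLayout(const std::vector<LdsVar>& vars) {
  ModuleLdsLayout layout;
  for (const LdsVar& v : vars) {
    if (!v.usedFromNonKernel) continue;  // kernel-only variables are left alone
    layout.size = alignTo(layout.size, v.align);
    layout.offsets[v.name] = layout.size;
    layout.size += v.size;
    if (v.align > layout.align) layout.align = v.align;
  }
  layout.size = alignTo(layout.size, layout.align);  // tail padding
  return layout;
}
```

In this model, a use of variable `x` is rewritten as the struct instance's address plus `offsets["x"]`. That offset plays the role of the constantexpr offset mentioned in the log message.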
Prior to this pass, accesses to LDS variables from a non-kernel function are
compiled to a trap. With this pass, most such accesses are removed before they
reach codegen. The trap logic itself is left unchanged and is still reachable
for the cases this pass misses, notably the extern shared construct from HIP
and variables marked constant that survive the optimizer.
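The outcome for a given variable can be summarised as a three-way classification. The C++ below is a hypothetical model of that rule, restricted to the cases named in the log message. `LdsUse`, `Lowering`, and `classify` are illustrative names and do not come from the pass. The list of fallback cases is not exhaustive, because the message only says these are "notably" the cases the pass misses.

```cpp
// Hypothetical, simplified description of how an LDS variable is referenced.
struct LdsUse {
  bool usedFromNonKernel;  // accessed from a function that is not a kernel
  bool isExternShared;     // HIP's extern shared construct
  bool isConstant;         // marked constant and still present after optimization
};

// Possible outcomes for one variable under this model.
enum class Lowering { Untouched, PackedIntoStruct, StillTraps };

// Classify a variable using only the cases named in the commit message.
Lowering classify(const LdsUse& u) {
  if (!u.usedFromNonKernel) return Lowering::Untouched;  // kernel-only access: no change
  if (u.isExternShared || u.isConstant) return Lowering::StillTraps;  // missed: trap path remains
  return Lowering::PackedIntoStruct;  // moved into the per-module struct
}
```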
This is of interest to the OpenMP project because the deviceRTL runtime library
uses CUDA shared variables from functions that cannot be inlined. As a result,
trunk LLVM cannot compile some OpenMP kernels for AMDGPU. Beyond the attached
unit tests, this patch has also been applied to ROCm LLVM with fixed-abi enabled
and the function pointer hashing scheme deleted. With that setup, the OpenMP
suite passes.
This lowering will use more LDS than strictly necessary, because every kernel
allocates the whole struct whether or not it reaches the functions that use it.
It is intended as a functionally correct fallback for cases that are difficult
for future optimisation passes to target.
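The extra LDS comes from simple arithmetic, which the following hypothetical C++ helpers make explicit. `KernelLds`, `allocatedBytes`, and `wastedBytes` are illustrative names. The model also assumes that a kernel's own LDS and the module struct simply add up, ignoring any padding between them.

```cpp
#include <cstdint>

// Hypothetical per-kernel inputs for estimating LDS usage.
struct KernelLds {
  uint64_t ownBytes;        // LDS used directly by the kernel (not moved into the struct)
  uint64_t reachableBytes;  // bytes of the module struct this kernel can actually reach
};

// Every kernel allocates the full module struct, so its LDS footprint grows by
// the struct size regardless of which functions it calls (padding ignored).
uint64_t allocatedBytes(const KernelLds& k, uint64_t moduleStructBytes) {
  return k.ownBytes + moduleStructBytes;
}

// LDS a precise per-kernel allocation could have saved.
// Precondition: reachableBytes <= moduleStructBytes.
uint64_t wastedBytes(const KernelLds& k, uint64_t moduleStructBytes) {
  return moduleStructBytes - k.reachableBytes;
}
```

For example, with a 16-byte struct, a kernel that uses 64 bytes of its own LDS and calls none of the affected functions is charged 80 bytes, 16 of which are unused.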
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94648