[PATCH] D94648: [amdgpu] Implement lower function LDS pass
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 15 10:41:10 PST 2021
arsenm added a comment.
Needs some test to stress different alignment scenarios. Also need some with these globals used in some weird constant initializers.
I also thought the idea was to have a constant memory table with pointers in it, not one giant LDS block
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:66-68
+ static bool
+ useByKernelOrUsedList(const SmallPtrSetImpl<GlobalValue *> &UsedList,
+ const Use &U, uint64_t Depth = 0) {
----------------
I think this search isn't quite right and will miss arbitrarily nested constant expressions.
We already have similar code you need to analyze users in AMDGPUAnnotateKernelFeatures::visitConstantExprsRecursively to find constant LDS addrspacecasts anywhere they can appear.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:113
+ }
+ if (GV.isConstant()) {
+ // A constant undef variable can't be written to, and any load is
----------------
Not sure why you need to bother considering this case, if you just treat it normally it should work
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+ InlineAsm *IA =
+ InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+ Builder.CreateCall(
----------------
Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:202
+ bool runOnModule(Module &M) override {
+ if (DisableLowerFunctionLDS) {
+ return false;
----------------
Probably should move this flag into the pass pipeline in AMDGPUTargetMachine
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:207
+ LLVMContext &Ctx = M.getContext();
+ DataLayout DL = M.getDataLayout();
+
----------------
const &
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:224
+ // On ties, sort by size, descending, then by name, lexicographical.
+ std::sort(
+ FoundLocalVars.begin(), FoundLocalVars.end(),
----------------
llvm::sort
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:233-234
+
+ TypeSize SLHS = DL.getTypeSizeInBits(LHS->getValueType());
+ TypeSize SRHS = DL.getTypeSizeInBits(RHS->getValueType());
+ if (SLHS != SRHS) {
----------------
Should use the type alloc size. You are also ignoring the alignment of the global itself, which may differ from the type alignment
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:269
+ LocalVars.push_back(FGV);
+ CurrentOffset += DL.getTypeSizeInBits(FGV->getValueType()) / 8;
+ }
----------------
typeAllocSize
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:279-283
+ // IR considers address 0 to be null, despite amdgpu docs saying -1 is null
+ // This doesn't appear to cause miscompilation. Considered instantiating
+ // {[MaxAlign x i8], %__function_lds_t} instead and indexing from MaxAlign
+ // instead of zero, but it's hard to justify the overhead without evidence
+ // that referencing 'null' is an error in this context.
----------------
"null" in the IR is just 0. This is only treated as an invalid pointer in address space 0. -1 is used as the invalid pointer value and "null" in addrspace(3) is valid. Ideally this would be a property in the datalayout
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:294
+ M, LDSTy, false, GlobalValue::InternalLinkage, UndefValue::get(LDSTy),
+ "__function_lds", nullptr, GlobalValue::NotThreadLocal,
+ AMDGPUAS::LOCAL_ADDRESS, false);
----------------
Probably should use llvm.amdgcn prefix
================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds-inactive.ll:1
+; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-function-lds < %s | FileCheck %s
+
----------------
Should run with both new and old PM since you handled both
================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds-inactive.ll:4-6
+; CHECK-NOT: asm
+; CHECK-NOT: __function_lds
+; CHECK-NOT: __function_lds_t
----------------
Negative checks aren't particularly helpful. Needs positive checks for what's actually produced
================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds-inactive.ll:36
+ %v2 = atomicrmw add i32 addrspace(3)* @extern, i32 %c0 monotonic
+ %v1 = atomicrmw add i64 addrspace(4)* @addr4, i64 %c1 monotonic
+ ret void
----------------
Could also use some stores, intrinsic calls, cmpxchg.
Also some more exotic users that tend to break, such as storing the value's address to itself
================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds.ll:51
+; CHECK: call void asm sideeffect "// Alloc function lds block", "r"(%__function_lds_t addrspace(3)* @__function_lds)
+define spir_kernel void @kern_empty() {
+ ret void
----------------
We're not really handling spir_kernel anymore, should use amdgpu_kernel
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94648/new/
https://reviews.llvm.org/D94648
More information about the llvm-commits
mailing list