[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Fri Jan 15 10:41:10 PST 2021

arsenm added a comment.

Needs some test to stress different alignment scenarios. Also need some with these globals used in some weird constant initializers.

I also thought the idea was to have a constant memory table with pointers in it, not one giant LDS block

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:66-68
+  static bool
+  useByKernelOrUsedList(const SmallPtrSetImpl<GlobalValue *> &UsedList,
+                        const Use &U, uint64_t Depth = 0) {
----------------
I think this search isn't quite right and will miss arbitrarily nested constant expressions.

We already have similar code you need to analyze users in  AMDGPUAnnotateKernelFeatures::visitConstantExprsRecursively to find constant LDS addrspacecasts anywhere they can appear.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:113
+      }
+      if (GV.isConstant()) {
+        // A constant undef variable can't be written to, and any load is
----------------
Not sure why you need to bother considering this case, if you just treat it normally it should work

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+    InlineAsm *IA =
+        InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+    Builder.CreateCall(
----------------
Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:202
+  bool runOnModule(Module &M) override {
+    if (DisableLowerFunctionLDS) {
+      return false;
----------------
Probably should move this flag into the pass pipeline in AMDGPUTargetMachine

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:207
+    LLVMContext &Ctx = M.getContext();
+    DataLayout DL = M.getDataLayout();
+
----------------
const &

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:224
+    // On ties, sort by size, descending, then by name, lexicographical.
+    std::sort(
+        FoundLocalVars.begin(), FoundLocalVars.end(),
----------------
llvm::sort

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:233-234
+
+          TypeSize SLHS = DL.getTypeSizeInBits(LHS->getValueType());
+          TypeSize SRHS = DL.getTypeSizeInBits(RHS->getValueType());
+          if (SLHS != SRHS) {
----------------
Should use the type alloc size. You are also ignoring the alignment of the global itself, which may differ from the type alignment

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:269
+        LocalVars.push_back(FGV);
+        CurrentOffset += DL.getTypeSizeInBits(FGV->getValueType()) / 8;
+      }
----------------
typeAllocSize

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:279-283
+    // IR considers address 0 to be null, despite amdgpu docs saying -1 is null
+    // This doesn't appear to cause miscompilation. Considered instantiating
+    // {[MaxAlign x i8], %__function_lds_t} instead and indexing from MaxAlign
+    // instead of zero, but it's hard to justify the overhead without evidence
+    // that referencing 'null' is an error in this context.
----------------
"null" in the IR is just 0. This is only treated as an invalid pointer in address space 0. -1 is used as the invalid pointer value and "null" in addrspace(3) is valid. Ideally this would be a property in the datalayout

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:294
+        M, LDSTy, false, GlobalValue::InternalLinkage, UndefValue::get(LDSTy),
+        "__function_lds", nullptr, GlobalValue::NotThreadLocal,
+        AMDGPUAS::LOCAL_ADDRESS, false);
----------------
Probably should use llvm.amdgcn prefix

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds-inactive.ll:1
+; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-function-lds < %s | FileCheck %s
+
----------------
Should run with both new and old PM since you handled both

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds-inactive.ll:4-6
+; CHECK-NOT: asm
+; CHECK-NOT: __function_lds
+; CHECK-NOT: __function_lds_t
----------------
Negative checks aren't particularly helpful. Needs positive checks for what's actually produced

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds-inactive.ll:36
+  %v2 = atomicrmw add i32 addrspace(3)* @extern, i32 %c0  monotonic
+  %v1 = atomicrmw add i64 addrspace(4)* @addr4, i64 %c1 monotonic
+  ret void
----------------
Could also use some stores, intrinsic calls, cmpxchg. 

Also some more exotic users that tend to break, such as storing the value's address to itself

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-function-lds.ll:51
+; CHECK: call void asm sideeffect "// Alloc function lds block", "r"(%__function_lds_t addrspace(3)* @__function_lds)
+define spir_kernel void @kern_empty() {
+  ret void
----------------
We're not really handling spir_kernel anymore, should use amdgpu_kernel

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648