[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Fri Jan 15 11:53:17 PST 2021

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:66-68
+  static bool
+  useByKernelOrUsedList(const SmallPtrSetImpl<GlobalValue *> &UsedList,
+                        const Use &U, uint64_t Depth = 0) {
----------------
JonChesterfield wrote:
> arsenm wrote:
> > I think this search isn't quite right and will miss arbitrarily nested constant expressions.
> > 
> > We already have similar code for analyzing users in AMDGPUAnnotateKernelFeatures::visitConstantExprsRecursively, which finds constant LDS addrspacecasts anywhere they can appear.
> The '4' limits the depth of constant expressions analysed; I will change it to something using heap memory. I believe I can assume constantexpr are acyclic.
> 
> This test will move variables into the struct unless it can show that is unnecessary, which is safe provided replaceAllUsesWith does the right thing. I may be missing cases on what the user can be.
Probably should just use the same recursive search; even better if the code can be shared.
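
For illustration only, here is a minimal standalone sketch of a depth-unbounded search using a heap-allocated worklist and a visited set. It does not use the LLVM API and is not the code in AMDGPUAnnotateKernelFeatures; `Node`, `IsLDSCast`, and `containsLDSCast` are hypothetical stand-ins for a constant expression, the "is an LDS addrspacecast" check, and the search itself.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a constant expression node (not llvm::ConstantExpr).
struct Node {
  bool IsLDSCast = false;              // models "addrspacecast of an LDS global"
  std::vector<const Node *> Operands;  // nested constant expressions
};

// Returns true if Root, or anything reachable through its operands at any
// nesting depth, is marked as an LDS cast. The worklist lives on the heap, so
// there is no fixed depth limit; the visited set avoids re-walking operands
// that are shared between several expressions.
bool containsLDSCast(const Node *Root) {
  std::unordered_set<const Node *> Visited;
  std::vector<const Node *> Worklist{Root};
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue;  // already examined via another path
    if (N->IsLDSCast)
      return true;
    for (const Node *Op : N->Operands)
      Worklist.push_back(Op);
  }
  return false;
}
```

The visited set is not needed for termination if constant expressions are acyclic, but it keeps the walk linear when subexpressions are shared.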

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+    InlineAsm *IA =
+        InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+    Builder.CreateCall(
----------------
JonChesterfield wrote:
> arsenm wrote:
> > Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it
> This would be a hack. I wanted a construct that looks like a use of the instance (and won't be deleted by IR passes and generates minimal code), so that other passes will accurately account for the amount of LDS used by a kernel. Specifically promoteAlloca but I may have missed some.
> 
> An intrinsic that evaporates later would work. I haven't thought of an alternative, will see if a cleaner answer comes to mind.
> 
> (aside: what's bad about r in particular? I'm unfamiliar with our inline assembler, perhaps there's an immediate option instead)
Well since the allocation point isn't really fixed yet, whether this size is really correct is questionable. AMDGPUPromoteAlloca currently assumes a worst case placement for padding to avoid going over.
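
To make "worst case placement for padding" concrete, here is a rough sketch of one conservative bound. It assumes each variable may need up to Align - 1 bytes of padding before it. This is an illustration of the idea, not the computation AMDGPUPromoteAlloca actually performs; `LDSVar` and `worstCaseLDSBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one LDS variable; Align must be at least 1.
struct LDSVar {
  uint64_t Size;
  uint64_t Align;
};

// Upper bound on the bytes consumed if every variable lands at the worst
// possible offset for its alignment, i.e. needs Align - 1 bytes of padding.
uint64_t worstCaseLDSBytes(const std::vector<LDSVar> &Vars) {
  uint64_t Total = 0;
  for (const LDSVar &V : Vars)
    Total += V.Size + (V.Align - 1);
  return Total;
}
```

A bound like this overestimates usage, which is the safe direction when the final layout is not yet known.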

"r" means "pick any register of any class". We have a hard split between VGPRs and SGPRs, so "r" is unpredictable and not very helpful.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:279-283
+    // IR considers address 0 to be null, despite amdgpu docs saying -1 is null
+    // This doesn't appear to cause miscompilation. Considered instantiating
+    // {[MaxAlign x i8], %__function_lds_t} instead and indexing from MaxAlign
+    // instead of zero, but it's hard to justify the overhead without evidence
+    // that referencing 'null' is an error in this context.
----------------
JonChesterfield wrote:
> arsenm wrote:
> > "null" in the IR is just 0. This is only treated as an invalid pointer in address space 0. -1 is used as the invalid pointer value and "null" in addrspace(3) is valid. Ideally this would be a property in the datalayout
> I was worried by Constant::getNullValue() returning zero for addrspace(3) but it does indeed seem to work ok. Drop the comment?
Yes

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648