[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 4 07:33:09 PST 2021


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+    InlineAsm *IA =
+        InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+    Builder.CreateCall(
----------------
arsenm wrote:
> arsenm wrote:
> > JonChesterfield wrote:
> > > arsenm wrote:
> > > > Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it
> > > This would be a hack. I wanted a construct that looks like a use of the instance (and won't be deleted by IR passes and generates minimal code), so that other passes will accurately account for the amount of LDS used by a kernel. Specifically promoteAlloca but I may have missed some.
> > > 
> > > An intrinsic that evaporates later would work. I haven't thought of an alternative, will see if a cleaner answer comes to mind.
> > > 
> > > (aside: what's bad about r in particular? I'm unfamiliar with our inline assembler, perhaps there's an immediate option instead)
> > Well since the allocation point isn't really fixed yet, whether this size is really correct is questionable. AMDGPUPromoteAlloca currently assumes a worst case placement for padding to avoid going over.
> > 
> > r is "pick any register of any class". We have a hard split between VGPRs and SGPRs, so "r" is unpredictable and not very helpful.n
> This is a tricky one. I think checking for a specific intrinsic global variable when allocating the kernel's LDS wouldn't be too bad. However, I did come up with this alternative hack:
> 
> ```
> @arst = addrspace(3) global [256 x i32] zeroinitializer
> 
> declare void @llvm.donothing()
> 
> define void @foo() {
>   call void @llvm.donothing() [ "dummyuse"([256 x i32] addrspace(3)* @arst)]
>   ret void
> }
> 
> ```
> 
> This at least solves the PromoteAlloca case. The use disappears for the actual codegen amount so that doesn't quite solve everything. I guess an intrinsic that takes a pointer and returns the same value would be slightly better without changing the existing LDS lowering
Could also avoid this if the kernel already has a direct reference to the LDS, which it likely does


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648



More information about the llvm-commits mailing list