[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Jon Chesterfield via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 4 07:36:46 PST 2021


JonChesterfield added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+    InlineAsm *IA =
+        InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+    Builder.CreateCall(
----------------
arsenm wrote:
> arsenm wrote:
> > arsenm wrote:
> > > JonChesterfield wrote:
> > > > arsenm wrote:
> > > > > Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it
> > > > This would be a hack. I wanted a construct that looks like a use of the instance (and won't be deleted by IR passes and generates minimal code), so that other passes will accurately account for the amount of LDS used by a kernel. Specifically promoteAlloca but I may have missed some.
> > > > 
> > > > An intrinsic that evaporates later would work. I haven't thought of an alternative, will see if a cleaner answer comes to mind.
> > > > 
> > > > (aside: what's bad about r in particular? I'm unfamiliar with our inline assembler, perhaps there's an immediate option instead)
> > > Well since the allocation point isn't really fixed yet, whether this size is really correct is questionable. AMDGPUPromoteAlloca currently assumes a worst case placement for padding to avoid going over.
> > > 
> > > r is "pick any register of any class". We have a hard split between VGPRs and SGPRs, so "r" is unpredictable and not very helpful.n
> > This is a tricky one. I think checking for a specific intrinsic global variable when allocating the kernel's LDS wouldn't be too bad. However, I did come up with this alternative hack:
> > 
> > ```
> > @arst = addrspace(3) global [256 x i32] zeroinitializer
> > 
> > declare void @llvm.donothing()
> > 
> > define void @foo() {
> >   call void @llvm.donothing() [ "dummyuse"([256 x i32] addrspace(3)* @arst)]
> >   ret void
> > }
> > 
> > ```
> > 
> > This at least solves the PromoteAlloca case. The use disappears for the actual codegen amount so that doesn't quite solve everything. I guess an intrinsic that takes a pointer and returns the same value would be slightly better without changing the existing LDS lowering
> Could also avoid this if the kernel already has a direct reference to the LDS, which it likely does
Sure. We wouldn't gain anything though, specifically no saving in LDS. It also carries the slight risk that the existing direct reference to LDS gets dead code eliminated at an inconvenient time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648



More information about the llvm-commits mailing list