[PATCH] D94648: [amdgpu] Implement lower function LDS pass
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 4 07:33:09 PST 2021
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+ InlineAsm *IA =
+ InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+ Builder.CreateCall(
----------------
arsenm wrote:
> arsenm wrote:
> > JonChesterfield wrote:
> > > arsenm wrote:
> > > > Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it
> > > This would be a hack. I wanted a construct that looks like a use of the instance (and won't be deleted by IR passes and generates minimal code), so that other passes will accurately account for the amount of LDS used by a kernel. Specifically promoteAlloca but I may have missed some.
> > >
> > > An intrinsic that evaporates later would work. I haven't thought of an alternative, will see if a cleaner answer comes to mind.
> > >
> > > (aside: what's bad about r in particular? I'm unfamiliar with our inline assembler, perhaps there's an immediate option instead)
> > Well since the allocation point isn't really fixed yet, whether this size is really correct is questionable. AMDGPUPromoteAlloca currently assumes a worst case placement for padding to avoid going over.
> >
> > r is "pick any register of any class". We have a hard split between VGPRs and SGPRs, so "r" is unpredictable and not very helpful.n
> This is a tricky one. I think checking for a specific intrinsic global variable when allocating the kernel's LDS wouldn't be too bad. However, I did come up with this alternative hack:
>
> ```
> @arst = addrspace(3) global [256 x i32] zeroinitializer
>
> declare void @llvm.donothing()
>
> define void @foo() {
> call void @llvm.donothing() [ "dummyuse"([256 x i32] addrspace(3)* @arst)]
> ret void
> }
>
> ```
>
> This at least solves the PromoteAlloca case. The use disappears for the actual codegen amount so that doesn't quite solve everything. I guess an intrinsic that takes a pointer and returns the same value would be slightly better without changing the existing LDS lowering
Could also avoid this if the kernel already has a direct reference to the LDS, which it likely does
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94648/new/
https://reviews.llvm.org/D94648
More information about the llvm-commits
mailing list