[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Jon Chesterfield via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 13 12:21:28 PDT 2021


JonChesterfield added a comment.

Nice reference to GRANULATED_LDS_SIZE, thanks!

We unconditionally allocate the module variable from lowerFormalArgumentsKernel, which still looks right to me. My current theory is there's some hook between that and the metadata writer that needs to be poked from the above code and isn't, but I haven't worked through the metadata setup code yet.

Looking back I see

> The use disappears for the actual codegen amount so that doesn't quite solve everything

which correlates strongly with this bug, though I didn't make the connection at the time.

Inline asm does keep the use alive long enough to reach the metadata in the binary. An intrinsic would doubtless achieve the same if it were eliminated late enough. I need to find out what "late enough" means in order to see how much plumbing that requires.

Worth noting, given the recent discussions about LDS usage, that this patch puts the module variable in every kernel. If the allocation were pinned to the presence of the intrinsic, or if there were an attribute for no-module-lds-needed-in-this-kernel, that cost could be eliminated for kernels that never touch the module variable.
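
As a rough illustration of that trade-off, here is a toy model, not LLVM code. Every name in it (KernelModel, ModuleLdsPolicy, totalLdsBytes) is hypothetical. The layout rule, with the kernel's own LDS placed after the module variable at 4-byte alignment, is an assumption made only for the example. It compares the per-kernel LDS footprint when the module variable is allocated in every kernel against allocating it only where it is needed.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: none of these names exist in LLVM.
struct KernelModel {
  uint32_t ownLdsBytes;  // LDS declared directly by the kernel
  bool needsModuleLds;   // stands in for "intrinsic present" or "no opt-out attribute"
};

enum class ModuleLdsPolicy {
  EveryKernel,    // what the comment says the patch currently does
  OnlyWhenNeeded  // the suggested refinement
};

// Round value up to a power-of-two alignment.
inline uint32_t alignUp(uint32_t value, uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Assumed layout: module variable first, then the kernel's own LDS
// starting at the next 4-byte boundary.
inline uint32_t totalLdsBytes(const KernelModel &kernel,
                              uint32_t moduleLdsBytes,
                              ModuleLdsPolicy policy) {
  const bool allocateModule =
      policy == ModuleLdsPolicy::EveryKernel || kernel.needsModuleLds;
  const uint32_t moduleBytes = allocateModule ? moduleLdsBytes : 0;
  return alignUp(moduleBytes, 4) + kernel.ownLdsBytes;
}
```

Under the OnlyWhenNeeded policy, a kernel that never reaches the module variable pays only for its own LDS. Under EveryKernel, every kernel pays for the module variable regardless of whether it uses it.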


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648


