[PATCH] D155240: [amdgpu][lds] Use amdgpu-lds-size instead of llvm.donothing

Thu Jul 13 16:13:55 PDT 2023

JonChesterfield added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:1064

+          // Could replace this with a dynamic LDS alignment attribute
           markUsedByKernel(Builder, func, N);
----------------
arsenm wrote:
> I thought it was weird only having a size attribute. But also, how does this help if the size is always rounded by the maximum dynamic alignment?
Two things.

The promote alloca pass skips kernels that use dynamic lds. It does that by crawling globals. This use ensures it notices when kernels only use dynamic lds in some called function. An align attribute would give the same information without the search.

If something wants to append to the static frame, repeatedly, tracking dynamic alignment separately (aka alignment of the end of the frame) seems sensible.

I haven't totally thought this one through yet, it might suffice to have a bool do-not-append-more-lds per kernel.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp:955
+
+  if (CurrentLocalMemUsage == 0) {
+    // Sort to try to estimate the worst case alignment padding
----------------
arsenm wrote:
> arsenm wrote:
> > Is there now a path to introducing LDS outside of a kernel?
> > 
> > I'm slightly nervous about assuming amdgpu-lds-size is a source of truth in case something else introduces new LDS globals. Such as this pass, which is not updating the value 
> If it already pre-packed the globals into a single big one, won't the search find the case of 1 and work fine without this?
PromoteAlloca sometimes adds a global accessed by a kernel. It could be extended to work on non-entry functions with some effort. That's currently the only thing downstream of lowermodulelds that might add a global.

amdgpu-frame-size is a reasonable contender for a source of truth. We can append to it and mark the new global with an absolute address to get something the backend handles correctly. The awkward case is dynamic lds, which currently needs to block appending.

I think the lowering pass could be reworked to compose cleanly based on that attribute and append-only model

I'm thinking of moving graphics to the same lowering as compute and then making any lds variable without an absolute address assigned before codegen a fatal error.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155240/new/

https://reviews.llvm.org/D155240