[PATCH] D122091: [amdgpu] Elide module lds allocation in kernels with no callees

Mon Mar 21 11:02:06 PDT 2022

JonChesterfield added a comment.

The current behaviour is to allocate an instance of llvm.amdgcn.module.lds.t at address zero in every kernel in the IR module if the global llvm.amdgcn.module.lds is present. That instance only needs to be allocated in kernels that call, directly or indirectly, a function that uses it.

This patch revises the control flow approximation from 'any kernel can call any function' to 'kernels that make calls can call any function'. That means kernels where everything was inlined don't automatically allocate the module.lds, though presently they'll still allocate it somewhere (not necessarily at zero) if they use a variable that was moved into it. That limitation is solvable by specialising variables with respect to kernels, at which point kernels that make no calls will lower LDS optimally.
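
To make the difference between the approximations concrete, here is a toy model in Python. It is only a sketch and not the pass's implementation: the call graph, the kernel and function names and the three helper functions are all hypothetical, and indirect calls are ignored.

```python
# Toy call graph: each function maps to the functions it calls directly.
# Every name below is hypothetical and exists only for illustration.
CALLS = {
    "kernel_inlined": [],             # everything was inlined, makes no calls
    "kernel_calls_user": ["helper"],  # reaches a user of the module LDS
    "kernel_calls_other": ["pure"],   # makes a call, but never reaches a user
    "helper": ["uses_lds"],
    "uses_lds": [],
    "pure": [],
}
KERNELS = ["kernel_inlined", "kernel_calls_user", "kernel_calls_other"]
# Non-kernel functions that access llvm.amdgcn.module.lds.
USES_MODULE_LDS = {"uses_lds"}


def allocate_in_every_kernel():
    """'Any kernel can call any function': allocate everywhere."""
    return set(KERNELS)


def allocate_in_kernels_with_calls():
    """'Kernels that make calls can call any function'."""
    return {k for k in KERNELS if CALLS[k]}


def allocate_by_reachability():
    """Precise on this toy graph: allocate only where a user is reachable."""

    def reaches_user(fn, seen):
        if fn in USES_MODULE_LDS:
            return True
        if fn in seen:
            return False
        seen.add(fn)
        return any(reaches_user(callee, seen) for callee in CALLS[fn])

    return {
        k for k in KERNELS
        if any(reaches_user(callee, set()) for callee in CALLS[k])
    }
```

On this graph the first function selects all three kernels, the second drops kernel_inlined, and the third keeps only kernel_calls_user. The gap between the second and third is the remaining imprecision.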

Either we allocate this instance in every kernel or only in the subset that requires it (which this patch approximates better than 'all kernels', but not as well as it could). If we're going to allocate it for a subset, we need to indicate that somehow, i.e. we need a way to implement calleeRequiresModuleLDS(). I don't mind whether that's an attribute or something else. What would you prefer?
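
To picture the attribute option: whichever analysis picks the subset records its answer on each kernel, and the later allocation step only reads that record. The Python sketch below is a toy model under that assumption. The marker string, the Kernel class and the layout helper are hypothetical, alignment is ignored, and only the name calleeRequiresModuleLDS() comes from the discussion above.

```python
from dataclasses import dataclass, field

# Hypothetical marker name, made up for this sketch.
NO_MODULE_LDS = "no-module-lds"


@dataclass
class Kernel:
    name: str
    attributes: set = field(default_factory=set)


def callee_requires_module_lds(kernel):
    # Conservative default: a kernel without the marker gets the allocation.
    return NO_MODULE_LDS not in kernel.attributes


def lds_layout(kernel, module_lds_size, kernel_lds_size):
    """Return {variable name: (offset, size)} for one kernel.

    The module LDS instance, when present, goes at address zero and the
    kernel's own LDS follows it. Alignment padding is ignored here.
    """
    layout = {}
    offset = 0
    if callee_requires_module_lds(kernel):
        layout["llvm.amdgcn.module.lds"] = (0, module_lds_size)
        offset = module_lds_size
    layout[kernel.name + ".lds"] = (offset, kernel_lds_size)
    return layout
```

With this shape an unmarked kernel keeps today's behaviour, so a missing marker can only cost LDS space and cannot drop an allocation that a callee relies on.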

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D122091