[PATCH] D144221: [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD

Thu Feb 23 06:25:20 PST 2023

JonChesterfield planned changes to this revision.
JonChesterfield added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp:124
+  auto Type = M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
+  GV->setMetadata(LLVMContext::MD_absolute_symbol,
+                  MDNode::get(Ctx, ConstantAsMetadata::get(
----------------
arsenm wrote:
> JonChesterfield wrote:
> > arsenm wrote:
> > > Changing the IR in a codegen pass is bad. Can you set this earlier?
> > Setting metadata on the IR during codegen is not pretty but it's also inconsequential. The frame calculation this deletes is also quite ugly though.
> > 
> > For the existing cases we can set the metadata in the IR lowering (carefully, but accurately enough). It'll make changes to promote alloca fragile. That won't work for external/dynamic LDS as we don't know that address until the end of ISel.
> > 
> > So yes... if we want a hard no on setting metadata to record information learned during codegen, we can avoid it. Basically by lifting the remaining LDS lowering currently done by codegen wholly out of codegen. That'll change how dynamic LDS is handled from kernels which I was hoping to avoid.
> > 
> > If we're up for tagging variables with metadata at this point we can write the address chosen for dynamic LDS variables the same way and the lowerConstant path added here will do the right thing for them.
> So you're implying the promote alloca introduced LDS still has possibly unpredictable offsets assigned for it?
Yep. It's a phase ordering problem.

promoteAlloca needs to know how much LDS is already allocated to estimate how much more it can use. Some of that might be allocated for non-kernel functions, so LowerModuleLDS runs before promoteAlloca to make those allocations visible to it.

LowerModuleLDS thus doesn't know how much (if any) extra will be introduced by promoteAlloca.At least, it doesn't at present.

Then the lowered frame looks like:
{
  managed by lowermodulelds
  introduced by promotealloca (and maybe other things? none I know of)
  maybe alignment
  dynamic lds goes here
}

Also if promotealloca introduces more than one lds variable (haven't checked, seems reasonable that it might) and introduces them as independent values with different alignment, then we're back to allocate-in-dag-traversal order, in which case the address of dynamic lds is unstable.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h:120
-  static bool isKnownAddressLDSGlobal(const GlobalVariable &GV);
-  static unsigned calculateKnownAddressOfLDSGlobal(const GlobalVariable &GV);

----------------
arsenm wrote:
> JonChesterfield wrote:
> > JonChesterfield wrote:
> > > Note to self - this shouldn't be dead after this patch - find out what happened to the lowering path for the currently unused "direct" scheme
> > I think a block was lost in a git merge - AMDGPUIselLowering should have a block in it like:
> > 
> > ```
> >  if (G->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
> >    if (!MFI->isModuleEntryFunction()) {
> >      if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV)) {
> >        if (AMDGPUMachineFunction::isKnownAddressLDSGlobal(*GVar)) {
> >          unsigned Offset =
> >              AMDGPUMachineFunction::calculateKnownAddressOfLDSGlobal(*GVar);
> >          return DAG.getConstant(Offset, SDLoc(Op), Op.getValueType());
> >        }
> >      }
> >    }
> >  }
> > ```
> > 
> > that allows direct access to kernel allocated variables from functions where the access is unambiguous. This highlights missing test coverage for the "kernel" lowering strategy (from IR to ISA). Bug is latent but will complicate getting rid of the calculateKnownAddressOfLDSGlobal function.
> So if you set the absolute_address in the IR pass, and LowerGlobalAddress respected it, how would that be less reliable?
Would work identically for the module/kernel specific structures as they are deterministic.

Actually somewhat prettier to set those via metadata as it avoids the problem of codegen a function before the corresponding kernel. The above ^ would change to look at metadata on the variables.

That will unwind a little if something after lowermodulelds decides to change those structs, but that's manageable. 

Even if we can't mutate the IR during codegen for dynamic lds, I think that's an objective improvement to the current lowering scheme involving repeated frame calculations.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144221/new/

https://reviews.llvm.org/D144221