[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Wed Feb 10 09:11:13 PST 2021

JonChesterfield added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:187
+
+  static void markUsedByKernel(IRBuilder<> &Builder, Function *Func,
+                               GlobalVariable *SGV) {
----------------
arsenm wrote:
> JonChesterfield wrote:
> > I quite like the donothing alternative to inline asm. It does indeed keep the use alive long enough.
> > 
> > A future change to the pipeline might break that, but it'll do so fairly obviously (all the openmp stuff stops working, for one). I think we go with annotated donothing for now, and implement an intrinsic -> pseudo sequence when/if it becomes necessary. Written a fairly long comment to that effect in the source.
> But if there are no pre-existing uses of the LDS in the kernel, this won't end up getting allocated in the kernel
If all uses of LDS are from a kernel, this pass does nothing. Otherwise:
- every kernel gets a call to llvm.donothing (previously inline asm) that looks like a use of the per-module struct
- every kernel allocates the size of the per-module struct, whether or not the llvm.donothing call is present
See the constructor AMDGPUMachineFunction::AMDGPUMachineFunction. If the symbol llvm.amdgcn.module.lds is present in the module, allocateLDSGlobal is called on it before any other call to allocateLDSGlobal, so that the offset returned for it is reliably zero.
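
To illustrate why allocating the module struct first pins it at offset zero, here is a rough standalone sketch of a bump allocator. It is a toy model, not the code in the patch: ToyLDSAllocator, makeKernelAllocator and the sizes used are all made up for illustration, and only the symbol name llvm.amdgcn.module.lds comes from the pass.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for a per-kernel LDS bump allocator. Hypothetical names
// throughout; this is not the LLVM implementation.
struct ToyLDSAllocator {
  uint32_t Size = 0;                       // bytes handed out so far
  std::map<std::string, uint32_t> Offsets; // symbol name -> assigned offset

  // Returns the offset of Name, allocating it on the first request.
  uint32_t allocate(const std::string &Name, uint32_t Bytes, uint32_t Align) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 &&
           "alignment must be a power of two");
    auto It = Offsets.find(Name);
    if (It != Offsets.end())
      return It->second; // already placed, offset does not change
    uint32_t Offset = (Size + Align - 1) & ~(Align - 1); // round up to Align
    Size = Offset + Bytes;
    Offsets[Name] = Offset;
    return Offset;
  }
};

// Models the ordering described above: if the module contains the
// per-module struct, place it before anything else is allocated.
ToyLDSAllocator makeKernelAllocator(bool ModuleHasLDSStruct,
                                    uint32_t StructBytes,
                                    uint32_t StructAlign) {
  ToyLDSAllocator A;
  if (ModuleHasLDSStruct)
    A.allocate("llvm.amdgcn.module.lds", StructBytes, StructAlign);
  return A;
}
```

In this model the first allocation always lands at offset zero because Size starts at zero, and it happens whether or not the kernel body contains any use of the struct. Later allocations in the same kernel are placed after it.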

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648