[PATCH] D102401: [AMDGPU] Allocate LDS globals in sorted order of their alignment and size.

Mon May 17 13:16:26 PDT 2021

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp:61
+    const auto *I = dyn_cast<Instruction>(UU);
+    assert(I && "Expected an instruction");
+    const Function *F = I->getFunction();
----------------
hsmhsm wrote:
> rampitec wrote:
> > hsmhsm wrote:
> > > rampitec wrote:
> > > > It can probably be Operator.
> > > I did not understand this comment.
> > llvm::Operator.
> llvm::Operator is an abstracrtion over Instruction and ConstExpr. An llvm::Operator should be either an Instruction OR ConstExpr, and hence explicit handling of llvm::Operator is not required.
No, it is a separate subclass of User: https://llvm.org/doxygen/classllvm_1_1User.html

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp:115
+  SmallVector<const GlobalVariable *, 8> StaticLDSGlobals =
+      collectStaticLDSGlobals(&M);
+
----------------
hsmhsm wrote:
> rampitec wrote:
> > Maybe just pass a filter lambda there?
> I personally think that it will be over complicated - collectStaticLDSGlobals() is called by both findVariablesToLower() and getSortedUsedLDSGlobals().  Further filtering of LDS globals is required for findVariablesToLower(), but not for getSortedUsedLDSGlobals().
I do not like the idea of allocating memory, then copying.

================
Comment at: llvm/test/CodeGen/AMDGPU/lds-allocation.ll:35
+
+; SDAG-MIR-LABEL: bb.0 (%ir-block.0):
+; SDAG-MIR: [[MOV_1:%[0-9]+:sreg_32]] = S_MOV_B32 0
----------------
The test checking mir is not any better than a test checking instructions. In fact it is worse because it is using unstable mir interface. It still does not check for an actual layout in any transparent way, it checks the offsets. You can check the offsets in the produces ISA.

It does not help that checks are autogenerated, it obscures relevant checks. For that sort of test you need to check the offsets, nothing more. It also depends on an assumption that stores will not be rescheduled and there is no such guarantee in this case. To w/a it you can store different values (e.g. 1, 2, 3, 4 etc), check that in the ISA, and mark stores volatile.

One thing it does not check is that we do not allocate more LDS than used. It does not check the situation with multiple kernels using intersecting LDS variable sets.

I was looking at a test which will really check memory layout, like that is possible with AMDGPULowerModuleLDSPass. In fact I am probably second to Matt we can try to use AMDGPULowerModuleLDSPass for kernels too, not to multiply entities.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102401/new/

https://reviews.llvm.org/D102401