[PATCH] D102401: [AMDGPU] Allocate LDS globals in sorted order of their alignment and size.

Tue May 18 11:10:00 PDT 2021

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp:61
+    const auto *I = dyn_cast<Instruction>(UU);
+    assert(I && "Expected an instruction");
+    const Function *F = I->getFunction();
----------------
hsmhsm wrote:
> rampitec wrote:
> > hsmhsm wrote:
> > > rampitec wrote:
> > > > hsmhsm wrote:
> > > > > rampitec wrote:
> > > > > > It can probably be Operator.
> > > > > I did not understand this comment.
> > > > llvm::Operator.
> > > llvm::Operator is an abstracrtion over Instruction and ConstExpr. An llvm::Operator should be either an Instruction OR ConstExpr, and hence explicit handling of llvm::Operator is not required.
> > No, it is a separate subclass of User: https://llvm.org/doxygen/classllvm_1_1User.html
> But, if I look into the implemenation of Operator at - https://llvm.org/doxygen/Operator_8h_source.html it clearly tells me that an instance of Operator class will never ever be created, and it is really a place holder for Instrunction and ConstExpr.
> 
> ```
>    /// Return the opcode for this Instruction or ConstantExpr.
>    unsigned getOpcode() const {
>      if (const Instruction *I = dyn_cast<Instruction>(this))
>        return I->getOpcode();
>      return cast<ConstantExpr>(this)->getOpcode();
>    }
> ```
Example:

```
%sunkaddr4 = getelementptr inbounds i8, i8* bitcast ({ %union, [2000 x i8] }* @global_pointer to i8*), i64 %sunkaddr3
```

================
Comment at: llvm/test/CodeGen/AMDGPU/lds-allocation.ll:35
+
+; SDAG-MIR-LABEL: bb.0 (%ir-block.0):
+; SDAG-MIR: [[MOV_1:%[0-9]+:sreg_32]] = S_MOV_B32 0
----------------
hsmhsm wrote:
> rampitec wrote:
> > The test checking mir is not any better than a test checking instructions. In fact it is worse because it is using unstable mir interface. It still does not check for an actual layout in any transparent way, it checks the offsets. You can check the offsets in the produces ISA.
> > 
> > It does not help that checks are autogenerated, it obscures relevant checks. For that sort of test you need to check the offsets, nothing more. It also depends on an assumption that stores will not be rescheduled and there is no such guarantee in this case. To w/a it you can store different values (e.g. 1, 2, 3, 4 etc), check that in the ISA, and mark stores volatile.
> > 
> > One thing it does not check is that we do not allocate more LDS than used. It does not check the situation with multiple kernels using intersecting LDS variable sets.
> > 
> > I was looking at a test which will really check memory layout, like that is possible with AMDGPULowerModuleLDSPass. In fact I am probably second to Matt we can try to use AMDGPULowerModuleLDSPass for kernels too, not to multiply entities.
> I think, all of you are kind of suggesting to extend AMDGPULowerModuleLDSPass. But, I am not sure, how it is easily extendible, moreever, it is being invoked before promoteAlloca pass at the moment. We really need to discuss it and come to a common conclusion, before proceeding further.
> 
> For me, it looks like - there is a ball, half of which is painted white, and other half black, few are looking at white part, and others looking at black part, and not able to come to a common conclusion.
PromoteAlloca is probably not an issue, it will just add a new allocation past that struct. It should be already properly aligned as we are creating it ourselves. The current waste of LDS by the AMDGPULowerModuleLDSPass is an issue though. Ideally it would be nice to come up with a single solution for functions and kernels which I guess we are continue discussing...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102401/new/

https://reviews.llvm.org/D102401