[PATCH] D94648: [amdgpu] Implement lower function LDS pass

Fri Jan 15 11:43:22 PST 2021

JonChesterfield added a comment.

Great review guys, thanks! With the exception of avoiding inline asm (which I'd like to, but am not sure what to do about) I'll fix the rest when I'm back in the office.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:66-68
+  static bool
+  useByKernelOrUsedList(const SmallPtrSetImpl<GlobalValue *> &UsedList,
+                        const Use &U, uint64_t Depth = 0) {
----------------
arsenm wrote:
> I think this search isn't quite right and will miss arbitrarily nested constant expressions.
> 
> We already have similar code you need to analyze users in  AMDGPUAnnotateKernelFeatures::visitConstantExprsRecursively to find constant LDS addrspacecasts anywhere they can appear.
The '4' limits the depth of constant expressions analysed, will to change to a something using heap memory. I believe I can assume constantexpr are acyclic.

This test will move variables into the struct unless it can show that is unnecessary, which is safe provided replaceAllUsesWith does the right thing. I may be missing cases on what the user can be.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:69
+                        const Use &U, uint64_t Depth = 0) {
+    // Variables used by kernels/.used are not moved into the new struct
+    User *V = U.getUser();
----------------
madhur13490 wrote:
> Erroneous characters in the comment.
Will rephrase. Those used by kernels and the llvm.used or llvm.compiler.used lists.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:113
+      }
+      if (GV.isConstant()) {
+        // A constant undef variable can't be written to, and any load is
----------------
arsenm wrote:
> Not sure why you need to bother considering this case, if you just treat it normally it should work
It change the constant LDS variable to a mutable one. I don't think that matters much, since the variable is undef initialized, so assumptions about it always having the same value are dubious.

I don't think any of the languages let one create a constant undef value, and if one did, it should have no uses and be erased, but I haven't verified that is the case.

Basically I'm not certain what the right thing to do with a probably-useless constant undef value is so this pass ignores them.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:188
+    InlineAsm *IA =
+        InlineAsm::get(FTy, "// Alloc function lds block", "r", true);
+    Builder.CreateCall(
----------------
arsenm wrote:
> Definitely shouldn't be introducing inline asm, not sure why you are doing this. Also "r" is bad and we shouldn't support it
This would be a hack. I wanted a construct that looks like a use of the instance (and won't be deleted by IR passes and generates minimal code), so that other passes will accurately account for the amount of LDS used by a kernel. Specifically promoteAlloca but I may have missed some.

An intrinsic that evaporates later would work. I haven't thought of an alternative, will see if a cleaner answer comes to mind.

(aside: what's bad about r in particular? I'm unfamiliar with our inline assembler, perhaps there's an immediate option instead)

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:202
+  bool runOnModule(Module &M) override {
+    if (DisableLowerFunctionLDS) {
+      return false;
----------------
arsenm wrote:
> Probably should move this flag into the pass pipeline in AMDGPUTargetMachine
Sure, will do.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:233-234
+
+          TypeSize SLHS = DL.getTypeSizeInBits(LHS->getValueType());
+          TypeSize SRHS = DL.getTypeSizeInBits(RHS->getValueType());
+          if (SLHS != SRHS) {
----------------
arsenm wrote:
> Should use the type alloc size. You are also ignoring the alignment of the global itself, which may differ from the type alignment
Didn't see getTypeAllocSize, nice! getAlign is a private function that wraps the verbose getValueOrABITypeAlignment.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerFunctionLDSPass.cpp:279-283
+    // IR considers address 0 to be null, despite amdgpu docs saying -1 is null
+    // This doesn't appear to cause miscompilation. Considered instantiating
+    // {[MaxAlign x i8], %__function_lds_t} instead and indexing from MaxAlign
+    // instead of zero, but it's hard to justify the overhead without evidence
+    // that referencing 'null' is an error in this context.
----------------
arsenm wrote:
> "null" in the IR is just 0. This is only treated as an invalid pointer in address space 0. -1 is used as the invalid pointer value and "null" in addrspace(3) is valid. Ideally this would be a property in the datalayout
I was worried by Constant::getNullValue() returning zero for addrspace(3) but it does indeed seem to work ok. Drop the comment?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94648/new/

https://reviews.llvm.org/D94648