[PATCH] D91516: [AMDGPU][WIP] Lower LDS Global Variables.

Fri Feb 12 08:36:17 PST 2021

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:311
+  // FIXME: Looks like `Instruction` is not mutatable? hence requiring to clone.
+  //
+  // Create clone of `I`, say, it is `NewI`. Within `NewI`, replace the use(s)
----------------
You should only need to do the use replacement, you aren't changing the types of the instructions so cloning/hacking on them shouldn't be needed

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:446-448
+    assert(BI != K->getEntryBlock().end() &&
+           "Entry basic block of the kernel cannot be empty, otherwise control "
+           "would not reach this point\n");
----------------
Don't need this, the IR would have failed the verifier to get here

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:467-468
+    auto *GEP2 = GetElementPtrInst::CreateInBounds(
+        LDSLayouts->getValueType(), const_cast<GlobalVariable *>(LDSLayouts),
+        Indices2, PrefixStr + Twine("gep."), const_cast<Instruction *>(&EI));
+
----------------
Should not const_cast

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:517-518
+    for (unsigned L = 0; L < LDSGlobals.size(); ++L) {
+      auto *LDS = IDToLDS[L];
+      auto Offset = contains(KernelToLDSToOffset, Kernel)
+                        ? contains(KernelToLDSToOffset[Kernel], LDS)
----------------
Too much auto for me

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:553
+      M, ArrTy, false, GlobalValue::InternalLinkage, UndefValue::get(ArrTy),
+      Twine("__AMDGPUUnifiedLDSLayouts__"), nullptr,
+      GlobalVariable::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);
----------------
Should use lowercase, period separator naming convention with an llvm.amdgcn prefix

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:565
+  auto LayoutName =
+      Twine("__AMDGPUUnifiedLDSLayout") + Twine(KernelToID[K]) + Twine("__");
+  auto *LayoutType = ArrayType::get(IntegerType::get(Ctx, 8), LayoutSize);
----------------
Should use lowercase, period separator naming convention with an llvm.amdgcn prefix

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:608
+    LDSToOffSet[LDS] = CurOffset;
+    CurOffset += DL.getTypeAllocSize(LDS->getValueType()).getFixedValue();
+  }
----------------
This still needs to add in alignment padding 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:668
+  for (auto KI = KernelToCallees.begin(), KE = KernelToCallees.end(); KI != KE;
+       ++KI)
+    for (auto *Callee : KI->second)
----------------
Braces, Also can use range loop

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:742
+         "null");
+  for (auto GI = CGNode->begin(), GE = CGNode->end(); GI != GE; ++GI) {
+    auto *CGN = GI->second;
----------------
I don't see why you need to build your own stack. The call graph already found the reachable functions for you

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:776
+    SmallPtrSetImpl<Function *> &CalleeSet) {
+  if (auto *MD = CB->getMetadata(LLVMContext::MD_callees)) {
+    // The metadata "!callee" is available at the indirect call site `CB`, which
----------------
I think trying to handle callees is left for a later patch. Additionally, I think this should be the CallGraph analysis's responsibility to deal with

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:854-857
+// Traverse `CallGraph` starting from the `CallGraphNode` associated with each
+// kernel `K` in DFS manner and collect all the callees which are reachable from
+// K (including indirectly called callees).
+void LowerLDSGlobalImpl::createKernelToCalleesMap() {
----------------
The callgraph should already give this to you. Iterating the call graph should give you all of the functions you care about. You don't actually need to worry about which functions call which, since you need to touch every function in the SCC

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerLDSGlobal.cpp:986-987
+  for (auto &F : M.functions()) {
+    if (AMDGPU::isModuleEntryFunctionCC(F.getCallingConv()) &&
+        !F.isDeclaration())
+      Kernels.insert(&F);
----------------
I would swap the order of these checks

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91516/new/

https://reviews.llvm.org/D91516