[PATCH] D101310: [AMDGPU] Replace uses of LDS globals within non-kernel functions by pointers.

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 10 14:46:24 PDT 2021


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:270-272
+// Before runnning current LDS lower pass, replace LDS uses within non-kernel
+// functions by pointers so that the current pass minimizes the unnecessary per
+// kernel allocation of LDS memory.
----------------
This is a weird place for pass documentation, just put this in the file header.

The wording is also weird. Current pass doesn't mean anything in this context


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:12
+// LDS globals into a struct type, and creates an instance of that struct type
+// within every kernel at "address zero".
+//
----------------
No quotes around address zero?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:17
+//
+// Hence the current pass makes an attempt to aid the pass - "Lower Module LDS"
+// for efficient LDS memory usage. The idea behind current pass is as follows:
----------------
s/current pass/this pass/


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:26
+// * Then the pass "Lower Module LDS" by the virtue of its implementation idea,
+//   lands-up packing only LDS pointers as struct members, which substentially
+//   reduces unnecessary LDS memory usage.
----------------
Typo substentially


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:28
+//   reduces unnecessary LDS memory usage.
+//
+//===----------------------------------------------------------------------===//
----------------
Add a pseudocode example?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:55-59
+enum ReplaceLDSErrorKind : uint32_t {
+  LLEK_EndOfList = 0u,
+  LLEK_InternalError = 2u,
+  LLEK_NoCalleeDefinitionError = 3u
+};
----------------
Don't see much point in this. If it's worth reporting a specific error code, it's worth adding a DiagnosticInfo type for it (but I don't think it is)


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:69-72
+  case LLEK_InternalError: {
+    ErrStr = ErrStr + std::string("has encountered an internal error.");
+    break;
+  }
----------------
Shouldn't just have a generic error


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:74-80
+    ErrStr =
+        ErrStr +
+        std::string("assumes that the definitions of both caller and callee "
+                    "appear within same module. But, definition for the "
+                    "callee \"") +
+        V->getName().str() + std::string("\" not available.");
+    break;
----------------
I think you shouldn't error here, and just handle these assuming the external call could access the variable, wherever it may end up


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:84
+
+  report_fatal_error(ErrStr);
+}
----------------
hsmhsm wrote:
> JonChesterfield wrote:
> > This is an optimisation. Instead of a fatal error, it can always leave the variable unchanged. Therefore it should not abort compilation.
> call to reportReplaceLDSError() is being made, when the pass cannot proceed further for one or the other reason (internal error situation). And, it is perfectly valid for any optimiation pass to abonden and report an error when it faces some internal error kind of situation. 
This should also go through DiagnosticInfo rather than report_fatal_error


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:88-91
+template <typename R, typename E>
+static bool contains(R &&VMap, const E &Element) {
+  return VMap.find(Element) != VMap.end();
+}
----------------
There's already an is_contained


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:105
+// operands, convert those const expressions into corresponding instructions.
+static void getInstructions(Instruction *I, std::set<Value *> &Operands,
+                            std::set<Instruction *> &Insts) {
----------------
getInstructions is overly generic. How about convertConstExprsToInstructions?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:125-126
+
+// Check if const exprssion `CE2` holds const expression `CE`, and return true
+// if `CE` exist within `CE2`.
+static bool isCEExist(ConstantExpr *CE, ConstantExpr *CE2) {
----------------
It feels wrong that you would need to explicitly test this, and this case would be naturally handled by your walk through the constantexpr


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:135
+    if (auto *CE3 = dyn_cast<ConstantExpr>(UU.get()))
+      CEExist = CEExist | isCEExist(CE, CE3);
+  }
----------------
I'd prefer to avoid recursion to analyze constantxeprs


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:215
+  // Holds all kernels defined within the module `M`.
+  std::vector<Function *> Kernels;
+
----------------
Don't see why you need to store this, can just loop through and handle each kernel as it appears


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:222
+  // Associates LDS global to a unique pointer which points to that LDS global.
+  std::map<GlobalVariable *, GlobalVariable *> LDSToPointer;
+
----------------
ValueMap? You do have value replacements occuring


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:226
+  // corresponding LDS globals within K.
+  std::map<Function *, std::set<GlobalVariable *>> KernelToLDSPointers;
+
----------------
DenseMap?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:300-301
+  // created.
+  if (!contains(FunctionToLDSToInst, F))
+    FunctionToLDSToInst[F] = std::map<GlobalVariable *, Value *>();
+  FunctionToLDSToInst[F][LDS] = V;
----------------
This is just what happens on first map access anyway


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:334-335
+          // (including U) to corresponding instructions.
+          auto *CE = dyn_cast<ConstantExpr>(U);
+          assert(CE && "Expected constant expression.");
+          auto CEOperands = getCEOperands(I, CE);
----------------
cast instead of assert on dyn_cast


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:363
+  IRBuilder<> Builder(EI);
+  Builder.CreateStore(Builder.CreatePtrToInt(LDS, Type::getInt16Ty(Ctx)),
+                      LDSPointer);
----------------
Should not be using ptrtoint, keep everything as i16 and use GEP to index off the base


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:436
+                                  std::set<Function *> &LDSAccessors) {
+  // Ignore `LDS` if its size is too small. Current threshold is 8 bytes.
+  if (AMDGPU::getLDSGlobalSizeInBytes(M, LDS) <= 8)
----------------
arsenm wrote:
> I don't understand why there would be a size threshold here
"Current threshold" part unnecessary


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:436-438
+  // Ignore `LDS` if its size is too small. Current threshold is 8 bytes.
+  if (AMDGPU::getLDSGlobalSizeInBytes(M, LDS) <= 8)
+    return true;
----------------
I don't understand why there would be a size threshold here


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp:442-445
+  if (LDSAccessors.empty())
+    return true;
+
+  return false;
----------------
Directly return .empty()?


================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp:137-140
+  auto *Ty = LDS->getValueType();
+  auto SizeInBits = M.getDataLayout().getTypeSizeInBits(Ty).getFixedSize();
+  auto SizeInBytes = SizeInBits / 8;
+  return SizeInBytes;
----------------
This function shouldn't exist, just use DL.getTypeAllocSize


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101310/new/

https://reviews.llvm.org/D101310



More information about the llvm-commits mailing list