[PATCH] D36849: [AMDGPU] Port of HSAIL inliner
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 28 10:25:58 PDT 2017
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:30-32
+static cl::opt<int>
+HintThreshold("amdgpu-inlinehint-threshold", cl::Hidden, cl::init(12000),
+ cl::desc("Threshold for inlining functions with inline hint"));
----------------
We should avoid having a custom version of the exact same flag that the regular inliner uses; it would lead to surprising behavior.
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:53
+ initializeAMDGPUInlinerPass(*PassRegistry::getPassRegistry());
+ Params = getInlineParams(2000);
+ Params.HintThreshold = HintThreshold;
----------------
Hardcoded threshold. We should rely on the default opt-level-based thresholds and override getInliningThresholdMultiplier to get the larger target default; that way the standard flags will work as expected.
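For reference, a sketch of what that override could look like in the target's TTI implementation (the class name and the multiplier value here are illustrative, not taken from the patch):

```cpp
// Hypothetical sketch: scale the generic inliner's opt-level-based
// thresholds instead of hardcoding getInlineParams(2000).  The class
// name and the value 9 are placeholders.
unsigned AMDGPUTTIImpl::getInliningThresholdMultiplier() {
  // The generic inliner multiplies its default thresholds by this
  // factor, so flags like -inline-threshold keep working as expected.
  return 9;
}
```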
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:112
+ const DataLayout &DL = Caller->getParent()->getDataLayout();
+ if (Callee) {
+ // If we have a pointer to private array passed into a function
----------------
Early return if there is no callee rather than wrapping the rest of the function in an if.
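i.e. roughly this shape (sketch only; the fallback value is a placeholder for whatever the no-callee path should produce, and the rest of the body is abbreviated):

```cpp
// Sketch of the suggested early-return structure.
if (!Callee)
  return DefaultCost; // placeholder: whatever the no-callee path returns

const DataLayout &DL = Caller->getParent()->getDataLayout();
// ... rest of the function body, no longer nested under `if (Callee)`.
```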
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:120
+ continue;
+ PtrArg = GetUnderlyingObject(PtrArg, DL);
+ if (const AllocaInst *AI = dyn_cast<AllocaInst>(PtrArg)) {
----------------
Skip the potentially costly GetUnderlyingObject call if the address space isn't private.
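Roughly (sketch; the spelling of the private address-space constant is an assumption, adjust to whatever the backend uses here):

```cpp
// Sketch: do the cheap address-space test first, then the potentially
// expensive walk.  AS.PRIVATE_ADDRESS stands in for the backend's
// private address-space number.
if (PtrArg->getType()->getPointerAddressSpace() != AS.PRIVATE_ADDRESS)
  continue;
PtrArg = GetUnderlyingObject(PtrArg, DL);
```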
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:123-124
+ Type *Ty = AI->getAllocatedType();
+ if (!Ty->isSized())
+ continue;
+ AllocaSize += DL.getTypeAllocSize(Ty);
----------------
I'm pretty sure you aren't allowed to alloca an unsized type, so this check isn't needed.
However, you do need to check for and skip dynamically sized allocas.
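Something along these lines (sketch only): AllocaInst::isStaticAlloca() is true only for allocas with a constant array size in the entry block, so it also rules out dynamically sized allocas.

```cpp
// Sketch: only count statically sized allocas; isStaticAlloca() excludes
// dynamically sized (VLA-style) allocas and allocas outside the entry
// block.
if (const AllocaInst *AI = dyn_cast<AllocaInst>(PtrArg)) {
  if (!AI->isStaticAlloca())
    continue;
  AllocaSize += DL.getTypeAllocSize(AI->getAllocatedType());
}
```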
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:166-174
+ if (!Callee || Callee->isDeclaration() || CS.isNoInline() ||
+ !TTI.areInlineCompatible(Caller, Callee))
+ return llvm::InlineCost::getNever();
+
+ if (CS.hasFnAttr(Attribute::AlwaysInline)) {
+ if (isInlineViable(*Callee))
+ return llvm::InlineCost::getAlways();
----------------
Rather than reproducing this logic, call the base implementation first and check whether it already returns always or never?
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:176-177
+
+ if (isWrapperOnlyCall(CS))
+ return llvm::InlineCost::getAlways();
+
----------------
Wrapper calls will (ignoring noinline) always be inlined by the stock inline heuristics, so this shouldn't be necessary.
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:180-182
+ size_t Size = Caller->size() + Callee->size() - 1;
+ if (MaxBB && Size > MaxBB)
+ return InlineCost::getNever();
----------------
This heuristic doesn't make sense to me. Why does the basic-block count matter? Just use the default cost.
================
Comment at: test/CodeGen/AMDGPU/amdgpu-inline.ll:75
+}
+
+declare i32 @llvm.amdgcn.workitem.id.x() #1
----------------
Needs tests with multiple private pointers and with the alloca size threshold exceeded.
https://reviews.llvm.org/D36849