[PATCH] D36849: [AMDGPU] Port of HSAIL inliner

Mon Aug 28 16:37:00 PDT 2017

rampitec added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:30-32
+static cl::opt<int>
+HintThreshold("amdgpu-inlinehint-threshold", cl::Hidden, cl::init(12000),
+              cl::desc("Threshold for inlining functions with inline hint"));
----------------
arsenm wrote:
> We should avoid having a custom version of the exact same flag that the regular inliner uses. This would result in some surprises.
Stock inline hint threshold is only 44% higher than regular threshold. Here I have it 6 times higher. The exact number may and will change, but when I return 9 from getInliningThresholdMultiplier() and multiply hint by the very same number it still will be a very low number, far less than we would really like to have.

================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:123-124
+        Type *Ty = AI->getAllocatedType();
+        if (!Ty->isSized())
+          continue;
+        AllocaSize += DL.getTypeAllocSize(Ty);
----------------
arsenm wrote:
> I'm pretty sure you aren't allowed to alloca an unsized type, so this check isn't needed.
> 
> However you do need to check / skip dynamically sized allocas.
That was for opaque types which we do not use now, and we had allocas on them.

================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:166-174
+  if (!Callee || Callee->isDeclaration() || CS.isNoInline() ||
+      !TTI.areInlineCompatible(Caller, Callee))
+    return llvm::InlineCost::getNever();
+
+  if (CS.hasFnAttr(Attribute::AlwaysInline)) {
+    if (isInlineViable(*Callee))
+      return llvm::InlineCost::getAlways();
----------------
arsenm wrote:
> Call the base implementation and see if it says always or never first rather than reproducing the logic?
Base implementation is pure virtual. I could call llvm::getInlineCost() instead directly (it is called from SimpleInliner::getInlineCost() anyway) two times, but I want to avoid expensive CallAnalyzer.

================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:176-177
+
+  if (isWrapperOnlyCall(CS))
+    return llvm::InlineCost::getAlways();
+
----------------
arsenm wrote:
> Wrapper calls will (ignoring no inline) always be inlined by the stock inline heuristics, so †his shouldn't be necessary
That is unless we hit MaxBB, in which case we still want to inline it.

================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:180-182
+  size_t Size = Caller->size() + Callee->size() - 1;
+  if (MaxBB && Size > MaxBB)
+    return InlineCost::getNever();
----------------
arsenm wrote:
> This heuristic doesn't make any sense to me. Why does the block count matter? Just use the default cost.
That is to prevent huge compilation time of some programs. Not an ideal heuristic, but better than nothing.

https://reviews.llvm.org/D36849