[PATCH] D36849: [AMDGPU] Port of HSAIL inliner
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 28 10:25:58 PDT 2017
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:30-32
+static cl::opt<int>
+HintThreshold("amdgpu-inlinehint-threshold", cl::Hidden, cl::init(12000),
+ cl::desc("Threshold for inlining functions with inline hint"));
----------------
We should avoid having a custom version of the exact same flag that the regular inliner uses; it would lead to surprising behavior.
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:53
+ initializeAMDGPUInlinerPass(*PassRegistry::getPassRegistry());
+ Params = getInlineParams(2000);
+ Params.HintThreshold = HintThreshold;
----------------
Hardcoded threshold. We should rely on the default opt-level-based thresholds and override getInliningThresholdMultiplier to get the larger target default; that way the standard flags will work as expected.
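For reference, a sketch of what that override could look like in the target's TTI implementation (the class name and the multiplier value here are illustrative, not taken from the patch):

```cpp
// Hypothetical sketch: scale the generic inliner's opt-level-based
// thresholds instead of hardcoding getInlineParams(2000).  The class
// name and the value 9 are placeholders.
unsigned AMDGPUTTIImpl::getInliningThresholdMultiplier() {
  // The generic inliner multiplies its default thresholds by this
  // factor, so flags like -inline-threshold keep working as expected.
  return 9;
}
```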
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:112
+ const DataLayout &DL = Caller->getParent()->getDataLayout();
+ if (Callee) {
+ // If we have a pointer to private array passed into a function
----------------
Early return if there is no callee rather than wrapping the rest of the function in an if.
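i.e. roughly this shape (sketch only; the fallback value is a placeholder for whatever the no-callee path should produce, and the rest of the body is abbreviated):

```cpp
// Sketch of the suggested early-return structure.
if (!Callee)
  return DefaultCost; // placeholder: whatever the no-callee path returns

const DataLayout &DL = Caller->getParent()->getDataLayout();
// ... rest of the function body, no longer nested under `if (Callee)`.
```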
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:120
+ continue;
+ PtrArg = GetUnderlyingObject(PtrArg, DL);
+ if (const AllocaInst *AI = dyn_cast<AllocaInst>(PtrArg)) {
----------------
Skip the potentially costly GetUnderlyingObject call if the address space isn't private.
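Roughly (sketch; the spelling of the private address-space constant is an assumption, adjust to whatever the backend uses here):

```cpp
// Sketch: do the cheap address-space test first, then the potentially
// expensive walk.  AS.PRIVATE_ADDRESS stands in for the backend's
// private address-space number.
if (PtrArg->getType()->getPointerAddressSpace() != AS.PRIVATE_ADDRESS)
  continue;
PtrArg = GetUnderlyingObject(PtrArg, DL);
```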
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:123-124
+ Type *Ty = AI->getAllocatedType();
+ if (!Ty->isSized())
+ continue;
+ AllocaSize += DL.getTypeAllocSize(Ty);
----------------
I'm pretty sure you aren't allowed to alloca an unsized type, so this check isn't needed.
However, you do need to check for and skip dynamically sized allocas.
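Something along these lines (sketch only): AllocaInst::isStaticAlloca() is true only for allocas with a constant array size in the entry block, so it also rules out dynamically sized allocas.

```cpp
// Sketch: only count statically sized allocas; isStaticAlloca() excludes
// dynamically sized (VLA-style) allocas and allocas outside the entry
// block.
if (const AllocaInst *AI = dyn_cast<AllocaInst>(PtrArg)) {
  if (!AI->isStaticAlloca())
    continue;
  AllocaSize += DL.getTypeAllocSize(AI->getAllocatedType());
}
```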
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:166-174
+ if (!Callee || Callee->isDeclaration() || CS.isNoInline() ||
+ !TTI.areInlineCompatible(Caller, Callee))
+ return llvm::InlineCost::getNever();
+
+ if (CS.hasFnAttr(Attribute::AlwaysInline)) {
+ if (isInlineViable(*Callee))
+ return llvm::InlineCost::getAlways();
----------------
Rather than reproducing this logic, call the base implementation first and check whether it already returns always or never?
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:176-177
+
+ if (isWrapperOnlyCall(CS))
+ return llvm::InlineCost::getAlways();
+
----------------
Wrapper calls will (ignoring noinline) always be inlined by the stock inline heuristics, so this shouldn't be necessary.
================
Comment at: lib/Target/AMDGPU/AMDGPUInline.cpp:180-182
+ size_t Size = Caller->size() + Callee->size() - 1;
+ if (MaxBB && Size > MaxBB)
+ return InlineCost::getNever();
----------------
This heuristic doesn't make sense to me. Why does the basic-block count matter? Just use the default cost.
================
Comment at: test/CodeGen/AMDGPU/amdgpu-inline.ll:75
+}
+
+declare i32 @llvm.amdgcn.workitem.id.x() #1
----------------
Needs tests with multiple private pointers and with the alloca size threshold exceeded.
https://reviews.llvm.org/D36849