[PATCH] D140242: [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 16 12:17:24 PST 2022
arsenm requested changes to this revision.
arsenm added inline comments.
This revision now requires changes to proceed.
================
Comment at: llvm/lib/Analysis/InlineCost.cpp:164-166
+static cl::opt<bool> DisableInlineSimplification(
+ "inline-disable-simplification", cl::Hidden, cl::init(false),
+ cl::desc("Disables instruction simplification during inlining"));
----------------
Not sure why you're adding this, but it doesn't belong in this patch.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:86-88
+static cl::opt<unsigned> ArgStackInlinePenalty(
+ "amdgpu-inline-arg-stack-cost", cl::Hidden, cl::init(15),
+ cl::desc("Cost per argument for function arguments passed through stack"));
----------------
You should be able to compute this directly from the existing costs for stack stores.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1189-1191
+ // Outer kernel functions can't be inlined.
+ if (llvm::AMDGPU::isKernelCC(Callee))
+ return 0;
----------------
No reason to special-case kernels here?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1198
+ if (AMDGPU::isArgPassedInSGPR(&A))
+ SGPRsInUse++;
+ else
----------------
Raw argument counts don't correspond to register counts; you need to get the type-legalized register size.
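To illustrate the gap between argument counts and register counts, here is a minimal standalone sketch. It assumes 32-bit registers and approximates the legalized size as the bit width rounded up to whole registers; regsForBits and totalRegs are hypothetical names, not LLVM APIs, and the real numbers should come from the target's type legalization rather than this formula.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): number of 32-bit registers
// needed to hold a value of the given bit width, rounding up.
constexpr unsigned regsForBits(unsigned Bits) {
  return (Bits + 31) / 32;
}

// Sum the register demand over all argument widths instead of
// incrementing a counter once per argument.
unsigned totalRegs(const std::vector<unsigned> &ArgBits) {
  unsigned Total = 0;
  for (unsigned Bits : ArgBits)
    Total += regsForBits(Bits);
  return Total;
}
```

Under these assumptions, a call taking an i32, an i64 and a <4 x i32> has three arguments but needs 1 + 2 + 4 = 7 registers, so a per-argument counter would underestimate when arguments start spilling to the stack.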
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1234
}
- if (AllocaSize)
- return ArgAllocaCost;
- return 0;
+ adjustThreshold += adjustInlinigThresholdUsingCallee(CB->getCalledFunction());
+ adjustThreshold += AllocaSize ? ArgAllocaCost : AllocaSize;
----------------
Typo: adjustInlinigThresholdUsingCallee should be adjustInliningThresholdUsingCallee.
================
Comment at: llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument.ll:18-21
+ %and = and i32 %shr, %shl
+ %shr1 = lshr i32 %y0, %and
+ %shr2 = lshr i32 %shr1, %t0
+ %shl3 = shl i32 %e1, %w0
----------------
Use pass arguments or flags to set the thresholds so the test doesn't need so many instructions.
================
Comment at: llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument.ll:2027
+ %arrayidx = getelementptr inbounds i32, ptr %in, i64 0
+ %0 = load i32, ptr %arrayidx, align 4
+ %arrayidx1 = getelementptr inbounds i32, ptr %in, i64 1
----------------
Don't use anonymous values (such as %0) in tests; give them names.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140242/new/
https://reviews.llvm.org/D140242