[llvm] [AMDGPU] Increase inline threshold when the callee only has one live use (PR #111311)

Shilei Tian via llvm-commits llvm-commits at lists.llvm.org
Sun Oct 6 13:28:25 PDT 2024


https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/111311

Currently we do not inline a large function even if it has only one live use.
This can significantly hurt performance because CSR (callee-saved register)
spills are very expensive. This PR tries to force inlining when the callee has
only one live use by raising the inlining threshold by a configurable amount.
The default is 15000, borrowed from `InlineConstants::LastCallToStaticBonus`.
I'm not sure whether this is a good number, or whether this is the right way to
do it. With this change, the callee in my local test case is finally inlined,
but the cost is still very close to the threshold:
`cost=14010, threshold=170775`.

Speaking of the test, how are we gonna test this? Do we want to include a giant
IR file?

Fixes SWDEV-471398.
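
For readers without an LLVM checkout, the threshold adjustment described above
can be sketched in isolation. This is only a minimal model, not the code in the
patch: `MockFunction` and `adjustForOneLiveUse` are hypothetical names standing
in for `llvm::Function` and the new lines in
`GCNTTIImpl::adjustInliningThreshold`, and the null check for a call without a
known callee is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Function, exposing only the two queries
// that the patch relies on.
struct MockFunction {
  bool LocalLinkage;  // models Function::hasLocalLinkage()
  unsigned LiveUses;  // number of live uses of the function

  bool hasLocalLinkage() const { return LocalLinkage; }
  bool hasOneLiveUse() const { return LiveUses == 1; }
};

// Default of the new option, the same value as
// InlineConstants::LastCallToStaticBonus.
constexpr unsigned InlineThresholdOneLiveUse = 15000;

// Returns the adjusted threshold. Callee may be null, which models a call
// whose callee is not known (an assumption of this sketch).
unsigned adjustForOneLiveUse(unsigned Threshold, const MockFunction *Callee) {
  // Add the bonus only when this is the only live use of a local function.
  if (Callee && Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
    Threshold += InlineThresholdOneLiveUse;
  return Threshold;
}
```

With a base threshold of 1000, a local callee with exactly one live use yields
16000 in this model; a callee with external linkage, or with more than one live
use, leaves the threshold at 1000.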

From a554afb49a96ac881ec023e96eedfa216fd5ca90 Mon Sep 17 00:00:00 2001
From: Shilei Tian <shilei.tian at amd.com>
Date: Sun, 6 Oct 2024 16:16:47 -0400
Subject: [PATCH] [AMDGPU] Increase inline threshold when the callee only has
 one live use

Currently we do not inline a large function even if it has only one live use.
This can significantly hurt performance because CSR (callee-saved register)
spills are very expensive. This PR tries to force inlining when the callee has
only one live use by raising the inlining threshold by a configurable amount.
The default is 15000, borrowed from `InlineConstants::LastCallToStaticBonus`.
I'm not sure whether this is a good number, or whether this is the right way to
do it. With this change, the callee in my local test case is finally inlined,
but the cost is still very close to the threshold:
`cost=14010, threshold=170775`.

Speaking of the test, how are we gonna test this? Do we want to include a giant
IR file?

Fixes SWDEV-471398.
---
 llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index d348166c2d9a04..debc3db78974ad 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -75,6 +75,10 @@ static cl::opt<size_t> InlineMaxBB(
     cl::desc("Maximum number of BBs allowed in a function after inlining"
              " (compile time constraint)"));
 
+static cl::opt<unsigned> InlineThresholdOneLiveUse(
+    "amdgpu-inline-threshold-one-live-use", cl::Hidden, cl::init(15000),
+    cl::desc("Threshold added when the callee only has one live use"));
+
 static bool dependsOnLocalPhi(const Loop *L, const Value *Cond,
                               unsigned Depth = 0) {
   const Instruction *I = dyn_cast<Instruction>(Cond);
@@ -1307,6 +1311,12 @@ unsigned GCNTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
   unsigned AllocaSize = getCallArgsTotalAllocaSize(CB, DL);
   if (AllocaSize > 0)
     Threshold += ArgAllocaCost;
+
+  // Increase the threshold if it is the only call to a local function.
+  Function *Callee = CB->getCalledFunction();
+  if (Callee && Callee->hasLocalLinkage() && Callee->hasOneLiveUse())
+    Threshold += InlineThresholdOneLiveUse;
+
   return Threshold;
 }
 


