[llvm] [SystemZ] Fix compile time regression in adjustInliningThreshold(). (PR #137527)

Jonas Paulsson via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 27 09:25:35 PDT 2025


https://github.com/JonPsson1 created https://github.com/llvm/llvm-project/pull/137527

Instead of always iterating over all GlobalVariables in the Module to find the case where both Caller and Callee use the same GV heavily, first scan the Callee (only if it has fewer than 200 instructions) for all GVs used more than 10 times, and then do the counting in the Caller for just those GVs.
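
As a rough illustration of the two-phase scan described above, here is a minimal standalone sketch. It does not use the LLVM API: `Access`, `FunctionModel`, `countUses` and `sharesHotGlobal` are hypothetical names invented for this example, and a function is modeled simply as an instruction count plus a list of memory accesses. The thresholds (200 instructions, more than 10 uses) are taken from the description; the actual patch follows below.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a single load/store in a function body.
struct Access {
  std::string Ptr; // name of the accessed pointer
  bool IsGlobal;   // true if Ptr names a global variable
};

// Hypothetical stand-in for a function: its size and its memory accesses.
struct FunctionModel {
  std::size_t NumInstructions;
  std::vector<Access> Accesses;
};

constexpr std::size_t MaxCalleeSize = 200; // callee must be smaller than this
constexpr unsigned MinUses = 10;           // "heavy" use means more than this

// Count how many accesses in F go through Ptr.
static unsigned countUses(const FunctionModel &F, const std::string &Ptr) {
  unsigned N = 0;
  for (const Access &A : F.Accesses)
    if (A.Ptr == Ptr)
      ++N;
  return N;
}

// Returns true if some global is used more than MinUses times in both the
// (small) callee and the caller.
bool sharesHotGlobal(const FunctionModel &Caller, const FunctionModel &Callee) {
  // Phase 0: skip large callees entirely.
  if (Callee.NumInstructions >= MaxCalleeSize)
    return false;

  // Phase 1: one pass over the callee, counting uses per global.
  std::map<std::string, unsigned> CalleeUses;
  for (const Access &A : Callee.Accesses)
    if (A.IsGlobal)
      ++CalleeUses[A.Ptr];

  // Phase 2: consult the caller only for globals that are hot in the callee.
  for (const auto &Entry : CalleeUses)
    if (Entry.second > MinUses && countUses(Caller, Entry.first) > MinUses)
      return true;
  return false;
}
```

The point of the ordering is that the work is bounded by the size of the callee and by the few globals it uses heavily, rather than by the number of globals in the whole module.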

The limit of 200 instructions makes sense, as the heuristic aims to inline a relatively small function that uses a GV more than 10 times. This limit changed only 7 files across 3 SPEC benchmarks. Previously only perlbench performance was affected, and perlbench is not among these 3 changed benchmarks, so there should be no performance difference to consider here. SPEC runs seem to confirm this ("full/home-dir").

Compile time across SPEC shows no difference compared to main. The change does, however, resolve the compile time problem with zig: compared to removing the heuristic, main shows a 380% increase, while this change shows only a 2.4% increase (total user compile time with opt).

Fixes #134714.

From 65bc5bffb350beb26970d6664328f2fa5fa3d53a Mon Sep 17 00:00:00 2001
From: Jonas Paulsson <paulson1 at linux.ibm.com>
Date: Sun, 27 Apr 2025 11:50:57 +0200
Subject: [PATCH] IP

---
 .../SystemZ/SystemZTargetTransformInfo.cpp    | 50 +++++++++++--------
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
index ee142ccd20e20..78f5154229f55 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
@@ -80,7 +80,6 @@ unsigned SystemZTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
   const Function *Callee = CB->getCalledFunction();
   if (!Callee)
     return 0;
-  const Module *M = Caller->getParent();
 
   // Increase the threshold if an incoming argument is used only as a memcpy
   // source.
@@ -92,29 +91,38 @@ unsigned SystemZTTIImpl::adjustInliningThreshold(const CallBase *CB) const {
     }
   }
 
-  // Give bonus for globals used much in both caller and callee.
-  std::set<const GlobalVariable *> CalleeGlobals;
-  std::set<const GlobalVariable *> CallerGlobals;
-  for (const GlobalVariable &Global : M->globals())
-    for (const User *U : Global.users())
-      if (const Instruction *User = dyn_cast<Instruction>(U)) {
-        if (User->getParent()->getParent() == Callee)
-          CalleeGlobals.insert(&Global);
-        if (User->getParent()->getParent() == Caller)
-          CallerGlobals.insert(&Global);
+  // Give bonus for globals used much in both caller and a relatively small
+  // callee.
+  if (Callee->getInstructionCount() < 200) {
+    std::map<const Value *, unsigned> Ptr2NumUses;
+    for (auto &BB : *Callee)
+      for (auto &I : BB) {
+        if (const auto *SI = dyn_cast<StoreInst>(&I)) {
+          if (!SI->isVolatile())
+            Ptr2NumUses[SI->getPointerOperand()]++;
+        } else if (const auto *LI = dyn_cast<LoadInst>(&I)) {
+          if (!LI->isVolatile())
+            Ptr2NumUses[LI->getPointerOperand()]++;
+        } else if (const auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
+          unsigned NumStores = 0, NumLoads = 0;
+          countNumMemAccesses(GEP, NumStores, NumLoads, Callee);
+          Ptr2NumUses[GEP->getPointerOperand()] += NumLoads + NumStores;
+        }
       }
-  for (auto *GV : CalleeGlobals)
-    if (CallerGlobals.count(GV)) {
-      unsigned CalleeStores = 0, CalleeLoads = 0;
-      unsigned CallerStores = 0, CallerLoads = 0;
-      countNumMemAccesses(GV, CalleeStores, CalleeLoads, Callee);
-      countNumMemAccesses(GV, CallerStores, CallerLoads, Caller);
-      if ((CalleeStores + CalleeLoads) > 10 &&
-          (CallerStores + CallerLoads) > 10) {
-        Bonus = 1000;
-        break;
+
+    for (auto I : Ptr2NumUses) {
+      const Value *Ptr = I.first;
+      unsigned NumCalleeUses = I.second;
+      if (NumCalleeUses > 10 && isa<GlobalVariable>(Ptr)) {
+        unsigned CallerStores = 0, CallerLoads = 0;
+        countNumMemAccesses(Ptr, CallerStores, CallerLoads, Caller);
+        if (CallerStores + CallerLoads > 10) {
+          Bonus = 1000;
+          break;
+        }
       }
     }
+  }
 
   // Give bonus when Callee accesses an Alloca of Caller heavily.
   unsigned NumStores = 0;


