[llvm] a419003 - [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

Sun May 8 22:23:40 PDT 2022

Author: Hongtao Yu
Date: 2022-05-08T22:07:58-07:00
New Revision: a4190037fac06c2b0cc71b8bb90de9e6b570ebb5

URL: https://github.com/llvm/llvm-project/commit/a4190037fac06c2b0cc71b8bb90de9e6b570ebb5
DIFF: https://github.com/llvm/llvm-project/commit/a4190037fac06c2b0cc71b8bb90de9e6b570ebb5.diff

LOG: [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

The per-callsite size threshold used today to drive preinline decisions is based on hotness/coldness cutoffs. By default, a callsite with a sample count above the hotness cutoff (99%) gets a size threshold of 1500, and any callsite below the coldness cutoff (99.99%) gets a zero threshold. This has a few issues:

1. While both the cutoffs and the size thresholds are configurable, different applications may need different setups, making a universal setup impractical.

2. The callsites between the hotness cutoff and the coldness cutoff are not considered as inline candidates, which could be a missed opportunity.

3. Hot callsites always use the same threshold. In reality we may want a bigger threshold for hotter callsites.
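
As a rough sketch of the behavior described above: the helper name and its parameters are hypothetical and not part of llvm-profgen; only the two comparisons mirror the code being removed from CSPreInliner::shouldInline.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone helper (not an llvm-profgen API). It mirrors the
// step-shaped threshold selection that this change removes.
unsigned stepSizeThreshold(uint64_t CallsiteCount, uint64_t HotCount,
                           uint64_t ColdCount, unsigned HotThreshold,
                           unsigned ColdThreshold) {
  // Assumption for this sketch: the cold cutoff count never exceeds the hot one.
  assert(ColdCount <= HotCount && "cold cutoff must not exceed hot cutoff");

  // Start from the cold threshold, which is also what every callsite between
  // the two cutoffs ends up with (issue 2).
  unsigned Threshold = ColdThreshold;
  // Every hot callsite gets the same threshold, however hot it is (issue 3).
  if (CallsiteCount > HotCount)
    Threshold = HotThreshold;
  if (CallsiteCount < ColdCount)
    Threshold = ColdThreshold;
  return Threshold;
}
```

With a hot threshold of 1500 and a cold threshold of 0, callsites with counts 2000 and 5000 both get 1500 when the hot cutoff count is 1000, while a callsite with count 500 gets 0.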

In this change we introduce a linear threshold regardless of hot/cold cutoffs. Given a sample space, a threshold is computed for a callsite based on the position of that callsite's sample count within the space. With that we no longer need to define what's hot or cold. Callsites with different hotness will get different thresholds. This should overcome the above three issues.
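
A minimal sketch of the linear computation, assuming the profile summary lookups are replaced by plain parameters. The helper name, its parameters, and the guard for an empty normalization range are assumptions made for illustration; the arithmetic, including the factor of 100, is copied from the new code in CSPreInliner::shouldInline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical standalone helper (not an llvm-profgen API). ColdCount stands
// in for the cold count threshold from the summary, and UpperBoundCount for
// the MinCount of the 10% percentile entry.
unsigned linearSizeThreshold(uint64_t CallsiteCount, uint64_t ColdCount,
                             uint64_t UpperBoundCount, unsigned HotThreshold,
                             unsigned ColdThreshold) {
  // Assumption for this sketch: the upper bound is not below the cold count.
  assert(ColdCount <= UpperBoundCount && "upper bound below cold count");

  // Callsites at or below the cold count keep the cold threshold.
  if (CallsiteCount <= ColdCount)
    return ColdThreshold;

  double Lower = static_cast<double>(ColdCount);
  double Upper = static_cast<double>(UpperBoundCount);
  // Guard added for this sketch only: treat an empty range as fully hot.
  double Hotness = 1.0;
  if (Upper > Lower)
    Hotness = std::min((static_cast<double>(CallsiteCount) - Lower) /
                           (Upper - Lower),
                       1.0);

  // The "+ 1" keeps the threshold non-zero for non-cold callsites even when
  // ColdThreshold is 0.
  return static_cast<unsigned>(HotThreshold * Hotness * 100 + ColdThreshold +
                               1);
}
```

The threshold grows with the callsite count between the two bounds and saturates once the count reaches the upper bound, so hotter callsites are allowed a larger callee size.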

I have seen good results with a universal default setup for two of our internal services.

For one service, there is a 0.2% to 0.5% perf improvement over a baseline with the previous default setup, with on-par code size.
For the second service, there is a 0.5% to 0.8% perf improvement over a baseline with the previous default setup, with a 0.2% code size increase; performance and code size are on par with a baseline that uses a carefully tuned cutoff to cover enough hot functions.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D125023

Added: 
    

Modified: 
    llvm/tools/llvm-profgen/CSPreInliner.cpp
    llvm/tools/llvm-profgen/CSPreInliner.h
    llvm/tools/llvm-profgen/ProfileGenerator.cpp
    llvm/tools/llvm-profgen/ProfileGenerator.h

Removed: 
    


################################################################################
diff  --git a/llvm/tools/llvm-profgen/CSPreInliner.cpp b/llvm/tools/llvm-profgen/CSPreInliner.cpp
index b561c549e728..7634ab5deed5 100644

--- a/llvm/tools/llvm-profgen/CSPreInliner.cpp
+++ b/llvm/tools/llvm-profgen/CSPreInliner.cpp
@@ -56,13 +56,13 @@ static cl::opt<bool> SamplePreInlineReplay(
         "Replay previous inlining and adjust context profile accordingly"));
 
 CSPreInliner::CSPreInliner(SampleProfileMap &Profiles, ProfiledBinary &Binary,
-                           uint64_t HotThreshold, uint64_t ColdThreshold)
+                           ProfileSummary *Summary)
     : UseContextCost(UseContextCostForPreInliner),
       // TODO: Pass in a guid-to-name map in order for
       // ContextTracker.getFuncNameFor to work, if `Profiles` can have md5 codes
       // as their profile context.
       ContextTracker(Profiles, nullptr), ProfileMap(Profiles), Binary(Binary),
-      HotCountThreshold(HotThreshold), ColdCountThreshold(ColdThreshold) {
+      Summary(Summary) {
   // Set default preinliner hot/cold call site threshold tuned with CSSPGO.
   // for good performance with reasonable profile size.
   if (!SampleHotCallSiteThreshold.getNumOccurrences())
@@ -152,16 +152,32 @@ bool CSPreInliner::shouldInline(ProfiledInlineCandidate &Candidate) {
     return Candidate.CalleeSamples->getContext().hasAttribute(
         ContextWasInlined);
 
-  // Adjust threshold based on call site hotness, only do this for callsite
-  // prioritized inliner because otherwise cost-benefit check is done earlier.
   unsigned int SampleThreshold = SampleColdCallSiteThreshold;
-  if (Candidate.CallsiteCount > HotCountThreshold)
-    SampleThreshold = SampleHotCallSiteThreshold;
+  uint64_t ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(
+      (Summary->getDetailedSummary()));
 
-  // TODO: for small cold functions, we may inlined them and we need to keep
-  // context profile accordingly.
-  if (Candidate.CallsiteCount < ColdCountThreshold)
+  if (Candidate.CallsiteCount <= ColdCountThreshold)
     SampleThreshold = SampleColdCallSiteThreshold;
+  else {
+    // Linearly adjust threshold based on normalized hotness, i.e., a value in
+    // [0,1]. Use 10% cutoff instead of the max count as the normalization
+    // upperbound for stability.
+    double NormalizationUpperBound =
+        ProfileSummaryBuilder::getEntryForPercentile(
+            Summary->getDetailedSummary(), 100000 /* 10% */)
+            .MinCount;
+    double NormalizationLowerBound = ColdCountThreshold;
+    double NormalizedHotness =
+        (Candidate.CallsiteCount - NormalizationLowerBound) /
+        (NormalizationUpperBound - NormalizationLowerBound);
+    if (NormalizedHotness > 1.0)
+      NormalizedHotness = 1.0;
+    // Add 1 to ensure hot callsites get a non-zero threshold, which could
+    // happen when SampleColdCallSiteThreshold is 0. This is when we do not
+    // want any inlining for cold callsites.
+    SampleThreshold = SampleHotCallSiteThreshold * NormalizedHotness * 100 +
+                      SampleColdCallSiteThreshold + 1;
+  }
 
   return (Candidate.SizeCost < SampleThreshold);
 }

diff  --git a/llvm/tools/llvm-profgen/CSPreInliner.h b/llvm/tools/llvm-profgen/CSPreInliner.h
index 9f63f7ef7bef..42771914ea3f 100644
--- a/llvm/tools/llvm-profgen/CSPreInliner.h
+++ b/llvm/tools/llvm-profgen/CSPreInliner.h
@@ -68,7 +68,7 @@ using ProfiledCandidateQueue =
 class CSPreInliner {
 public:
   CSPreInliner(SampleProfileMap &Profiles, ProfiledBinary &Binary,
-               uint64_t HotThreshold, uint64_t ColdThreshold);
+               ProfileSummary *Summary);
   void run();
 
 private:
@@ -82,11 +82,7 @@ class CSPreInliner {
   SampleContextTracker ContextTracker;
   SampleProfileMap &ProfileMap;
   ProfiledBinary &Binary;
-
-  // Count thresholds to answer isHotCount and isColdCount queries.
-  // Mirrors the threshold in ProfileSummaryInfo.
-  uint64_t HotCountThreshold;
-  uint64_t ColdCountThreshold;
+  ProfileSummary *Summary;
 };
 
 } // end namespace sampleprof

diff  --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index 61f9f79c67b2..be5a581b63e0 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -925,8 +925,7 @@ void CSProfileGenerator::postProcessProfiles() {
   // Run global pre-inliner to adjust/merge context profile based on estimated
   // inline decisions.
   if (EnableCSPreInliner) {
-    CSPreInliner(ProfileMap, *Binary, HotCountThreshold, ColdCountThreshold)
-        .run();
+    CSPreInliner(ProfileMap, *Binary, Summary.get()).run();
     // Turn off the profile merger by default unless it is explicitly enabled.
     if (!CSProfMergeColdContext.getNumOccurrences())
       CSProfMergeColdContext = false;
@@ -956,7 +955,7 @@ void CSProfileGenerator::postProcessProfiles() {
 
 void ProfileGeneratorBase::computeSummaryAndThreshold() {
   SampleProfileSummaryBuilder Builder(ProfileSummaryBuilder::DefaultCutoffs);
-  auto Summary = Builder.computeSummaryForProfiles(ProfileMap);
+  Summary = Builder.computeSummaryForProfiles(ProfileMap);
   HotCountThreshold = ProfileSummaryBuilder::getHotCountThreshold(
       (Summary->getDetailedSummary()));
   ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(

diff  --git a/llvm/tools/llvm-profgen/ProfileGenerator.h b/llvm/tools/llvm-profgen/ProfileGenerator.h
index 15b4297b901e..410b08fb2e04 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.h
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.h
@@ -122,6 +122,8 @@ class ProfileGeneratorBase {
 
   ProfiledBinary *Binary = nullptr;
 
+  std::unique_ptr<ProfileSummary> Summary;
+
   // Used by SampleProfileWriter
   SampleProfileMap ProfileMap;