[llvm] 856a6a5 - [CSSPGO][llvm-profgen] Trim and merge context beforehand to reduce memory usage

via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 11 16:02:55 PDT 2021


Author: wlei
Date: 2021-08-11T16:02:35-07:00
New Revision: 856a6a504165d6f46e9b29b463c19776db034794

URL: https://github.com/llvm/llvm-project/commit/856a6a504165d6f46e9b29b463c19776db034794
DIFF: https://github.com/llvm/llvm-project/commit/856a6a504165d6f46e9b29b463c19776db034794.diff

LOG: [CSSPGO][llvm-profgen] Trim and merge context beforehand to reduce memory usage

Currently we use a centralized string map (StringMap<FunctionSamples> ProfileMap) to store the profile while populating samples, which can become a memory bottleneck. In one extreme case there were thousands of samples whose context stack depth was >= 100, and memory consumption could exceed 100GB.

Since the context here is used for inlining, we can assume that not many inlinees will remain inlined into the same root function. This change therefore caps the context stack depth and merges the resulting samples to reduce peak memory. The trimming is done after recursion compression.

The default value is -1, meaning no depth limit; in the future we can tune it to a smaller one.
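
To illustrate why capping the context depth lets samples merge, here is a minimal standalone sketch. It is not the llvm-profgen implementation: it uses std::vector and std::map instead of SmallVectorImpl and StringMap<FunctionSamples>, and the names trimContextSketch, ToyProfileMap, joinContext and addSample are hypothetical. Like the patch, it trims only the caller frames and keeps the leaf frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Keep only the last `Depth` frames (the ones nearest the leaf).
// A negative depth means "no limit", mirroring the -1 default.
template <typename T>
void trimContextSketch(std::vector<T> &S, int Depth) {
  if (Depth < 0 || static_cast<std::size_t>(Depth) >= S.size())
    return;
  S.erase(S.begin(), S.end() - Depth);
}

// Hypothetical stand-in for the profile map: context string -> sample count.
using ToyProfileMap = std::map<std::string, std::uint64_t>;

// Join frames with " @ ", the separator used in the text profile format.
std::string joinContext(const std::vector<std::string> &S) {
  std::string Out;
  for (std::size_t I = 0; I < S.size(); ++I) {
    if (I != 0)
      Out += " @ ";
    Out += S[I];
  }
  return Out;
}

// Pop the leaf, trim the caller frames, then accumulate under the trimmed
// key. Contexts that differ only in their dropped outer frames share a key.
void addSample(ToyProfileMap &M, std::vector<std::string> Ctx, int Depth,
               std::uint64_t Count) {
  assert(!Ctx.empty() && "context must contain a leaf frame");
  std::string Leaf = Ctx.back();
  Ctx.pop_back();
  trimContextSketch(Ctx, Depth);
  Ctx.push_back(Leaf);
  M[joinContext(Ctx)] += Count;
}
```

With a depth of 2, the contexts "main:1 @ foo:3 @ fa:2 @ fb" and "bar:7 @ foo:3 @ fa:2 @ fb" (the second one is made up for illustration) both collapse to the key "foo:3 @ fa:2 @ fb", so only one map entry is kept. With a depth of 0 every context collapses to its leaf function, and with -1 the contexts are left unchanged.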

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D107800

Added: 
    

Modified: 
    llvm/test/tools/llvm-profgen/merge-cold-profile.test
    llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
    llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
    llvm/tools/llvm-profgen/PerfReader.cpp
    llvm/tools/llvm-profgen/ProfileGenerator.cpp
    llvm/tools/llvm-profgen/ProfileGenerator.h
    llvm/tools/llvm-profgen/ProfiledBinary.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/test/tools/llvm-profgen/merge-cold-profile.test b/llvm/test/tools/llvm-profgen/merge-cold-profile.test
index 018cd15973e81..538d5568ba0f2 100644
--- a/llvm/test/tools/llvm-profgen/merge-cold-profile.test
+++ b/llvm/test/tools/llvm-profgen/merge-cold-profile.test
@@ -11,7 +11,7 @@
 ; RUN: FileCheck %s --input-file %t3 --check-prefix=CHECK-UNMERGED
 
 ; Test --csprof-frame-depth-for-cold-context
-; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t2 --compress-recursion=-1 --profile-summary-cold-count=100 --csprof-trim-cold-context=0 --csprof-frame-depth-for-cold-context=2
+; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t2 --compress-recursion=-1 --profile-summary-cold-count=100 --csprof-trim-cold-context=0 --csprof-max-cold-context-depth=2
 ; RUN: FileCheck %s --input-file %t2 --check-prefix=CHECK-COLD-CONTEXT-LENGTH
 
 ; CHECK:     [fa]:14:4

diff  --git a/llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test b/llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
index ac8eeabefc411..897d03bf88538 100644
--- a/llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
+++ b/llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
@@ -3,6 +3,8 @@
 ; RUN: FileCheck %s --input-file %t -check-prefix=CHECK-UNCOMPRESS
 ; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-noprobe.perfscript --binary=%S/Inputs/recursion-compression-noprobe.perfbin --output=%t --profile-summary-cold-count=0
 ; RUN: FileCheck %s --input-file %t
+; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-noprobe.perfscript --binary=%S/Inputs/recursion-compression-noprobe.perfbin --output=%t --compress-recursion=0 --profile-summary-cold-count=0 --csprof-max-context-depth=2
+; RUN: FileCheck %s --input-file %t -check-prefix=CHECK-MAX-CTX-DEPTH
 
 ; CHECK-UNCOMPRESS:[main:1 @ foo:3 @ fa:2 @ fb]:48:0
 ; CHECK-UNCOMPRESS: 1: 11
@@ -21,6 +23,20 @@
 ; CHECK-UNCOMPRESS:[main:1 @ foo:3 @ fa:2 @ fb:2 @ fa:2 @ fb]:2:0
 ; CHECK-UNCOMPRESS: 2: 1 fa:1
 
+; CHECK-MAX-CTX-DEPTH:[foo:3 @ fa:2 @ fb]:47:0
+; CHECK-MAX-CTX-DEPTH: 1: 11
+; CHECK-MAX-CTX-DEPTH:[main:1 @ foo:3 @ fa]:13:0
+; CHECK-MAX-CTX-DEPTH: 1: 1
+; CHECK-MAX-CTX-DEPTH: 2: 2
+; CHECK-MAX-CTX-DEPTH:[fa:2 @ fb:2 @ fa]:8:0
+; CHECK-MAX-CTX-DEPTH: 1: 1
+; CHECK-MAX-CTX-DEPTH: 2: 1
+; CHECK-MAX-CTX-DEPTH: 4: 1
+; CHECK-MAX-CTX-DEPTH:[main:1 @ foo]:7:0
+; CHECK-MAX-CTX-DEPTH: 2: 1
+; CHECK-MAX-CTX-DEPTH: 3: 2 fa:1
+; CHECK-MAX-CTX-DEPTH:[fb:2 @ fa:2 @ fb]:1:0
+
 
 ; CHECK: [main:1 @ foo:3 @ fa:2 @ fb]:48:0
 ; CHECK:  1: 11

diff  --git a/llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test b/llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
index 0c3b5c4d705a2..c7092892e698e 100644
--- a/llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
+++ b/llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
@@ -5,7 +5,8 @@
 ; RUN: FileCheck %s --input-file %t
 ; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-pseudoprobe-nommap.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t --show-unwinder-output --profile-summary-cold-count=0 | FileCheck %s --check-prefix=CHECK-UNWINDER
 ; RUN: FileCheck %s --input-file %t
-
+; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t --compress-recursion=0 --profile-summary-cold-count=0 --csprof-max-context-depth=0
+; RUN: FileCheck %s --input-file %t -check-prefix=CHECK-MAX-CTX-DEPTH
 
 ; CHECK-UNCOMPRESS: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:5 @ fb:5 @ fb:5 @ fb:5 @ fb:5 @ fb:5 @ fb:5 @ fb:6 @ fa:8 @ fa:7 @ fb:6 @ fa]:4:1
 ; CHECK-UNCOMPRESS:  1: 1
@@ -64,6 +65,25 @@
 ; CHECK-UNCOMPRESS:  !CFGChecksum: 563022570642068
 
 
+; CHECK-MAX-CTX-DEPTH: [fb]:19:6
+; CHECK-MAX-CTX-DEPTH:  1: 6
+; CHECK-MAX-CTX-DEPTH:  2: 3
+; CHECK-MAX-CTX-DEPTH:  3: 3
+; CHECK-MAX-CTX-DEPTH:  4: 0
+; CHECK-MAX-CTX-DEPTH:  5: 4 fb:4
+; CHECK-MAX-CTX-DEPTH:  6: 3 fa:3
+; CHECK-MAX-CTX-DEPTH:  !CFGChecksum: 563022570642068
+; CHECK-MAX-CTX-DEPTH: [fa]:14:4
+; CHECK-MAX-CTX-DEPTH:  1: 4
+; CHECK-MAX-CTX-DEPTH:  3: 4
+; CHECK-MAX-CTX-DEPTH:  4: 2
+; CHECK-MAX-CTX-DEPTH:  5: 1
+; CHECK-MAX-CTX-DEPTH:  6: 0
+; CHECK-MAX-CTX-DEPTH:  7: 2 fb:2
+; CHECK-MAX-CTX-DEPTH:  8: 1 fa:1
+; CHECK-MAX-CTX-DEPTH:  !CFGChecksum: 563070469352221
+
+
 ; CHECK: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb]:13:4
 ; CHECK:  1: 4
 ; CHECK:  2: 3

diff  --git a/llvm/tools/llvm-profgen/PerfReader.cpp b/llvm/tools/llvm-profgen/PerfReader.cpp
index d855a74fc8f74..4cff3a9e6bf71 100644
--- a/llvm/tools/llvm-profgen/PerfReader.cpp
+++ b/llvm/tools/llvm-profgen/PerfReader.cpp
@@ -109,6 +109,9 @@ std::shared_ptr<ProbeBasedCtxKey> ProbeStack::getContextKey() {
   }
   CSProfileGenerator::compressRecursionContext<const MCDecodedPseudoProbe *>(
       ProbeBasedKey->Probes);
+  CSProfileGenerator::trimContext<const MCDecodedPseudoProbe *>(
+      ProbeBasedKey->Probes);
+
   ProbeBasedKey->genHashCode();
   return ProbeBasedKey;
 }

diff  --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index 83d9f3c216f40..071c4e0934c5e 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -44,11 +44,17 @@ static cl::opt<bool> CSProfTrimColdContext(
     cl::desc("If the total count of the profile after all merge is done "
              "is still smaller than threshold, it will be trimmed."));
 
-static cl::opt<uint32_t> CSProfColdContextFrameDepth(
-    "csprof-frame-depth-for-cold-context", cl::init(1), cl::ZeroOrMore,
-    cl::desc("Keep the last K frames while merging cold profile. 1 means the "
+static cl::opt<uint32_t> CSProfMaxColdContextDepth(
+    "csprof-max-cold-context-depth", cl::init(1), cl::ZeroOrMore,
+    cl::desc("Keep the last K contexts while merging cold profile. 1 means the "
              "context-less base profile"));
 
+static cl::opt<int, true> CSProfMaxContextDepth(
+    "csprof-max-context-depth", cl::ZeroOrMore,
+    cl::desc("Keep the last K contexts while merging profile. -1 means no "
+             "depth limit."),
+    cl::location(llvm::sampleprof::CSProfileGenerator::MaxContextDepth));
+
 static cl::opt<bool> EnableCSPreInliner(
     "csspgo-preinliner", cl::Hidden, cl::init(false),
     cl::desc("Run a global pre-inliner to merge context profile based on "
@@ -65,6 +71,8 @@ namespace sampleprof {
 // Initialize the MaxCompressionSize to -1 which means no size limit
 int32_t CSProfileGenerator::MaxCompressionSize = -1;
 
+int CSProfileGenerator::MaxContextDepth = -1;
+
 static bool
 usePseudoProbes(const BinarySampleCounterMap &BinarySampleCounters) {
   return BinarySampleCounters.size() &&
@@ -415,7 +423,7 @@ void CSProfileGenerator::postProcessProfiles() {
   SampleContextTrimmer(ProfileMap)
       .trimAndMergeColdContextProfiles(
           ColdCountThreshold, CSProfTrimColdContext, CSProfMergeColdContext,
-          CSProfColdContextFrameDepth);
+          CSProfMaxColdContextDepth);
 }
 
 void CSProfileGenerator::computeSummaryAndThreshold() {
@@ -608,6 +616,7 @@ FunctionSamples &PseudoProbeCSProfileGenerator::getFunctionProfileForLeafProbe(
   std::string LeafFrame = ContextStrStack.back();
   ContextStrStack.pop_back();
   CSProfileGenerator::compressRecursionContext(ContextStrStack);
+  CSProfileGenerator::trimContext(ContextStrStack);
 
   std::ostringstream OContextStr;
   for (uint32_t I = 0; I < ContextStrStack.size(); I++) {

diff  --git a/llvm/tools/llvm-profgen/ProfileGenerator.h b/llvm/tools/llvm-profgen/ProfileGenerator.h
index cc1959cdcd931..dae6c3a3ae8f0 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.h
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.h
@@ -70,6 +70,16 @@ class CSProfileGenerator : public ProfileGenerator {
 public:
   void generateProfile() override;
 
+  // Trim the context stack at a given depth.
+  template <typename T>
+  static void trimContext(SmallVectorImpl<T> &S, int Depth = MaxContextDepth) {
+    if (Depth < 0 || static_cast<size_t>(Depth) >= S.size())
+      return;
+    std::copy(S.begin() + S.size() - static_cast<size_t>(Depth), S.end(),
+              S.begin());
+    S.resize(Depth);
+  }
+
   // Remove adjacent repeated context sequences up to a given sequence length,
   // -1 means no size limit. Note that repeated sequences are identified based
   // on the exact call site, this is finer granularity than function recursion.
@@ -212,6 +222,7 @@ class CSProfileGenerator : public ProfileGenerator {
   // Deduplicate adjacent repeated context sequences up to a given sequence
   // length. -1 means no size limit.
   static int32_t MaxCompressionSize;
+  static int MaxContextDepth;
 };
 
 using ProbeCounterMap =

diff  --git a/llvm/tools/llvm-profgen/ProfiledBinary.cpp b/llvm/tools/llvm-profgen/ProfiledBinary.cpp
index 8b32775ac7ed4..9db7d6cf6bfcb 100644
--- a/llvm/tools/llvm-profgen/ProfiledBinary.cpp
+++ b/llvm/tools/llvm-profgen/ProfiledBinary.cpp
@@ -125,6 +125,7 @@ ProfiledBinary::getExpandedContextStr(const SmallVectorImpl<uint64_t> &Stack,
   std::string LeafFrame = ContextVec.back();
   ContextVec.pop_back();
   CSProfileGenerator::compressRecursionContext<std::string>(ContextVec);
+  CSProfileGenerator::trimContext<std::string>(ContextVec);
 
   std::ostringstream OContextStr;
   for (uint32_t I = 0; I < (uint32_t)ContextVec.size(); I++) {


        

