[llvm] 1a71908 - [CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples.

Hongtao Yu via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 21 18:08:11 PDT 2021


Author: Hongtao Yu
Date: 2021-04-21T18:07:58-07:00
New Revision: 1a719089a81b418b480e8b08d2d971fb087860db

URL: https://github.com/llvm/llvm-project/commit/1a719089a81b418b480e8b08d2d971fb087860db
DIFF: https://github.com/llvm/llvm-project/commit/1a719089a81b418b480e8b08d2d971fb087860db.diff

LOG: [CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples.

Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated to an empty block. When reported, sample count on a dangling probe will not be trusted by the compiler and we will rely on the counts inference algorithm to get the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported.

This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes and the pointers to probes may get obsolete as the vector grows. I'm changing `std::vector` to `std::list` instead.

The other issue is that all outlined functions shared the same inline frame previously due to the unchanged `Index` value as the dummy inlineSite identifier.

Good results seen for SPEC2017 in general regarding profile quality.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D100235

Added: 
    llvm/test/tools/llvm-profgen/Inputs/inline-cs-dangling-pseudoprobe.perfscript
    llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test

Modified: 
    llvm/include/llvm/MC/MCPseudoProbe.h
    llvm/test/tools/llvm-profgen/merge-cold-profile.test
    llvm/tools/llvm-profgen/ProfileGenerator.cpp
    llvm/tools/llvm-profgen/PseudoProbe.cpp
    llvm/tools/llvm-profgen/PseudoProbe.h

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/MC/MCPseudoProbe.h b/llvm/include/llvm/MC/MCPseudoProbe.h
index 15a8860d8d82f..27999e3449282 100644
--- a/llvm/include/llvm/MC/MCPseudoProbe.h
+++ b/llvm/include/llvm/MC/MCPseudoProbe.h
@@ -36,7 +36,6 @@
 //        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
 //        callees.  Each record contains:
 //          INLINE SITE
-//            GUID of the inlinee (uint64)
 //            ID of the callsite probe (ULEB128)
 //          FUNCTION BODY
 //            A FUNCTION BODY entry describing the inlined function.

diff  --git a/llvm/test/tools/llvm-profgen/Inputs/inline-cs-dangling-pseudoprobe.perfscript b/llvm/test/tools/llvm-profgen/Inputs/inline-cs-dangling-pseudoprobe.perfscript
new file mode 100644
index 0000000000000..ba1affd5fe64e
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/Inputs/inline-cs-dangling-pseudoprobe.perfscript
@@ -0,0 +1,5 @@
+PERF_RECORD_MMAP2 595196/595196: [0x201000(0x1000) @ 0 00:1d 224227621 1042948]: r-xp /home/inline-cs-pseudoprobe.perfbin
+
+	          20180e
+	5541f689495641d7
+ 0x201858/0x20180e/P/-/-/0  0x20182b/0x20184d/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0  0x20182b/0x201800/P/-/-/0

diff  --git a/llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test b/llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test
new file mode 100644
index 0000000000000..5fe052ba30076
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/inline-cs-dangling-pseudoprobe.test
@@ -0,0 +1,51 @@
+; RUN: llvm-profgen --perfscript=%S/Inputs/inline-cs-dangling-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output --csprof-cold-thres=0 | FileCheck %s --check-prefix=CHECK-UNWINDER
+; RUN: FileCheck %s --input-file %t
+
+; CHECK:     [main:2 @ foo]:58:0
+; CHECK-NEXT: 2: 15
+; CHECK-NEXT: 3: 14
+; CHECK-NEXT: 5: 14
+; CHECK-NEXT: 6: 15
+; CHECK-NEXT: !CFGChecksum: 138950591924
+; CHECK:[main:2 @ foo:8 @ bar]:1:0
+; CHECK-NEXT: 2: 18446744073709551615
+; CHECK-NEXT: 3: 18446744073709551615
+; CHECK-NEXT: 4: 1
+; CHECK-NEXT: !CFGChecksum: 72617220756
+
+
+; CHECK-UNWINDER:      Binary(inline-cs-pseudoprobe.perfbin)'s Range Counter:
+; CHECK-UNWINDER-EMPTY:
+; CHECK-UNWINDER-NEXT:   (800, 82b): 14
+; CHECK-UNWINDER-NEXT:   (84d, 858): 1
+
+; CHECK-UNWINDER:      Binary(inline-cs-pseudoprobe.perfbin)'s Branch Counter:
+; CHECK-UNWINDER-EMPTY:
+; CHECK-UNWINDER-NEXT:   (82b, 800): 14
+; CHECK-UNWINDER-NEXT:   (82b, 84d): 1
+; CHECK-UNWINDER-NEXT:   (858, 80e): 1
+
+; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling
+; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
+; -g test.c  -o a.out
+
+#include <stdio.h>
+
+int bar(int x, int y) {
+  if (x % 3) {
+    return x - y;
+  }
+  return x + y;
+}
+
+void foo() {
+  int s, i = 0;
+  while (i++ < 4000 * 4000)
+    if (i % 91) s = bar(i, s); else s += 30;
+  printf("sum is %d\n", s);
+}
+
+int main() {
+  foo();
+  return 0;
+}

diff  --git a/llvm/test/tools/llvm-profgen/merge-cold-profile.test b/llvm/test/tools/llvm-profgen/merge-cold-profile.test
index 0549befd34e9d..4e0df4f749c99 100644
--- a/llvm/test/tools/llvm-profgen/merge-cold-profile.test
+++ b/llvm/test/tools/llvm-profgen/merge-cold-profile.test
@@ -1,17 +1,18 @@
 ; Used the data from recursion-compression.test, refer it for the unmerged output
-; RUN: llvm-profgen --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t --compress-recursion=-1 --csprof-cold-thres=8
-; RUN: FileCheck %s --input-file %t
+; RUN: llvm-profgen --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t1 --compress-recursion=-1 --csprof-cold-thres=8
+; RUN: FileCheck %s --input-file %t1
 
 ; Test --csprof-trim-cold-context=0
-; RUN: llvm-profgen --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t --compress-recursion=-1 --csprof-cold-thres=100 --csprof-trim-cold-context=0
-; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-KEEP-COLD
+; RUN: llvm-profgen --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t2 --compress-recursion=-1 --csprof-cold-thres=100 --csprof-trim-cold-context=0
+; RUN: FileCheck %s --input-file %t2 --check-prefix=CHECK-KEEP-COLD
 
 ; Test --csprof-merge-cold-context=0
-; RUN: llvm-profgen --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t --compress-recursion=-1 --csprof-cold-thres=10 --csprof-merge-cold-context=0
-; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNMERGED
+; RUN: llvm-profgen --perfscript=%S/Inputs/recursion-compression-pseudoprobe.perfscript --binary=%S/Inputs/recursion-compression-pseudoprobe.perfbin --output=%t3 --compress-recursion=-1 --csprof-cold-thres=10 --csprof-merge-cold-context=0
+; RUN: FileCheck %s --input-file %t3 --check-prefix=CHECK-UNMERGED
 
 ; CHECK:     [fa]:14:4
 ; CHECK-NEXT: 1: 4
+; CHECK-NEXT: 2: 18446744073709551615
 ; CHECK-NEXT: 3: 4
 ; CHECK-NEXT: 4: 2
 ; CHECK-NEXT: 5: 1
@@ -37,6 +38,7 @@
 ; CHECK-KEEP-COLD-NEXT: !Attributes: 0
 ; CHECK-KEEP-COLD-NEXT:[fa]:14:4
 ; CHECK-KEEP-COLD-NEXT: 1: 4
+; CHECK-KEEP-COLD-NEXT: 2: 18446744073709551615
 ; CHECK-KEEP-COLD-NEXT: 3: 4
 ; CHECK-KEEP-COLD-NEXT: 4: 2
 ; CHECK-KEEP-COLD-NEXT: 5: 1

diff  --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index 9e1ba9bca1b8b..0cfcc84196813 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -510,22 +510,19 @@ void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
   // Extract the top frame probes by looking up each address among the range in
   // the Address2ProbeMap
   extractProbesFromRange(RangeCounter, ProbeCounter, Binary);
+  std::unordered_map<PseudoProbeInlineTree *, FunctionSamples *> FrameSamples;
   for (auto PI : ProbeCounter) {
     const PseudoProbe *Probe = PI.first;
     uint64_t Count = PI.second;
+    // Ignore dangling probes since they will be reported later if needed.
+    if (Probe->isDangling())
+      continue;
     FunctionSamples &FunctionProfile =
         getFunctionProfileForLeafProbe(ContextStrStack, Probe, Binary);
-
-    // Use InvalidProbeCount(UINT64_MAX) to mark sample count for a dangling
-    // probe. Dangling probes are the probes associated to an empty block. With
-    // this place holder, sample count on dangling probe will not be trusted by
-    // the compiler and it will rely on the counts inference algorithm to get
-    // the probe a reasonable count.
-    if (Probe->isDangling()) {
-      FunctionProfile.addBodySamplesForProbe(
-          Probe->Index, FunctionSamples::InvalidProbeCount);
-      continue;
-    }
+    // Record the current frame and FunctionProfile whenever samples are
+    // collected for non-danglie probes. This is for reporting all of the
+    // dangling probes of the frame later.
+    FrameSamples[Probe->getInlineTreeNode()] = &FunctionProfile;
     FunctionProfile.addBodySamplesForProbe(Probe->Index, Count);
     FunctionProfile.addTotalSamples(Count);
     if (Probe->isEntry()) {
@@ -554,6 +551,22 @@ void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
             FunctionProfile.getContext().getNameWithoutContext(), Count);
       }
     }
+
+    // Report dangling probes for frames that have real samples collected.
+    // Dangling probes are the probes associated to an empty block. With this
+    // place holder, sample count on a dangling probe will not be trusted by the
+    // compiler and we will rely on the counts inference algorithm to get the
+    // probe a reasonable count. Use InvalidProbeCount to mark sample count for
+    // a dangling probe.
+    for (auto &I : FrameSamples) {
+      auto *FunctionProfile = I.second;
+      for (auto *Probe : I.first->getProbes()) {
+        if (Probe->isDangling()) {
+          FunctionProfile->addBodySamplesForProbe(
+              Probe->Index, FunctionSamples::InvalidProbeCount);
+        }
+      }
+    }
   }
 }
 

diff  --git a/llvm/tools/llvm-profgen/PseudoProbe.cpp b/llvm/tools/llvm-profgen/PseudoProbe.cpp
index a9040c9c5aa0c..b94966129a15d 100644
--- a/llvm/tools/llvm-profgen/PseudoProbe.cpp
+++ b/llvm/tools/llvm-profgen/PseudoProbe.cpp
@@ -198,7 +198,6 @@ void PseudoProbeDecoder::buildAddress2ProbeMap(const uint8_t *Start,
   //         A list of NUM_INLINED_FUNCTIONS entries describing each of the
   //         inlined callees.  Each record contains:
   //           INLINE SITE
-  //             GUID of the inlinee (uint64)
   //             Index of the callsite probe (ULEB128)
   //           FUNCTION BODY
   //             A FUNCTION BODY entry describing the inlined function.
@@ -214,8 +213,11 @@ void PseudoProbeDecoder::buildAddress2ProbeMap(const uint8_t *Start,
   uint32_t Index = 0;
   // A DFS-based decoding
   while (Data < End) {
-    // Read inline site for inlinees
-    if (Root != Cur) {
+    if (Root == Cur) {
+      // Use a sequential id for top level inliner.
+      Index = Root->getChildren().size();
+    } else {
+      // Read inline site for inlinees
       Index = readUnsignedNumber<uint32_t>();
     }
     // Switch/add to a new tree node(inlinee)
@@ -243,10 +245,10 @@ void PseudoProbeDecoder::buildAddress2ProbeMap(const uint8_t *Start,
         Addr = readUnencodedNumber<int64_t>();
       }
       // Populate Address2ProbesMap
-      std::vector<PseudoProbe> &ProbeVec = Address2ProbesMap[Addr];
-      ProbeVec.emplace_back(Addr, Cur->GUID, Index, PseudoProbeType(Kind), Attr,
-                            Cur);
-      Cur->addProbes(&ProbeVec.back());
+      auto &Probes = Address2ProbesMap[Addr];
+      Probes.emplace_back(Addr, Cur->GUID, Index, PseudoProbeType(Kind), Attr,
+                          Cur);
+      Cur->addProbes(&Probes.back());
       LastAddr = Addr;
     }
 
@@ -298,7 +300,7 @@ PseudoProbeDecoder::getCallProbeForAddr(uint64_t Address) const {
   auto It = Address2ProbesMap.find(Address);
   if (It == Address2ProbesMap.end())
     return nullptr;
-  const std::vector<PseudoProbe> &Probes = It->second;
+  const auto &Probes = It->second;
 
   const PseudoProbe *CallProbe = nullptr;
   for (const auto &Probe : Probes) {

diff  --git a/llvm/tools/llvm-profgen/PseudoProbe.h b/llvm/tools/llvm-profgen/PseudoProbe.h
index 207772453c974..4c75868bdfafa 100644
--- a/llvm/tools/llvm-profgen/PseudoProbe.h
+++ b/llvm/tools/llvm-profgen/PseudoProbe.h
@@ -45,9 +45,10 @@ class PseudoProbeInlineTree {
       return std::get<0>(Site) ^ std::get<1>(Site);
     }
   };
-  std::unordered_map<InlineSite, std::unique_ptr<PseudoProbeInlineTree>,
-                     InlineSiteHash>
-      Children;
+  using InlinedProbeTreeMap =
+      std::unordered_map<InlineSite, std::unique_ptr<PseudoProbeInlineTree>,
+                         InlineSiteHash>;
+  InlinedProbeTreeMap Children;
 
 public:
   // Inlinee function GUID
@@ -71,6 +72,8 @@ class PseudoProbeInlineTree {
     return Ret.first->second.get();
   }
 
+  InlinedProbeTreeMap &getChildren() { return Children; }
+  std::vector<PseudoProbe *> &getProbes() { return ProbeVector; }
   void addProbes(PseudoProbe *Probe) { ProbeVector.push_back(Probe); }
   // Return false if it's a dummy inline site
   bool hasInlineSite() const { return std::get<0>(ISite) != 0; }
@@ -91,7 +94,7 @@ struct PseudoProbeFuncDesc {
 // GUID to PseudoProbeFuncDesc map
 using GUIDProbeFunctionMap = std::unordered_map<uint64_t, PseudoProbeFuncDesc>;
 // Address to pseudo probes map.
-using AddressProbesMap = std::unordered_map<uint64_t, std::vector<PseudoProbe>>;
+using AddressProbesMap = std::unordered_map<uint64_t, std::list<PseudoProbe>>;
 
 /*
 A pseudo probe has the format like below:
@@ -135,6 +138,8 @@ struct PseudoProbe {
   bool isDirectCall() const { return Type == PseudoProbeType::DirectCall; }
   bool isCall() const { return isIndirectCall() || isDirectCall(); }
 
+  PseudoProbeInlineTree *getInlineTreeNode() const { return InlineTree; }
+
   // Get the inlined context by traversing current inline tree backwards,
   // each tree node has its InlineSite which is taken as the context.
   // \p ContextStack is populated in root to leaf order


        


More information about the llvm-commits mailing list