[llvm] a8a38ef - [llvm-profgen] Fix bug of loop scope mismatch

via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 5 16:53:23 PDT 2021


Author: wlei
Date: 2021-08-05T16:52:57-07:00
New Revision: a8a38ef3d99ce2b180f9c5ff968e5b930a99b10b

URL: https://github.com/llvm/llvm-project/commit/a8a38ef3d99ce2b180f9c5ff968e5b930a99b10b
DIFF: https://github.com/llvm/llvm-project/commit/a8a38ef3d99ce2b180f9c5ff968e5b930a99b10b.diff

LOG: [llvm-profgen] Fix bug of loop scope mismatch

Profile generation hit a performance problem, and the loop at line 525 turned out to be the bottleneck: the pass that assigns zero counts over FrameSamples ran inside the per-probe loop.
Moving that code out of the loop scope fixes the issue. Run time improves from 30+ minutes to ~30 seconds.
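
For illustration only, the shape of the fix can be sketched as below. The types Frame, Profile, and Hit and the function populate are hypothetical stand-ins, not the real llvm-profgen classes; only the loop structure mirrors the patch. The first loop records samples and remembers which profiles belong to which frame, and the zero-count pass runs once afterwards instead of once per probe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the llvm-profgen types (assumptions for this
// sketch, not the real MCDecodedPseudoProbeInlineTree / FunctionSamples).
struct Frame {
  std::vector<uint32_t> ProbeIndexes; // every probe that belongs to the frame
};

struct Profile {
  std::map<uint32_t, uint64_t> Body; // probe index -> accumulated count
  // Adding 0 creates the entry without changing an existing count.
  void addBodySamples(uint32_t Index, uint64_t Count) { Body[Index] += Count; }
};

struct Hit {
  const Frame *F;  // frame the sampled probe belongs to
  Profile *P;      // profile that receives the samples
  uint32_t Index;  // probe index
  uint64_t Count;  // sample count
};

// Records all hits, then zero-fills the remaining probes of each touched
// frame. Returns the number of zero-fill calls so the cost is observable.
std::size_t populate(const std::vector<Hit> &Hits) {
  std::unordered_map<const Frame *, std::unordered_set<Profile *>> FrameSamples;

  // Pass 1: one iteration per sampled probe, no nested walk over FrameSamples.
  for (const Hit &H : Hits) {
    assert(H.F && H.P && "hit must reference a frame and a profile");
    FrameSamples[H.F].insert(H.P);
    H.P->addBodySamples(H.Index, H.Count);
  }

  // Pass 2: runs once, after the loop above has finished.
  std::size_t ZeroFills = 0;
  for (auto &I : FrameSamples) {
    for (Profile *P : I.second) {
      for (uint32_t Index : I.first->ProbeIndexes) {
        P->addBodySamples(Index, 0);
        ++ZeroFills;
      }
    }
  }
  return ZeroFills;
}
```

With the zero-count pass nested in the first loop, its work would be repeated for every sampled probe. Hoisted out, it is proportional to the probes of the frames that were touched. The set-valued map in this sketch assumes that one frame can be paired with more than one profile, which is what the change from a single pointer to std::unordered_set in the patch suggests.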

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D107529

Added: 
    

Modified: 
    llvm/tools/llvm-profgen/ProfileGenerator.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index 57853f2373977..83d9f3c216f40 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -8,6 +8,7 @@
 
 #include "ProfileGenerator.h"
 #include "llvm/ProfileData/ProfileCommon.h"
+#include <unordered_set>
 
 static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
                                            cl::Required,
@@ -520,7 +521,8 @@ void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
   // Extract the top frame probes by looking up each address among the range in
   // the Address2ProbeMap
   extractProbesFromRange(RangeCounter, ProbeCounter, Binary);
-  std::unordered_map<MCDecodedPseudoProbeInlineTree *, FunctionSamples *>
+  std::unordered_map<MCDecodedPseudoProbeInlineTree *,
+                     std::unordered_set<FunctionSamples *>>
       FrameSamples;
   for (auto PI : ProbeCounter) {
     const MCDecodedPseudoProbe *Probe = PI.first;
@@ -530,7 +532,7 @@ void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
     // Record the current frame and FunctionProfile whenever samples are
     // collected for non-danglie probes. This is for reporting all of the
     // zero count probes of the frame later.
-    FrameSamples[Probe->getInlineTreeNode()] = &FunctionProfile;
+    FrameSamples[Probe->getInlineTreeNode()].insert(&FunctionProfile);
     FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);
     FunctionProfile.addTotalSamples(Count);
     if (Probe->isEntry()) {
@@ -559,12 +561,13 @@ void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
             FunctionProfile.getContext().getNameWithoutContext(), Count);
       }
     }
+  }
 
-    // Assign zero count for remaining probes without sample hits to
-    // differentiate from probes optimized away, of which the counts are unknown
-    // and will be inferred by the compiler.
-    for (auto &I : FrameSamples) {
-      auto *FunctionProfile = I.second;
+  // Assign zero count for remaining probes without sample hits to
+  // differentiate from probes optimized away, of which the counts are unknown
+  // and will be inferred by the compiler.
+  for (auto &I : FrameSamples) {
+    for (auto *FunctionProfile : I.second) {
       for (auto *Probe : I.first->getProbes()) {
         FunctionProfile->addBodySamplesForProbe(Probe->getIndex(), 0);
       }


        

