[llvm] [BOLT] Flow conservation scores (PR #127954)

via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 20 10:22:18 PST 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-bolt

Author: ShatianWang (ShatianWang)

<details>
<summary>Changes</summary>

Add two additional profile quality stats for CG (call graph) and CFG (control flow graph) flow conservations besides the CFG discontinuity stats introduced in #<!-- -->109683. The two new stats quantify how different "in-flow" is from "out-flow" in the following cases where they should be equal. The smaller the reported stats, the better the flow conservations are.

CG flow conservation: for each function that is not a program entry, the number of times the function is called according to CG ("in-flow") should be equal to the number of times the transition from an entry basic block of the function to another basic block within the function is recorded ("out-flow"). 

CFG flow conservation: for each basic block that is not a function entry or exit, the number of times the transition into this basic block from another basic block within the function is recorded ("in-flow") should be equal to the number of times the transition from this basic block to another basic block within the function is recorded ("out-flow"). 

Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump functions / basic blocks with bad flow conservations. 

---

Patch is 46.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127954.diff


8 Files Affected:

- (removed) bolt/include/bolt/Passes/ContinuityStats.h (-61) 
- (added) bolt/include/bolt/Passes/ProfileQualityStats.h (+98) 
- (modified) bolt/lib/Passes/CMakeLists.txt (+1-1) 
- (removed) bolt/lib/Passes/ContinuityStats.cpp (-250) 
- (added) bolt/lib/Passes/ProfileQualityStats.cpp (+579) 
- (modified) bolt/lib/Rewrite/BinaryPassManager.cpp (+23-23) 
- (removed) bolt/test/X86/cfg-discontinuity-reporting.test (-4) 
- (added) bolt/test/X86/profile-quality-reporting.test (+6) 


``````````diff
diff --git a/bolt/include/bolt/Passes/ContinuityStats.h b/bolt/include/bolt/Passes/ContinuityStats.h
deleted file mode 100644
index bd4d491ad4a55..0000000000000
--- a/bolt/include/bolt/Passes/ContinuityStats.h
+++ /dev/null
@@ -1,61 +0,0 @@
-//===- bolt/Passes/ContinuityStats.h ----------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This pass checks how well the BOLT input profile satisfies the following
-// "CFG continuity" property of a perfect profile:
-//
-//        Each positive-execution-count block in the function’s CFG
-//        should be *reachable* from a positive-execution-count function
-//        entry block through a positive-execution-count path.
-//
-// More specifically, for each of the hottest 1000 functions, the pass
-// calculates the function’s fraction of basic block execution counts
-// that is *unreachable*. It then reports the 95th percentile of the
-// distribution of the 1000 unreachable fractions in a single BOLT-INFO line.
-// The smaller the reported value is, the better the BOLT profile
-// satisfies the CFG continuity property.
-
-// The default value of 1000 above can be changed via the hidden BOLT option
-// `-num-functions-for-continuity-check=[N]`.
-// If more detailed stats are needed, `-v=1` can be used: the hottest N
-// functions will be grouped into 5 equally-sized buckets, from the hottest
-// to the coldest; for each bucket, various summary statistics of the
-// distribution of the unreachable fractions and the raw unreachable execution
-// counts will be reported.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef BOLT_PASSES_CONTINUITYSTATS_H
-#define BOLT_PASSES_CONTINUITYSTATS_H
-
-#include "bolt/Passes/BinaryPasses.h"
-#include <vector>
-
-namespace llvm {
-
-class raw_ostream;
-
-namespace bolt {
-class BinaryContext;
-
-/// Compute and report to the user the function CFG continuity quality
-class PrintContinuityStats : public BinaryFunctionPass {
-public:
-  explicit PrintContinuityStats(const cl::opt<bool> &PrintPass)
-      : BinaryFunctionPass(PrintPass) {}
-
-  bool shouldOptimize(const BinaryFunction &BF) const override;
-  const char *getName() const override { return "continuity-stats"; }
-  bool shouldPrint(const BinaryFunction &) const override { return false; }
-  Error runOnFunctions(BinaryContext &BC) override;
-};
-
-} // namespace bolt
-} // namespace llvm
-
-#endif // BOLT_PASSES_CONTINUITYSTATS_H
diff --git a/bolt/include/bolt/Passes/ProfileQualityStats.h b/bolt/include/bolt/Passes/ProfileQualityStats.h
new file mode 100644
index 0000000000000..761fd33431a46
--- /dev/null
+++ b/bolt/include/bolt/Passes/ProfileQualityStats.h
@@ -0,0 +1,98 @@
+//===- bolt/Passes/ProfileQualityStats.h ------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass checks the BOLT input profile quality.
+//
+// Check 1: how well the input profile satisfies the following
+// "CFG continuity" property of a perfect profile:
+//
+//        Each positive-execution-count block in the function’s CFG
+//        is *reachable* from a positive-execution-count function
+//        entry block through a positive-execution-count path.
+//
+// More specifically, for each of the hottest 1000 functions, the pass
+// calculates the function’s fraction of basic block execution counts
+// that is *unreachable*. It then reports the 95th percentile of the
+// distribution of the 1000 unreachable fractions in a single BOLT-INFO line.
+// The smaller the reported value is, the better the BOLT profile
+// satisfies the CFG continuity property.
+//
+// Check 2: how well the input profile satisfies the "call graph flow
+// conservation" property of a perfect profile:
+//
+//        For each function that is not a program entry, the number of times the
+//        function is called is equal to the net CFG outflow of the
+//        function's entry block(s).
+//
+// More specifically, for each of the hottest 1000 functions, the pass obtains
+// A = number of times the function is called, B = the function's entry blocks'
+// inflow, C = the function's entry blocks' outflow, where B and C are computed
+// using the function's weighted CFG. It then computes gap = 1 - MIN(A,C-B) /
+// MAX(A, C-B). The pass reports the 95th percentile of the distribution of the
+// 1000 gaps in a single BOLT-INFO line. The smaller the reported value is, the
+// better the BOLT profile satisfies the call graph flow conservation property.
+//
+// Check 3: how well the input profile satisfies the "function CFG flow
+// conservation property" of a perfect profile:
+//
+//       A non-entry non-exit basic block's inflow is equal to its outflow.
+//
+// More specifically, for each of the hottest 1000 functions, the pass loops
+// over its basic blocks that are non-entry and non-exit, and for each block
+// obtains a block gap = 1 - MIN(block inflow, block outflow, block call count
+// if any) / MAX(block inflow, block outflow, block call count if any). It then
+// aggregates the block gaps into 2 values for the function: "weighted" is the
+// weighted average of the block conservation gaps, where the weights depend on
+// each block's execution count and instruction count; "worst" is the worst
+// (biggest) block gap acorss all basic blocks in the function with an execution
+// count of > 500. The pass then reports the 95th percentile of the weighted and
+// worst values of the 1000 functions in a single BOLT-INFO line. The smaller
+// the reported values are, the better the BOLT profile satisfies the function
+// CFG flow conservation property.
+//
+// The default value of 1000 above can be changed via the hidden BOLT option
+// `-num-functions-for-profile-quality-check=[N]`.
+// The default reporting of the 95th percentile can be changed via the hidden
+// BOLT option `-percentile-for-profile-quality-check=[M]`.
+//
+// If more detailed stats are needed, `-v=1` can be used: the hottest N
+// functions will be grouped into 5 equally-sized buckets, from the hottest
+// to the coldest; for each bucket, various summary statistics of the
+// profile quality will be reported.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_PASSES_PROFILEQUALITYSTATS_H
+#define BOLT_PASSES_PROFILEQUALITYSTATS_H
+
+#include "bolt/Passes/BinaryPasses.h"
+#include <vector>
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace bolt {
+class BinaryContext;
+
+/// Compute and report to the user the profile quality
+class PrintProfileQualityStats : public BinaryFunctionPass {
+public:
+  explicit PrintProfileQualityStats(const cl::opt<bool> &PrintPass)
+      : BinaryFunctionPass(PrintPass) {}
+
+  bool shouldOptimize(const BinaryFunction &BF) const override;
+  const char *getName() const override { return "profile-quality-stats"; }
+  bool shouldPrint(const BinaryFunction &) const override { return false; }
+  Error runOnFunctions(BinaryContext &BC) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif // BOLT_PASSES_PROFILEQUALITYSTATS_H
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 1e3289484a5ba..692c82f2a5336 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -26,7 +26,7 @@ add_llvm_library(LLVMBOLTPasses
   PatchEntries.cpp
   PettisAndHansen.cpp
   PLTCall.cpp
-  ContinuityStats.cpp
+  ProfileQualityStats.cpp
   RegAnalysis.cpp
   RegReAssign.cpp
   ReorderAlgorithm.cpp
diff --git a/bolt/lib/Passes/ContinuityStats.cpp b/bolt/lib/Passes/ContinuityStats.cpp
deleted file mode 100644
index b32365b59065d..0000000000000
--- a/bolt/lib/Passes/ContinuityStats.cpp
+++ /dev/null
@@ -1,250 +0,0 @@
-//===- bolt/Passes/ContinuityStats.cpp --------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file implements the continuity stats calculation pass.
-//
-//===----------------------------------------------------------------------===//
-
-#include "bolt/Passes/ContinuityStats.h"
-#include "bolt/Core/BinaryBasicBlock.h"
-#include "bolt/Core/BinaryFunction.h"
-#include "bolt/Utils/CommandLineOpts.h"
-#include "llvm/Support/CommandLine.h"
-#include <queue>
-#include <unordered_map>
-#include <unordered_set>
-
-#define DEBUG_TYPE "bolt-opts"
-
-using namespace llvm;
-using namespace bolt;
-
-namespace opts {
-extern cl::opt<unsigned> Verbosity;
-cl::opt<unsigned> NumFunctionsForContinuityCheck(
-    "num-functions-for-continuity-check",
-    cl::desc("number of hottest functions to print aggregated "
-             "CFG discontinuity stats of."),
-    cl::init(1000), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
-} // namespace opts
-
-namespace {
-using FunctionListType = std::vector<const BinaryFunction *>;
-using function_iterator = FunctionListType::iterator;
-
-template <typename T>
-void printDistribution(raw_ostream &OS, std::vector<T> &values,
-                       bool Fraction = false) {
-  if (values.empty())
-    return;
-  // Sort values from largest to smallest and print the MAX, TOP 1%, 5%, 10%,
-  // 20%, 50%, 80%, MIN. If Fraction is true, then values are printed as
-  // fractions instead of integers.
-  std::sort(values.begin(), values.end());
-
-  auto printLine = [&](std::string Text, double Percent) {
-    int Rank = int(values.size() * (1.0 - Percent / 100));
-    if (Percent == 0)
-      Rank = values.size() - 1;
-    if (Fraction)
-      OS << "  " << Text << std::string(9 - Text.length(), ' ') << ": "
-         << format("%.2lf%%", values[Rank] * 100) << "\n";
-    else
-      OS << "  " << Text << std::string(9 - Text.length(), ' ') << ": "
-         << values[Rank] << "\n";
-  };
-
-  printLine("MAX", 0);
-  const int percentages[] = {1, 5, 10, 20, 50, 80};
-  for (size_t i = 0; i < sizeof(percentages) / sizeof(percentages[0]); ++i) {
-    printLine("TOP " + std::to_string(percentages[i]) + "%", percentages[i]);
-  }
-  printLine("MIN", 100);
-}
-
-void printCFGContinuityStats(raw_ostream &OS,
-                             iterator_range<function_iterator> &Functions) {
-  // Given a perfect profile, every positive-execution-count BB should be
-  // connected to an entry of the function through a positive-execution-count
-  // directed path in the control flow graph.
-  std::vector<size_t> NumUnreachables;
-  std::vector<size_t> SumECUnreachables;
-  std::vector<double> FractionECUnreachables;
-
-  for (auto it = Functions.begin(); it != Functions.end(); ++it) {
-    const BinaryFunction *Function = *it;
-    if (Function->size() <= 1)
-      continue;
-
-    // Compute the sum of all BB execution counts (ECs).
-    size_t NumPosECBBs = 0;
-    size_t SumAllBBEC = 0;
-    for (const BinaryBasicBlock &BB : *Function) {
-      const size_t BBEC = BB.getKnownExecutionCount();
-      NumPosECBBs += BBEC > 0 ? 1 : 0;
-      SumAllBBEC += BBEC;
-    }
-
-    // Perform BFS on subgraph of CFG induced by positive weight edges.
-    // Compute the number of BBs reachable from the entry(s) of the function and
-    // the sum of their execution counts (ECs).
-    std::unordered_map<unsigned, const BinaryBasicBlock *> IndexToBB;
-    std::unordered_set<unsigned> Visited;
-    std::queue<unsigned> Queue;
-    for (const BinaryBasicBlock &BB : *Function) {
-      // Make sure BB.getIndex() is not already in IndexToBB.
-      assert(IndexToBB.find(BB.getIndex()) == IndexToBB.end());
-      IndexToBB[BB.getIndex()] = &BB;
-      if (BB.isEntryPoint() && BB.getKnownExecutionCount() > 0) {
-        Queue.push(BB.getIndex());
-        Visited.insert(BB.getIndex());
-      }
-    }
-    while (!Queue.empty()) {
-      const unsigned BBIndex = Queue.front();
-      const BinaryBasicBlock *BB = IndexToBB[BBIndex];
-      Queue.pop();
-      auto SuccBIIter = BB->branch_info_begin();
-      for (const BinaryBasicBlock *Succ : BB->successors()) {
-        const uint64_t Count = SuccBIIter->Count;
-        if (Count == BinaryBasicBlock::COUNT_NO_PROFILE || Count == 0) {
-          ++SuccBIIter;
-          continue;
-        }
-        if (!Visited.insert(Succ->getIndex()).second) {
-          ++SuccBIIter;
-          continue;
-        }
-        Queue.push(Succ->getIndex());
-        ++SuccBIIter;
-      }
-    }
-
-    const size_t NumReachableBBs = Visited.size();
-
-    // Loop through Visited, and sum the corresponding BBs' execution counts
-    // (ECs).
-    size_t SumReachableBBEC = 0;
-    for (const unsigned BBIndex : Visited) {
-      const BinaryBasicBlock *BB = IndexToBB[BBIndex];
-      SumReachableBBEC += BB->getKnownExecutionCount();
-    }
-
-    const size_t NumPosECBBsUnreachableFromEntry =
-        NumPosECBBs - NumReachableBBs;
-    const size_t SumUnreachableBBEC = SumAllBBEC - SumReachableBBEC;
-    const double FractionECUnreachable =
-        (double)SumUnreachableBBEC / SumAllBBEC;
-
-    if (opts::Verbosity >= 2 && FractionECUnreachable >= 0.05) {
-      OS << "Non-trivial CFG discontinuity observed in function "
-         << Function->getPrintName() << "\n";
-      LLVM_DEBUG(Function->dump());
-    }
-
-    NumUnreachables.push_back(NumPosECBBsUnreachableFromEntry);
-    SumECUnreachables.push_back(SumUnreachableBBEC);
-    FractionECUnreachables.push_back(FractionECUnreachable);
-  }
-
-  if (FractionECUnreachables.empty())
-    return;
-
-  std::sort(FractionECUnreachables.begin(), FractionECUnreachables.end());
-  const int Rank = int(FractionECUnreachables.size() * 0.95);
-  OS << format("top 5%% function CFG discontinuity is %.2lf%%\n",
-               FractionECUnreachables[Rank] * 100);
-
-  if (opts::Verbosity >= 1) {
-    OS << "abbreviations: EC = execution count, POS BBs = positive EC BBs\n"
-       << "distribution of NUM(unreachable POS BBs) among all focal "
-          "functions\n";
-    printDistribution(OS, NumUnreachables);
-
-    OS << "distribution of SUM_EC(unreachable POS BBs) among all focal "
-          "functions\n";
-    printDistribution(OS, SumECUnreachables);
-
-    OS << "distribution of [(SUM_EC(unreachable POS BBs) / SUM_EC(all "
-          "POS BBs))] among all focal functions\n";
-    printDistribution(OS, FractionECUnreachables, /*Fraction=*/true);
-  }
-}
-
-void printAll(BinaryContext &BC, FunctionListType &ValidFunctions,
-              size_t NumTopFunctions) {
-  // Sort the list of functions by execution counts (reverse).
-  llvm::sort(ValidFunctions,
-             [&](const BinaryFunction *A, const BinaryFunction *B) {
-               return A->getKnownExecutionCount() > B->getKnownExecutionCount();
-             });
-
-  const size_t RealNumTopFunctions =
-      std::min(NumTopFunctions, ValidFunctions.size());
-
-  iterator_range<function_iterator> Functions(
-      ValidFunctions.begin(), ValidFunctions.begin() + RealNumTopFunctions);
-
-  BC.outs() << format("BOLT-INFO: among the hottest %zu functions ",
-                      RealNumTopFunctions);
-  printCFGContinuityStats(BC.outs(), Functions);
-
-  // Print more detailed bucketed stats if requested.
-  if (opts::Verbosity >= 1 && RealNumTopFunctions >= 5) {
-    const size_t PerBucketSize = RealNumTopFunctions / 5;
-    BC.outs() << format(
-        "Detailed stats for 5 buckets, each with  %zu functions:\n",
-        PerBucketSize);
-
-    // For each bucket, print the CFG continuity stats of the functions in the
-    // bucket.
-    for (size_t BucketIndex = 0; BucketIndex < 5; ++BucketIndex) {
-      const size_t StartIndex = BucketIndex * PerBucketSize;
-      const size_t EndIndex = StartIndex + PerBucketSize;
-      iterator_range<function_iterator> Functions(
-          ValidFunctions.begin() + StartIndex,
-          ValidFunctions.begin() + EndIndex);
-      const size_t MaxFunctionExecutionCount =
-          ValidFunctions[StartIndex]->getKnownExecutionCount();
-      const size_t MinFunctionExecutionCount =
-          ValidFunctions[EndIndex - 1]->getKnownExecutionCount();
-      BC.outs() << format("----------------\n|   Bucket %zu:  "
-                          "|\n----------------\n",
-                          BucketIndex + 1)
-                << format(
-                       "execution counts of the %zu functions in the bucket: "
-                       "%zu-%zu\n",
-                       EndIndex - StartIndex, MinFunctionExecutionCount,
-                       MaxFunctionExecutionCount);
-      printCFGContinuityStats(BC.outs(), Functions);
-    }
-  }
-}
-} // namespace
-
-bool PrintContinuityStats::shouldOptimize(const BinaryFunction &BF) const {
-  if (BF.empty() || !BF.hasValidProfile())
-    return false;
-
-  return BinaryFunctionPass::shouldOptimize(BF);
-}
-
-Error PrintContinuityStats::runOnFunctions(BinaryContext &BC) {
-  // Create a list of functions with valid profiles.
-  FunctionListType ValidFunctions;
-  for (const auto &BFI : BC.getBinaryFunctions()) {
-    const BinaryFunction *Function = &BFI.second;
-    if (PrintContinuityStats::shouldOptimize(*Function))
-      ValidFunctions.push_back(Function);
-  }
-  if (ValidFunctions.empty() || opts::NumFunctionsForContinuityCheck == 0)
-    return Error::success();
-
-  printAll(BC, ValidFunctions, opts::NumFunctionsForContinuityCheck);
-  return Error::success();
-}
diff --git a/bolt/lib/Passes/ProfileQualityStats.cpp b/bolt/lib/Passes/ProfileQualityStats.cpp
new file mode 100644
index 0000000000000..a42b4dc27947a
--- /dev/null
+++ b/bolt/lib/Passes/ProfileQualityStats.cpp
@@ -0,0 +1,579 @@
+//===- bolt/Passes/ProfileQualityStats.cpp ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the profile quality stats calculation pass.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Passes/ProfileQualityStats.h"
+#include "bolt/Core/BinaryBasicBlock.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Utils/CommandLineOpts.h"
+#include "llvm/Support/CommandLine.h"
+#include <queue>
+#include <unordered_map>
+#include <unordered_set>
+
+#define DEBUG_TYPE "bolt-opts"
+
+using namespace llvm;
+using namespace bolt;
+
+namespace opts {
+extern cl::opt<unsigned> Verbosity;
+cl::opt<unsigned> NumFunctionsForProfileQualityCheck(
+    "num-functions-for-profile-quality-check",
+    cl::desc("number of hottest functions to print aggregated "
+             "profile quality stats of."),
+    cl::init(1000), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
+cl::opt<unsigned> PercentileForProfileQualityCheck(
+    "percentile-for-profile-quality-check",
+    cl::desc("Percentile of profile quality distributions over hottest "
+             "functions to display."),
+    cl::init(95), cl::ZeroOrMore, cl::Hidden, cl::cat(BoltOptCategory));
+} // namespace opts
+
+namespace {
+using FunctionListType = std::vector<const BinaryFunction *>;
+using function_iterator = FunctionListType::iterator;
+
+// BB index -> flow count
+using FlowMapTy = std::unordered_map<unsigned, uint64_t>;
+// Function number -> FlowMapTy
+using TotalFlowMapTy = std::unordered_map<uint64_t, FlowMapTy>;
+// Function number -> flow count
+using FunctionFlowMapTy = std::unordered_map<uint64_t, uint64_t>;
+struct FlowInfo {
+  TotalFlowMapTy TotalIncomingMaps;
+  TotalFlowMapTy TotalOutgoingMaps;
+  TotalFlowMapTy TotalMaxCountMaps;
+  TotalFlowMapTy TotalMinCountMaps;
+  FunctionFlowMapTy CallGraphIncomingMap;
+};
+
+template <typename T>
+void printDistribution(r...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/127954


More information about the llvm-commits mailing list