<p class="MsoNormal">Hi Philip,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for the suggestion. The resubmission came with a fix for the first break as I commented here: https://reviews.llvm.org/rGb00fc198224e . Sorry for not updating the commit message. I will do that next time.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The second break was due to a different issue. It’s being addressed here:
<a href="https://reviews.llvm.org/D109860">https://reviews.llvm.org/D109860</a>. Once the fix is ready, I’ll update the commit message with both fixes before resubmission.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Hongtao<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:.5in">
<b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Philip Reames <listmail@philipreames.com><br>
<b>Date: </b>Tuesday, November 23, 2021 at 12:22 PM<br>
<b>To: </b>Hongtao Yu <hoy@fb.com>, Hongtao Yu <llvmlistbot@llvm.org>, llvm-commits@lists.llvm.org <llvm-commits@lists.llvm.org><br>
<b>Subject: </b>Re: [llvm] 884b6dd - profi - a flow-based profile inference algorithm: Part I (out of 3)<o:p></o:p></span></p>
</div>
This appears to have been resubmitted after a revert for breaking the build, while still breaking the build and without any discussion of fixes between the two versions. Please do not do this!

If you think you have fixed an issue causing a revert, you *must* describe in the submission comment what the problem was and how you fixed it.

Philip

On 11/23/21 11:04 AM, Hongtao Yu via llvm-commits wrote:
> Author: spupyrev
> Date: 2021-11-23T11:02:40-08:00
> New Revision: 884b6dd311422bbfac62b8a90fbfff8e77ba8121
>
> URL: https://github.com/llvm/llvm-project/commit/884b6dd311422bbfac62b8a90fbfff8e77ba8121
> DIFF: https://github.com/llvm/llvm-project/commit/884b6dd311422bbfac62b8a90fbfff8e77ba8121.diff
>
> LOG: profi - a flow-based profile inference algorithm: Part I (out of 3)
>
> The benefits of sampling-based PGO depend crucially on the quality of profile
> data. This diff implements a flow-based algorithm, called profi, that helps to
> overcome the inaccuracies in a profile after it is collected.
>
> Profi is an extended and significantly re-engineered version of the classic
> MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008,
> Complementing missing and inaccurate profiling using a minimum cost circulation
> algorithm]. It models profile inference as an optimization problem on a
> control-flow graph, with the objectives and constraints capturing the desired
> properties of profile data. Profi solves three important challenges:
> - "fixing" errors in profiles caused by sampling;
> - converting basic block counts to edge frequencies (branch probabilities);
> - dealing with "dangling" blocks having no samples in the profile.
>
> The main implementation (and required docs) are in SampleProfileInference.cpp.
> The worst-case time complexity is quadratic in the number of blocks in a
> function, O(|V|^2). However, careful engineering and extensive evaluation show
> that the running time is (slightly) super-linear. In particular, instances with
> 1000 blocks are solved within 0.1 seconds.
>
> The algorithm has been extensively tested internally on prod workloads,
> significantly improving the quality of generated profile data and providing
> speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
> generally improves the performance (with a few outliers), but extra work in
> the compiler might be needed to re-tune existing optimization passes relying on
> profile counts.
>
> Reviewed By: wenlei, hoy
>
> Differential Revision: https://reviews.llvm.org/D109860
>
> Added:
>     llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
>     llvm/lib/Transforms/Utils/SampleProfileInference.cpp
>     llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
>     llvm/test/Transforms/SampleProfile/profile-inference.ll
>
> Modified:
>     llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
>     llvm/lib/Transforms/IPO/SampleProfile.cpp
>     llvm/lib/Transforms/Utils/CMakeLists.txt
>     llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
>
> Removed:
>
>
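For orientation before diving into the diff: the new interface is small enough to drive directly on a hand-built CFG. The following is a minimal sketch (not part of the patch; it assumes only the FlowFunction/FlowBlock/FlowJump types and applyFlowInference declared in the SampleProfileInference.h added below), mirroring the test_1 case from the new test:

  #include "llvm/Transforms/Utils/SampleProfileInference.h"
  #include <cstdint>
  #include <cstdio>

  int main() {
    // A 3-block CFG: b1 -> b2 and b1 -> b3, with sampled counts 100/60/40.
    llvm::FlowFunction Func;
    Func.Blocks.resize(3);
    for (uint64_t I = 0; I < 3; I++)
      Func.Blocks[I].Index = I;
    Func.Blocks[0].Weight = 100; // b1
    Func.Blocks[1].Weight = 60;  // b2
    Func.Blocks[2].Weight = 40;  // b3
    Func.Jumps.push_back({/*Source=*/0, /*Target=*/1});
    Func.Jumps.push_back({/*Source=*/0, /*Target=*/2});
    // Wire up the adjacency lists after all push_backs, so the pointers
    // into Func.Jumps stay valid.
    for (auto &Jump : Func.Jumps) {
      Func.Blocks[Jump.Source].SuccJumps.push_back(&Jump);
      Func.Blocks[Jump.Target].PredJumps.push_back(&Jump);
    }
    Func.Entry = 0;

    // Reconstruct flow-conserving block and jump counts.
    llvm::applyFlowInference(Func);

    for (const auto &Jump : Func.Jumps)
      printf("jump %llu -> %llu: flow = %llu\n",
             (unsigned long long)Jump.Source, (unsigned long long)Jump.Target,
             (unsigned long long)Jump.Flow);
    return 0;
  }

With consistent input counts like these, the inferred jump flows should simply match the successor weights (60 and 40).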
> ################################################################################
> diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
> new file mode 100644
> index 0000000000000..e1f681bbd3677
> --- /dev/null
> +++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
> @@ -0,0 +1,284 @@
> +//===- Transforms/Utils/SampleProfileInference.h ----------*- C++ -*-===//
> +//
> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
> +// See https://llvm.org/LICENSE.txt for license information.
> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +/// \file
> +/// This file provides the interface for the profile inference algorithm, profi.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
> +#define LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
> +
> +#include "llvm/ADT/DenseMap.h"
> +#include "llvm/ADT/DepthFirstIterator.h"
> +#include "llvm/ADT/SmallVector.h"
> +
> +#include "llvm/IR/BasicBlock.h"
> +#include "llvm/IR/Instruction.h"
> +#include "llvm/IR/Instructions.h"
> +
> +namespace llvm {
> +
> +class BasicBlock;
> +class Function;
> +class MachineBasicBlock;
> +class MachineFunction;
> +
> +namespace afdo_detail {
> +
> +template <class BlockT> struct TypeMap {};
> +template <> struct TypeMap<BasicBlock> {
> +  using BasicBlockT = BasicBlock;
> +  using FunctionT = Function;
> +};
> +template <> struct TypeMap<MachineBasicBlock> {
> +  using BasicBlockT = MachineBasicBlock;
> +  using FunctionT = MachineFunction;
> +};
> +
> +} // end namespace afdo_detail
> +
> +struct FlowJump;
> +
> +/// A wrapper of a binary basic block.
> +struct FlowBlock {
> +  uint64_t Index;
> +  uint64_t Weight{0};
> +  bool UnknownWeight{false};
> +  uint64_t Flow{0};
> +  bool HasSelfEdge{false};
> +  std::vector<FlowJump *> SuccJumps;
> +  std::vector<FlowJump *> PredJumps;
> +
> +  /// Check if it is the entry block in the function.
> +  bool isEntry() const { return PredJumps.empty(); }
> +
> +  /// Check if it is an exit block in the function.
> +  bool isExit() const { return SuccJumps.empty(); }
> +};
> +
> +/// A wrapper of a jump between two basic blocks.
> +struct FlowJump {
> +  uint64_t Source;
> +  uint64_t Target;
> +  uint64_t Flow{0};
> +  bool IsUnlikely{false};
> +};
> +
> +/// A wrapper of a binary function with basic blocks and jumps.
> +struct FlowFunction {
> +  std::vector<FlowBlock> Blocks;
> +  std::vector<FlowJump> Jumps;
> +  /// The index of the entry block.
> +  uint64_t Entry;
> +};
> +
> +void applyFlowInference(FlowFunction &Func);
> +
> +/// Sample profile inference pass.
> +template <typename BT> class SampleProfileInference {
> +public:
> +  using BasicBlockT = typename afdo_detail::TypeMap<BT>::BasicBlockT;
> +  using FunctionT = typename afdo_detail::TypeMap<BT>::FunctionT;
> +  using Edge = std::pair<const BasicBlockT *, const BasicBlockT *>;
> +  using BlockWeightMap = DenseMap<const BasicBlockT *, uint64_t>;
> +  using EdgeWeightMap = DenseMap<Edge, uint64_t>;
> +  using BlockEdgeMap =
> +      DenseMap<const BasicBlockT *, SmallVector<const BasicBlockT *, 8>>;
> +
> +  SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
> +                         BlockWeightMap &SampleBlockWeights)
> +      : F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights) {}
> +
> +  /// Apply the profile inference algorithm for a given function.
> +  void apply(BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
> +
> +private:
> +  /// Try to infer branch probabilities mimicking the implementation of
> +  /// BranchProbabilityInfo. Unlikely-taken branches are marked so that the
> +  /// inference algorithm can avoid sending flow along corresponding edges.
> +  void findUnlikelyJumps(const std::vector<const BasicBlockT *> &BasicBlocks,
> +                         BlockEdgeMap &Successors, FlowFunction &Func);
> +
> +  /// Determine whether the block is an exit in the CFG.
> +  bool isExit(const BasicBlockT *BB);
> +
> +  /// Function.
> +  const FunctionT &F;
> +
> +  /// Successors for each basic block in the CFG.
> +  BlockEdgeMap &Successors;
> +
> +  /// Map basic blocks to their sampled weights.
> +  BlockWeightMap &SampleBlockWeights;
> +};
> +
> +template <typename BT>
> +void SampleProfileInference<BT>::apply(BlockWeightMap &BlockWeights,
> +                                       EdgeWeightMap &EdgeWeights) {
> +  // Find all forward-reachable blocks to which the inference algorithm will
> +  // be applied.
> +  df_iterator_default_set<const BasicBlockT *> Reachable;
> +  for (auto *BB : depth_first_ext(&F, Reachable))
> +    (void)BB /* Mark all reachable blocks */;
> +
> +  // Find all backward-reachable blocks to which the inference algorithm will
> +  // be applied.
> +  df_iterator_default_set<const BasicBlockT *> InverseReachable;
> +  for (const auto &BB : F) {
> +    // An exit block is a block without any successors.
> +    if (isExit(&BB)) {
> +      for (auto *RBB : inverse_depth_first_ext(&BB, InverseReachable))
> +        (void)RBB;
> +    }
> +  }
> +
> +  // Keep a stable order for the reachable blocks.
> +  DenseMap<const BasicBlockT *, uint64_t> BlockIndex;
> +  std::vector<const BasicBlockT *> BasicBlocks;
> +  BlockIndex.reserve(Reachable.size());
> +  BasicBlocks.reserve(Reachable.size());
> +  for (const auto &BB : F) {
> +    if (Reachable.count(&BB) && InverseReachable.count(&BB)) {
> +      BlockIndex[&BB] = BasicBlocks.size();
> +      BasicBlocks.push_back(&BB);
> +    }
> +  }
> +
> +  BlockWeights.clear();
> +  EdgeWeights.clear();
> +  bool HasSamples = false;
> +  for (const auto *BB : BasicBlocks) {
> +    auto It = SampleBlockWeights.find(BB);
> +    if (It != SampleBlockWeights.end() && It->second > 0) {
> +      HasSamples = true;
> +      BlockWeights[BB] = It->second;
> +    }
> +  }
> +  // Quit early for functions with a single block or ones w/o samples.
> +  if (BasicBlocks.size() <= 1 || !HasSamples) {
> +    return;
> +  }
> +
> +  // Create the necessary objects.
> +  FlowFunction Func;
> +  Func.Blocks.reserve(BasicBlocks.size());
> +  // Create FlowBlocks.
> +  for (const auto *BB : BasicBlocks) {
> +    FlowBlock Block;
> +    if (SampleBlockWeights.find(BB) != SampleBlockWeights.end()) {
> +      Block.UnknownWeight = false;
> +      Block.Weight = SampleBlockWeights[BB];
> +    } else {
> +      Block.UnknownWeight = true;
> +      Block.Weight = 0;
> +    }
> +    Block.Index = Func.Blocks.size();
> +    Func.Blocks.push_back(Block);
> +  }
> +  // Create FlowEdges.
> +  for (const auto *BB : BasicBlocks) {
> +    for (auto *Succ : Successors[BB]) {
> +      if (!BlockIndex.count(Succ))
> +        continue;
> +      FlowJump Jump;
> +      Jump.Source = BlockIndex[BB];
> +      Jump.Target = BlockIndex[Succ];
> +      Func.Jumps.push_back(Jump);
> +      if (BB == Succ) {
> +        Func.Blocks[BlockIndex[BB]].HasSelfEdge = true;
> +      }
> +    }
> +  }
> +  for (auto &Jump : Func.Jumps) {
> +    Func.Blocks[Jump.Source].SuccJumps.push_back(&Jump);
> +    Func.Blocks[Jump.Target].PredJumps.push_back(&Jump);
> +  }
> +
> +  // Try to infer probabilities of jumps based on the content of basic blocks.
> +  findUnlikelyJumps(BasicBlocks, Successors, Func);
> +
> +  // Find the entry block.
> +  for (size_t I = 0; I < Func.Blocks.size(); I++) {
> +    if (Func.Blocks[I].isEntry()) {
> +      Func.Entry = I;
> +      break;
> +    }
> +  }
> +
> +  // Create and apply the inference network model.
> +  applyFlowInference(Func);
> +
> +  // Extract the resulting weights from the control flow.
> +  // All weights are increased by one to avoid propagation errors introduced by
> +  // zero weights.
> +  for (const auto *BB : BasicBlocks) {
> +    BlockWeights[BB] = Func.Blocks[BlockIndex[BB]].Flow;
> +  }
> +  for (auto &Jump : Func.Jumps) {
> +    Edge E = std::make_pair(BasicBlocks[Jump.Source], BasicBlocks[Jump.Target]);
> +    EdgeWeights[E] = Jump.Flow;
> +  }
> +
> +#ifndef NDEBUG
> +  // Unreachable blocks and edges should not have a weight.
> +  for (auto &I : BlockWeights) {
> +    assert(Reachable.contains(I.first));
> +    assert(InverseReachable.contains(I.first));
> +  }
> +  for (auto &I : EdgeWeights) {
> +    assert(Reachable.contains(I.first.first) &&
> +           Reachable.contains(I.first.second));
> +    assert(InverseReachable.contains(I.first.first) &&
> +           InverseReachable.contains(I.first.second));
> +  }
> +#endif
> +}
> +
> +template <typename BT>
> +inline void SampleProfileInference<BT>::findUnlikelyJumps(
> +    const std::vector<const BasicBlockT *> &BasicBlocks,
> +    BlockEdgeMap &Successors, FlowFunction &Func) {}
> +
> +template <>
> +inline void SampleProfileInference<BasicBlock>::findUnlikelyJumps(
> +    const std::vector<const BasicBlockT *> &BasicBlocks,
> +    BlockEdgeMap &Successors, FlowFunction &Func) {
> +  for (auto &Jump : Func.Jumps) {
> +    const auto *BB = BasicBlocks[Jump.Source];
> +    const auto *Succ = BasicBlocks[Jump.Target];
> +    const Instruction *TI = BB->getTerminator();
> +    // Check if a block ends with an InvokeInst and mark the non-taken branch
> +    // as unlikely. In that case, block Succ should be a landing pad.
> +    if (Successors[BB].size() == 2 && Successors[BB].back() == Succ) {
> +      if (isa<InvokeInst>(TI)) {
> +        Jump.IsUnlikely = true;
> +      }
> +    }
> +    const Instruction *SuccTI = Succ->getTerminator();
> +    // Check if the target block contains an UnreachableInst and mark it
> +    // unlikely.
> +    if (SuccTI->getNumSuccessors() == 0) {
> +      if (isa<UnreachableInst>(SuccTI)) {
> +        Jump.IsUnlikely = true;
> +      }
> +    }
> +  }
> +}
> +
> +template <typename BT>
> +inline bool SampleProfileInference<BT>::isExit(const BasicBlockT *BB) {
> +  return BB->succ_empty();
> +}
> +
> +template <>
> +inline bool SampleProfileInference<BasicBlock>::isExit(const BasicBlock *BB) {
> +  return succ_empty(BB);
> +}
> +
> +} // end namespace llvm
> +#endif // LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
>
> diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h b/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
> index 6a2f0acf46f32..e9b3d5aef15fb 100644
> --- a/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
> +++ b/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
> @@ -38,6 +38,7 @@
> #include "llvm/Support/CommandLine.h"
> #include "llvm/Support/GenericDomTree.h"
> #include "llvm/Support/raw_ostream.h"
> +#include "llvm/Transforms/Utils/SampleProfileInference.h"
> #include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"
>
> namespace llvm {
> @@ -74,6 +75,8 @@ template <> struct IRTraits<BasicBlock> {
>
> } // end namespace afdo_detail
>
> +extern cl::opt<bool> SampleProfileUseProfi;
> +
> template <typename BT> class SampleProfileLoaderBaseImpl {
> public:
>   SampleProfileLoaderBaseImpl(std::string Name, std::string RemapName)
> @@ -142,6 +145,9 @@ template <typename BT> class SampleProfileLoaderBaseImpl {
>                           ArrayRef<BasicBlockT *> Descendants,
>                           PostDominatorTreeT *DomTree);
>   void propagateWeights(FunctionT &F);
> +  void applyProfi(FunctionT &F, BlockEdgeMap &Successors,
> +                  BlockWeightMap &SampleBlockWeights,
> +                  BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
>   uint64_t visitEdge(Edge E, unsigned *NumUnknownEdges, Edge *UnknownEdge);
>   void buildEdges(FunctionT &F);
>   bool propagateThroughEdges(FunctionT &F, bool UpdateBlockCount);
> @@ -150,6 +156,11 @@ template <typename BT> class SampleProfileLoaderBaseImpl {
>   bool
>   computeAndPropagateWeights(FunctionT &F,
>                              const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
> +  void initWeightPropagation(FunctionT &F,
> +                             const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
> +  void
> +  finalizeWeightPropagation(FunctionT &F,
> +                            const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
>   void emitCoverageRemarks(FunctionT &F);
>
>   /// Map basic blocks to their computed weights.
> @@ -741,50 +752,65 @@ void SampleProfileLoaderBaseImpl<BT>::buildEdges(FunctionT &F) {
> /// known).
> template <typename BT>
> void SampleProfileLoaderBaseImpl<BT>::propagateWeights(FunctionT &F) {
> -  bool Changed = true;
> -  unsigned I = 0;
> -
> -  // If BB weight is larger than its corresponding loop's header BB weight,
> -  // use the BB weight to replace the loop header BB weight.
> -  for (auto &BI : F) {
> -    BasicBlockT *BB = &BI;
> -    LoopT *L = LI->getLoopFor(BB);
> -    if (!L) {
> -      continue;
> +  // Flow-based profile inference is only usable with the BasicBlock
> +  // instantiation of SampleProfileLoaderBaseImpl.
> +  if (SampleProfileUseProfi) {
> +    // Prepare block sample counts for inference.
> +    BlockWeightMap SampleBlockWeights;
> +    for (const auto &BI : F) {
> +      ErrorOr<uint64_t> Weight = getBlockWeight(&BI);
> +      if (Weight)
> +        SampleBlockWeights[&BI] = Weight.get();
>     }
> -    BasicBlockT *Header = L->getHeader();
> -    if (Header && BlockWeights[BB] > BlockWeights[Header]) {
> -      BlockWeights[Header] = BlockWeights[BB];
> +    // Fill in BlockWeights and EdgeWeights using an inference algorithm.
> +    applyProfi(F, Successors, SampleBlockWeights, BlockWeights, EdgeWeights);
> +  } else {
> +    bool Changed = true;
> +    unsigned I = 0;
> +
> +    // If BB weight is larger than its corresponding loop's header BB weight,
> +    // use the BB weight to replace the loop header BB weight.
> +    for (auto &BI : F) {
> +      BasicBlockT *BB = &BI;
> +      LoopT *L = LI->getLoopFor(BB);
> +      if (!L) {
> +        continue;
> +      }
> +      BasicBlockT *Header = L->getHeader();
> +      if (Header && BlockWeights[BB] > BlockWeights[Header]) {
> +        BlockWeights[Header] = BlockWeights[BB];
> +      }
>     }
> -  }
>
> -  // Before propagation starts, build, for each block, a list of
> -  // unique predecessors and successors. This is necessary to handle
> -  // identical edges in multiway branches. Since we visit all blocks and all
> -  // edges of the CFG, it is cleaner to build these lists once at the start
> -  // of the pass.
> -  buildEdges(F);
> +    // Propagate until we converge or we go past the iteration limit.
> +    while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> +      Changed = propagateThroughEdges(F, false);
> +    }
>
> -  // Propagate until we converge or we go past the iteration limit.
> -  while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> -    Changed = propagateThroughEdges(F, false);
> -  }
> +    // The first propagation propagates BB counts from annotated BBs to unknown
> +    // BBs. The 2nd propagation pass resets edge weights and uses all BB
> +    // weights to propagate edge weights.
> +    VisitedEdges.clear();
> +    Changed = true;
> +    while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> +      Changed = propagateThroughEdges(F, false);
> +    }
>
> -  // The first propagation propagates BB counts from annotated BBs to unknown
> -  // BBs. The 2nd propagation pass resets edges weights, and use all BB weights
> -  // to propagate edge weights.
> -  VisitedEdges.clear();
> -  Changed = true;
> -  while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> -    Changed = propagateThroughEdges(F, false);
> +    // The 3rd propagation pass allows adjusting annotated BB weights that are
> +    // obviously wrong.
> +    Changed = true;
> +    while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> +      Changed = propagateThroughEdges(F, true);
> +    }
>   }
> +}
>
> -  // The 3rd propagation pass allows adjust annotated BB weights that are
> -  // obviously wrong.
> -  Changed = true;
> -  while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> -    Changed = propagateThroughEdges(F, true);
> -  }
> +template <typename BT>
> +void SampleProfileLoaderBaseImpl<BT>::applyProfi(
> +    FunctionT &F, BlockEdgeMap &Successors, BlockWeightMap &SampleBlockWeights,
> +    BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights) {
> +  auto Infer = SampleProfileInference<BT>(F, Successors, SampleBlockWeights);
> +  Infer.apply(BlockWeights, EdgeWeights);
> }
>
> /// Generate branch weight metadata for all branches in \p F.
> @@ -842,26 +868,64 @@ bool SampleProfileLoaderBaseImpl<BT>::computeAndPropagateWeights(
>   Changed |= computeBlockWeights(F);
>
>   if (Changed) {
> -    // Add an entry count to the function using the samples gathered at the
> -    // function entry.
> -    // Sets the GUIDs that are inlined in the profiled binary. This is used
> -    // for ThinLink to make correct liveness analysis, and also make the IR
> -    // match the profiled binary before annotation.
> -    getFunction(F).setEntryCount(
> -        ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
> -        &InlinedGUIDs);
> +    // Initialize propagation.
> +    initWeightPropagation(F, InlinedGUIDs);
>
> +    // Propagate weights to all edges.
> +    propagateWeights(F);
> +
> +    // Post-process the propagated weights.
> +    finalizeWeightPropagation(F, InlinedGUIDs);
> +  }
> +
> +  return Changed;
> +}
> +
> +template <typename BT>
> +void SampleProfileLoaderBaseImpl<BT>::initWeightPropagation(
> +    FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
> +  // Add an entry count to the function using the samples gathered at the
> +  // function entry.
> +  // Sets the GUIDs that are inlined in the profiled binary. This is used
> +  // for ThinLink to make correct liveness analysis, and also make the IR
> +  // match the profiled binary before annotation.
> +  getFunction(F).setEntryCount(
> +      ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
> +      &InlinedGUIDs);
> +
> +  if (!SampleProfileUseProfi) {
>     // Compute dominance and loop info needed for propagation.
>     computeDominanceAndLoopInfo(F);
>
>     // Find equivalence classes.
>     findEquivalenceClasses(F);
> -
> -    // Propagate weights to all edges.
> -    propagateWeights(F);
>   }
>
> -  return Changed;
> +  // Before propagation starts, build, for each block, a list of
> +  // unique predecessors and successors. This is necessary to handle
> +  // identical edges in multiway branches. Since we visit all blocks and all
> +  // edges of the CFG, it is cleaner to build these lists once at the start
> +  // of the pass.
> +  buildEdges(F);
> +}
> +
> +template <typename BT>
> +void SampleProfileLoaderBaseImpl<BT>::finalizeWeightPropagation(
> +    FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
> +  // If we utilize a flow-based count inference, then we trust the computed
> +  // counts and set the entry count as computed by the algorithm. This is
> +  // primarily done to sync the counts produced by profi and BFI inference,
> +  // which uses the entry count for mass propagation.
> +  // If profi produces a zero value for the entry count, we fall back to
> +  // Samples->getHeadSamples() + 1 to avoid functions with zero count.
> +  if (SampleProfileUseProfi) {
> +    const BasicBlockT *EntryBB = getEntryBB(&F);
> +    if (BlockWeights[EntryBB] > 0) {
> +      getFunction(F).setEntryCount(
> +          ProfileCount(BlockWeights[EntryBB], Function::PCT_Real),
> +          &InlinedGUIDs);
> +    }
> +  }
> }
>
> template <typename BT>
>
> diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
> index a961c47a75013..3e01fd17f5260 100644
> --- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
> +++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
> @@ -84,6 +84,7 @@
> #include "llvm/Transforms/Instrumentation.h"
> #include "llvm/Transforms/Utils/CallPromotionUtils.h"
> #include "llvm/Transforms/Utils/Cloning.h"
> +#include "llvm/Transforms/Utils/SampleProfileInference.h"
> #include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"
> #include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"
> #include <algorithm>
> @@ -1648,6 +1649,19 @@ void SampleProfileLoader::generateMDProfMetadata(Function &F) {
>       SmallVector<uint32_t, 4> Weights;
>       uint32_t MaxWeight = 0;
>       Instruction *MaxDestInst;
> +      // Since profi treats multiple edges (multiway branches) as a single
> +      // edge, we need to distribute the computed weight among the branches.
> +      // We do this by evenly splitting the edge weight among the destinations.
> +      DenseMap<const BasicBlock *, uint64_t> EdgeMultiplicity;
> +      std::vector<uint64_t> EdgeIndex;
> +      if (SampleProfileUseProfi) {
> +        EdgeIndex.resize(TI->getNumSuccessors());
> +        for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
> +          const BasicBlock *Succ = TI->getSuccessor(I);
> +          EdgeIndex[I] = EdgeMultiplicity[Succ];
> +          EdgeMultiplicity[Succ]++;
> +        }
> +      }
>       for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
>         BasicBlock *Succ = TI->getSuccessor(I);
>         Edge E = std::make_pair(BB, Succ);
> @@ -1660,9 +1674,19 @@ void SampleProfileLoader::generateMDProfMetadata(Function &F) {
>           LLVM_DEBUG(dbgs() << " (saturated due to uint32_t overflow)");
>           Weight = std::numeric_limits<uint32_t>::max();
>         }
> -        // Weight is added by one to avoid propagation errors introduced by
> -        // 0 weights.
> -        Weights.push_back(static_cast<uint32_t>(Weight + 1));
> +        if (!SampleProfileUseProfi) {
> +          // Weight is increased by one to avoid propagation errors introduced
> +          // by 0 weights.
> +          Weights.push_back(static_cast<uint32_t>(Weight + 1));
> +        } else {
> +          // Profi creates proper weights that do not require "+1" adjustments,
> +          // but we evenly split the weight among branches with the same
> +          // destination.
> +          uint64_t W = Weight / EdgeMultiplicity[Succ];
> +          // Round up, if needed, so that the first branches are hotter.
> +          if (EdgeIndex[I] < Weight % EdgeMultiplicity[Succ])
> +            W++;
> +          Weights.push_back(static_cast<uint32_t>(W));
> +        }
>         if (Weight != 0) {
>           if (Weight > MaxWeight) {
>             MaxWeight = Weight;
>
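To make the splitting rule in this hunk concrete, here is a small standalone sketch (not part of the patch; the weight and multiplicity values are made up for illustration) of the arithmetic applied to one multiway branch:

  #include <cstdint>
  #include <cstdio>

  int main() {
    // One profi edge weight, shared by three branches to the same successor.
    uint64_t Weight = 10, Multiplicity = 3;
    for (uint64_t I = 0; I < Multiplicity; I++) {
      // Even split, rounding the earlier branches up, as in the loop above.
      uint64_t W = Weight / Multiplicity + (I < Weight % Multiplicity ? 1 : 0);
      printf("branch %llu gets weight %llu\n", (unsigned long long)I,
             (unsigned long long)W);
    }
    return 0; // prints 4, 3, 3; the parts sum back to the original 10
  }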
> diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
> index be4f7125eb853..22b9c0b19adab 100644
> --- a/llvm/lib/Transforms/Utils/CMakeLists.txt
> +++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
> @@ -60,6 +60,7 @@ add_llvm_component_library(LLVMTransformUtils
>   StripGCRelocates.cpp
>   SSAUpdater.cpp
>   SSAUpdaterBulk.cpp
> +  SampleProfileInference.cpp
>   SampleProfileLoaderBaseUtil.cpp
>   SanitizerStats.cpp
>   SimplifyCFG.cpp
>
> diff --git a/llvm/lib/Transforms/Utils/SampleProfileInference.cpp b/llvm/lib/Transforms/Utils/SampleProfileInference.cpp
> new file mode 100644
> index 0000000000000..412a724006aa2
> --- /dev/null
> +++ b/llvm/lib/Transforms/Utils/SampleProfileInference.cpp
> @@ -0,0 +1,461 @@
> +//===- SampleProfileInference.cpp - Adjust sample profiles in the IR ------===//
> +//
> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
> +// See https://llvm.org/LICENSE.txt for license information.
> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file implements a profile inference algorithm. Given incomplete and
> +// possibly imprecise block counts, the algorithm reconstructs realistic block
> +// and edge counts that satisfy flow conservation rules, while minimally
> +// modifying the input block counts.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/Transforms/Utils/SampleProfileInference.h"
> +#include "llvm/Support/Debug.h"
> +#include <queue>
> +#include <set>
> +
> +using namespace llvm;
> +#define DEBUG_TYPE "sample-profile-inference"
> +
> +namespace {
> +
> +/// A value indicating an infinite flow/capacity/weight of a block/edge.
> +/// Not using numeric_limits<int64_t>::max(), as the values can be summed up
> +/// during the execution.
> +static constexpr int64_t INF = ((int64_t)1) << 50;
> +
> +/// The minimum-cost maximum flow algorithm.
> +///
> +/// The algorithm finds the maximum flow of minimum cost on a given (directed)
> +/// network using a modified version of the classical Moore-Bellman-Ford
> +/// approach. The algorithm applies a number of augmentation iterations in
> +/// which flow is sent along paths of positive capacity from the source to the
> +/// sink. The worst-case time complexity of the implementation is O(v(f)*m*n),
> +/// where m is the number of edges, n is the number of vertices, and v(f) is
> +/// the value of the maximum flow. However, the observed running time on
> +/// typical instances is sub-quadratic, that is, o(n^2).
> +///
> +/// The input is a set of edges with specified costs and capacities, and a pair
> +/// of nodes (source and sink). The output is the flow along each edge of the
> +/// minimum total cost respecting the given edge capacities.
> +class MinCostMaxFlow {
> +public:
> +  // Initialize the algorithm's data structures for a network of a given size.
> +  void initialize(uint64_t NodeCount, uint64_t SourceNode, uint64_t SinkNode) {
> +    Source = SourceNode;
> +    Target = SinkNode;
> +
> +    Nodes = std::vector<Node>(NodeCount);
> +    Edges = std::vector<std::vector<Edge>>(NodeCount, std::vector<Edge>());
> +  }
> +
> +  // Run the algorithm.
> +  int64_t run() {
> +    // Find an augmenting path and update the flow along the path.
> +    size_t AugmentationIters = 0;
> +    while (findAugmentingPath()) {
> +      augmentFlowAlongPath();
> +      AugmentationIters++;
> +    }
> +
> +    // Compute the total flow and its cost.
> +    int64_t TotalCost = 0;
> +    int64_t TotalFlow = 0;
> +    for (uint64_t Src = 0; Src < Nodes.size(); Src++) {
> +      for (auto &Edge : Edges[Src]) {
> +        if (Edge.Flow > 0) {
> +          TotalCost += Edge.Cost * Edge.Flow;
> +          if (Src == Source)
> +            TotalFlow += Edge.Flow;
> +        }
> +      }
> +    }
> +    LLVM_DEBUG(dbgs() << "Completed profi after " << AugmentationIters
> +                      << " iterations with " << TotalFlow << " total flow"
> +                      << " of " << TotalCost << " cost\n");
> +    return TotalCost;
> +  }
> +
> +  /// Add an edge to the network with a specified capacity and a cost.
> +  /// Multiple edges between a pair of nodes are allowed, but self-edges
> +  /// are not supported.
> +  void addEdge(uint64_t Src, uint64_t Dst, int64_t Capacity, int64_t Cost) {
> +    assert(Capacity > 0 && "adding an edge of zero capacity");
> +    assert(Src != Dst && "loop edges are not supported");
> +
> +    Edge SrcEdge;
> +    SrcEdge.Dst = Dst;
> +    SrcEdge.Cost = Cost;
> +    SrcEdge.Capacity = Capacity;
> +    SrcEdge.Flow = 0;
> +    SrcEdge.RevEdgeIndex = Edges[Dst].size();
> +
> +    Edge DstEdge;
> +    DstEdge.Dst = Src;
> +    DstEdge.Cost = -Cost;
> +    DstEdge.Capacity = 0;
> +    DstEdge.Flow = 0;
> +    DstEdge.RevEdgeIndex = Edges[Src].size();
> +
> +    Edges[Src].push_back(SrcEdge);
> +    Edges[Dst].push_back(DstEdge);
> +  }
> +
> +  /// Add an edge to the network of infinite capacity and a given cost.
> +  void addEdge(uint64_t Src, uint64_t Dst, int64_t Cost) {
> +    addEdge(Src, Dst, INF, Cost);
> +  }
> +
> +  /// Get the total flow from a given source node.
> +  /// Returns a list of pairs (target node, amount of flow to the target).
> +  const std::vector<std::pair<uint64_t, int64_t>> getFlow(uint64_t Src) const {
> +    std::vector<std::pair<uint64_t, int64_t>> Flow;
> +    for (auto &Edge : Edges[Src]) {
> +      if (Edge.Flow > 0)
> +        Flow.push_back(std::make_pair(Edge.Dst, Edge.Flow));
> +    }
> +    return Flow;
> +  }
> +
> +  /// Get the total flow between a pair of nodes.
> +  int64_t getFlow(uint64_t Src, uint64_t Dst) const {
> +    int64_t Flow = 0;
> +    for (auto &Edge : Edges[Src]) {
> +      if (Edge.Dst == Dst) {
> +        Flow += Edge.Flow;
> +      }
> +    }
> +    return Flow;
> +  }
> +
> +  /// A cost of increasing a block's count by one.
> +  static constexpr int64_t AuxCostInc = 10;
> +  /// A cost of decreasing a block's count by one.
> +  static constexpr int64_t AuxCostDec = 20;
> +  /// A cost of increasing the count of a zero-weight block by one.
> +  static constexpr int64_t AuxCostIncZero = 11;
> +  /// A cost of increasing the entry block's count by one.
> +  static constexpr int64_t AuxCostIncEntry = 40;
> +  /// A cost of decreasing the entry block's count by one.
> +  static constexpr int64_t AuxCostDecEntry = 10;
> +  /// A cost of taking an unlikely jump.
> +  static constexpr int64_t AuxCostUnlikely = ((int64_t)1) << 20;
> +
> +private:
> +  /// Check for the existence of an augmenting path with a positive capacity.
> +  bool findAugmentingPath() {
> +    // Initialize data structures.
> +    for (auto &Node : Nodes) {
> +      Node.Distance = INF;
> +      Node.ParentNode = uint64_t(-1);
> +      Node.ParentEdgeIndex = uint64_t(-1);
> +      Node.Taken = false;
> +    }
> +
> +    std::queue<uint64_t> Queue;
> +    Queue.push(Source);
> +    Nodes[Source].Distance = 0;
> +    Nodes[Source].Taken = true;
> +    while (!Queue.empty()) {
> +      uint64_t Src = Queue.front();
> +      Queue.pop();
> +      Nodes[Src].Taken = false;
> +      // Although the residual network contains edges with negative costs
> +      // (in particular, backward edges), it can be shown that there are no
> +      // negative-weight cycles and the following two invariants are
> +      // maintained:
> +      // (i) Dist[Source, V] >= 0 and (ii) Dist[V, Target] >= 0 for all nodes
> +      // V, where Dist is the length of the shortest path between two nodes.
> +      // This allows pruning the search space of the path-finding algorithm
> +      // using the following early-stop criteria:
> +      // -- If we find a path with zero distance from Source to Target, stop
> +      //    the search, as the path is the shortest since
> +      //    Dist[Source, Target] >= 0;
> +      // -- If we have Dist[Source, V] > Dist[Source, Target], then do not
> +      //    process node V, as it is guaranteed _not_ to be on a shortest path
> +      //    from Source to Target; it follows from the inequalities
> +      //    Dist[Source, Target] >= Dist[Source, V] + Dist[V, Target]
> +      //                         >= Dist[Source, V]
> +      if (Nodes[Target].Distance == 0)
> +        break;
> +      if (Nodes[Src].Distance > Nodes[Target].Distance)
> +        continue;
> +
> +      // Process adjacent edges.
> +      for (uint64_t EdgeIdx = 0; EdgeIdx < Edges[Src].size(); EdgeIdx++) {
> +        auto &Edge = Edges[Src][EdgeIdx];
> +        if (Edge.Flow < Edge.Capacity) {
> +          uint64_t Dst = Edge.Dst;
> +          int64_t NewDistance = Nodes[Src].Distance + Edge.Cost;
> +          if (Nodes[Dst].Distance > NewDistance) {
> +            // Update the distance and the parent node/edge.
> +            Nodes[Dst].Distance = NewDistance;
> +            Nodes[Dst].ParentNode = Src;
> +            Nodes[Dst].ParentEdgeIndex = EdgeIdx;
> +            // Add the node to the queue, if it is not there yet.
> +            if (!Nodes[Dst].Taken) {
> +              Queue.push(Dst);
> +              Nodes[Dst].Taken = true;
> +            }
> +          }
> +        }
> +      }
> +    }
> +
> +    return Nodes[Target].Distance != INF;
> +  }
> +
> +  /// Update the current flow along the augmenting path.
> +  void augmentFlowAlongPath() {
> +    // Find the path capacity.
> +    int64_t PathCapacity = INF;
> +    uint64_t Now = Target;
> +    while (Now != Source) {
> +      uint64_t Pred = Nodes[Now].ParentNode;
> +      auto &Edge = Edges[Pred][Nodes[Now].ParentEdgeIndex];
> +      PathCapacity = std::min(PathCapacity, Edge.Capacity - Edge.Flow);
> +      Now = Pred;
> +    }
> +
> +    assert(PathCapacity > 0 && "found incorrect augmenting path");
> +
> +    // Update the flow along the path.
> +    Now = Target;
> +    while (Now != Source) {
> +      uint64_t Pred = Nodes[Now].ParentNode;
> +      auto &Edge = Edges[Pred][Nodes[Now].ParentEdgeIndex];
> +      auto &RevEdge = Edges[Now][Edge.RevEdgeIndex];
> +
> +      Edge.Flow += PathCapacity;
> +      RevEdge.Flow -= PathCapacity;
> +
> +      Now = Pred;
> +    }
> +  }
> +
> +  /// A node in a flow network.
> +  struct Node {
> +    /// The cost of the cheapest path from the source to the current node.
> +    int64_t Distance;
> +    /// The node preceding the current one in the path.
> +    uint64_t ParentNode;
> +    /// The index of the edge between ParentNode and the current node.
> +    uint64_t ParentEdgeIndex;
> +    /// An indicator of whether the current node is in a queue.
> +    bool Taken;
> +  };
> +  /// An edge in a flow network.
> +  struct Edge {
> +    /// The cost of the edge.
> +    int64_t Cost;
> +    /// The capacity of the edge.
> +    int64_t Capacity;
> +    /// The current flow on the edge.
> +    int64_t Flow;
> +    /// The destination node of the edge.
> +    uint64_t Dst;
> +    /// The index of the reverse edge between Dst and the current node.
> +    uint64_t RevEdgeIndex;
> +  };
> +
> +  /// The set of network nodes.
> +  std::vector<Node> Nodes;
> +  /// The set of network edges.
> +  std::vector<std::vector<Edge>> Edges;
> +  /// Source node of the flow.
> +  uint64_t Source;
> +  /// Target (sink) node of the flow.
> +  uint64_t Target;
> +};
> +
> +/// Initialize the flow network for a given function.
> +///
> +/// Every block is split into three nodes that are responsible for (i) an
> +/// incoming flow, (ii) an outgoing flow, and (iii) penalizing an increase or
> +/// reduction of the block weight.
> +void initializeNetwork(MinCostMaxFlow &Network, FlowFunction &Func) {
> +  uint64_t NumBlocks = Func.Blocks.size();
> +  assert(NumBlocks > 1 && "Too few blocks in a function");
> +  LLVM_DEBUG(dbgs() << "Initializing profi for " << NumBlocks << " blocks\n");
> +
> +  // Pre-process data: make sure the entry weight is at least 1.
> +  if (Func.Blocks[Func.Entry].Weight == 0) {
> +    Func.Blocks[Func.Entry].Weight = 1;
> +  }
> +  // Introduce dummy source/sink pairs to allow flow circulation.
> +  // The nodes corresponding to blocks of Func have indices in the range
> +  // [0..3 * NumBlocks); the dummy nodes are indexed by the next four values.
> +  uint64_t S = 3 * NumBlocks;
> +  uint64_t T = S + 1;
> +  uint64_t S1 = S + 2;
> +  uint64_t T1 = S + 3;
> +
> +  Network.initialize(3 * NumBlocks + 4, S1, T1);
> +
> +  // Create three nodes for every block of the function.
> +  for (uint64_t B = 0; B < NumBlocks; B++) {
> +    auto &Block = Func.Blocks[B];
> +    assert((!Block.UnknownWeight || Block.Weight == 0 || Block.isEntry()) &&
> +           "non-zero weight of a block w/o weight except for an entry");
> +
> +    // Split every block into three nodes.
> +    uint64_t Bin = 3 * B;
> +    uint64_t Bout = 3 * B + 1;
> +    uint64_t Baux = 3 * B + 2;
> +    if (Block.Weight > 0) {
> +      Network.addEdge(S1, Bout, Block.Weight, 0);
> +      Network.addEdge(Bin, T1, Block.Weight, 0);
> +    }
> +
> +    // Edges from S and to T.
> +    assert((!Block.isEntry() || !Block.isExit()) &&
> +           "a block cannot be an entry and an exit");
> +    if (Block.isEntry()) {
> +      Network.addEdge(S, Bin, 0);
> +    } else if (Block.isExit()) {
> +      Network.addEdge(Bout, T, 0);
> +    }
> +
> +    // An auxiliary node to allow increase/reduction of block counts:
> +    // We assume that decreasing block counts is more expensive than
> +    // increasing, and thus set separate costs here. In the future we may want
> +    // to tune the relative costs so as to maximize the quality of generated
> +    // profiles.
> +    int64_t AuxCostInc = MinCostMaxFlow::AuxCostInc;
> +    int64_t AuxCostDec = MinCostMaxFlow::AuxCostDec;
> +    if (Block.UnknownWeight) {
> +      // Do not penalize changing weights of blocks w/o a known profile count.
> +      AuxCostInc = 0;
> +      AuxCostDec = 0;
> +    } else {
> +      // Increasing the count for "cold" blocks with zero initial count is
> +      // more expensive than for "hot" ones.
> +      if (Block.Weight == 0) {
> +        AuxCostInc = MinCostMaxFlow::AuxCostIncZero;
> +      }
> +      // Modifying the count of the entry block is expensive.
> +      if (Block.isEntry()) {
> +        AuxCostInc = MinCostMaxFlow::AuxCostIncEntry;
> +        AuxCostDec = MinCostMaxFlow::AuxCostDecEntry;
> +      }
> +    }
> +    // For blocks with self-edges, do not penalize a reduction of the count,
> +    // as all of the increase can be attributed to the self-edge.
> +    if (Block.HasSelfEdge) {
> +      AuxCostDec = 0;
> +    }
> +
> +    Network.addEdge(Bin, Baux, AuxCostInc);
> +    Network.addEdge(Baux, Bout, AuxCostInc);
> +    if (Block.Weight > 0) {
> +      Network.addEdge(Bout, Baux, AuxCostDec);
> +      Network.addEdge(Baux, Bin, AuxCostDec);
> +    }
> +  }
> +
> +  // Create edges for every jump.
> +  for (auto &Jump : Func.Jumps) {
> +    uint64_t Src = Jump.Source;
> +    uint64_t Dst = Jump.Target;
> +    if (Src != Dst) {
> +      uint64_t SrcOut = 3 * Src + 1;
> +      uint64_t DstIn = 3 * Dst;
> +      uint64_t Cost = Jump.IsUnlikely ? MinCostMaxFlow::AuxCostUnlikely : 0;
> +      Network.addEdge(SrcOut, DstIn, Cost);
> +    }
> +  }
> +
> +  // Make sure we have a valid flow circulation.
> +  Network.addEdge(T, S, 0);
> +}
> +
> +/// Extract the resulting block and edge counts from the flow network.
> +void extractWeights(MinCostMaxFlow &Network, FlowFunction &Func) {
> +  uint64_t NumBlocks = Func.Blocks.size();
> +
> +  // Extract the resulting block counts.
> +  for (uint64_t Src = 0; Src < NumBlocks; Src++) {
> +    auto &Block = Func.Blocks[Src];
> +    uint64_t SrcOut = 3 * Src + 1;
> +    int64_t Flow = 0;
> +    for (auto &Adj : Network.getFlow(SrcOut)) {
> +      uint64_t DstIn = Adj.first;
> +      int64_t DstFlow = Adj.second;
> +      bool IsAuxNode = (DstIn < 3 * NumBlocks && DstIn % 3 == 2);
> +      if (!IsAuxNode || Block.HasSelfEdge) {
> +        Flow += DstFlow;
> +      }
> +    }
> +    Block.Flow = Flow;
> +    assert(Flow >= 0 && "negative block flow");
> +  }
> +
> +  // Extract the resulting jump counts.
> +  for (auto &Jump : Func.Jumps) {
> +    uint64_t Src = Jump.Source;
> +    uint64_t Dst = Jump.Target;
> +    int64_t Flow = 0;
> +    if (Src != Dst) {
> +      uint64_t SrcOut = 3 * Src + 1;
> +      uint64_t DstIn = 3 * Dst;
> +      Flow = Network.getFlow(SrcOut, DstIn);
> +    } else {
> +      uint64_t SrcOut = 3 * Src + 1;
> +      uint64_t SrcAux = 3 * Src + 2;
> +      int64_t AuxFlow = Network.getFlow(SrcOut, SrcAux);
> +      if (AuxFlow > 0)
> +        Flow = AuxFlow;
> +    }
> +    Jump.Flow = Flow;
> +    assert(Flow >= 0 && "negative jump flow");
> +  }
> +}
> +
> +#ifndef NDEBUG
> +/// Verify that the computed flow values satisfy flow conservation rules.
> +void verifyWeights(const FlowFunction &Func) {
> +  const uint64_t NumBlocks = Func.Blocks.size();
> +  auto InFlow = std::vector<uint64_t>(NumBlocks, 0);
> +  auto OutFlow = std::vector<uint64_t>(NumBlocks, 0);
> +  for (auto &Jump : Func.Jumps) {
> +    InFlow[Jump.Target] += Jump.Flow;
> +    OutFlow[Jump.Source] += Jump.Flow;
> +  }
> +
> +  uint64_t TotalInFlow = 0;
> +  uint64_t TotalOutFlow = 0;
> +  for (uint64_t I = 0; I < NumBlocks; I++) {
> +    auto &Block = Func.Blocks[I];
> +    if (Block.isEntry()) {
> +      TotalInFlow += Block.Flow;
> +      assert(Block.Flow == OutFlow[I] && "incorrectly computed control flow");
> +    } else if (Block.isExit()) {
> +      TotalOutFlow += Block.Flow;
> +      assert(Block.Flow == InFlow[I] && "incorrectly computed control flow");
> +    } else {
> +      assert(Block.Flow == OutFlow[I] && "incorrectly computed control flow");
> +      assert(Block.Flow == InFlow[I] && "incorrectly computed control flow");
> +    }
> +  }
> +  assert(TotalInFlow == TotalOutFlow && "incorrectly computed control flow");
> +}
> +#endif
> +
> +} // end of anonymous namespace
> +
> +/// Apply the profile inference algorithm for a given flow function.
> +void llvm::applyFlowInference(FlowFunction &Func) {
> +  // Create and apply an inference network model.
> +  auto InferenceNetwork = MinCostMaxFlow();
> +  initializeNetwork(InferenceNetwork, Func);
> +  InferenceNetwork.run();
> +
> +  // Extract flow values for every block and every edge.
> +  extractWeights(InferenceNetwork, Func);
> +
> +#ifndef NDEBUG
> +  // Verify the result.
> +  verifyWeights(Func);
> +#endif
> +}
>
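Since MinCostMaxFlow sits in an anonymous namespace, it is not callable from outside this file. Purely to illustrate the node-splitting and wiring that initializeNetwork performs, here is a hypothetical sketch (assuming a standalone copy of the class) for a minimal two-block function b0 -> b1, both with a known weight of 10; for brevity it uses the generic AuxCostInc/AuxCostDec rather than the entry-specific costs the real code picks for b0:

  MinCostMaxFlow Network;
  uint64_t NumBlocks = 2;
  // Dummy source/sink pairs follow the 3 * NumBlocks block nodes.
  uint64_t S = 3 * NumBlocks, T = S + 1, S1 = S + 2, T1 = S + 3;
  Network.initialize(3 * NumBlocks + 4, S1, T1);
  for (uint64_t B = 0; B < NumBlocks; B++) {
    uint64_t Bin = 3 * B, Bout = 3 * B + 1, Baux = 3 * B + 2;
    // Both blocks have a known weight of 10.
    Network.addEdge(S1, Bout, /*Capacity=*/10, /*Cost=*/0);
    Network.addEdge(Bin, T1, /*Capacity=*/10, /*Cost=*/0);
    // Auxiliary edges that penalize adjusting the block count.
    Network.addEdge(Bin, Baux, MinCostMaxFlow::AuxCostInc);
    Network.addEdge(Baux, Bout, MinCostMaxFlow::AuxCostInc);
    Network.addEdge(Bout, Baux, MinCostMaxFlow::AuxCostDec);
    Network.addEdge(Baux, Bin, MinCostMaxFlow::AuxCostDec);
  }
  Network.addEdge(S, /*Bin of b0=*/0, 0);  // b0 is the entry block
  Network.addEdge(/*Bout of b1=*/4, T, 0); // b1 is the exit block
  Network.addEdge(/*Bout of b0=*/1, /*Bin of b1=*/3, 0); // the jump b0 -> b1
  Network.addEdge(T, S, 0); // close the circulation
  Network.run();
  // All 10 units of flow should traverse the jump at zero cost.
  int64_t JumpFlow = Network.getFlow(/*Bout of b0=*/1, /*Bin of b1=*/3);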
> diff --git a/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp b/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
> index 6d995cf4c0481..ea0e8343eb887 100644
> --- a/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
> +++ b/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
> @@ -34,6 +34,10 @@ cl::opt<bool> NoWarnSampleUnused(
>     cl::desc("Use this option to turn off/on warnings about function with "
>              "samples but without debug information to use those samples. "));
>
> +cl::opt<bool> SampleProfileUseProfi(
> +    "sample-profile-use-profi", cl::init(false), cl::Hidden, cl::ZeroOrMore,
> +    cl::desc("Use profi to infer block and edge counts."));
> +
> namespace sampleprofutil {
>
> /// Return true if the given callsite is hot wrt to hot cutoff threshold.
>
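With this flag in place, profi can be exercised from the command line; for example (adapted from the RUN lines of the new test below; the -S form printing the annotated IR is my adaptation, not part of the patch):

  opt < profile-inference.ll -passes=pseudo-probe,sample-profile \
      -sample-profile-use-profi \
      -sample-profile-file=Inputs/profile-inference.prof -S

The flag defaults to false, so existing sample-profile users see no behavior change unless they opt in.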
> diff --git a/llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof b/llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
> new file mode 100644
> index 0000000000000..e995a04c7fd44
> --- /dev/null
> +++ b/llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
> @@ -0,0 +1,23 @@
> +test_1:23968:0
> + 1: 100
> + 2: 60
> + 3: 40
> + !CFGChecksum: 4294967295
> +
> +test_2:23968:0
> + 1: 100
> + 3: 10
> + !CFGChecksum: 37753817093
> +
> +test_3:10000:0
> + 3: 13
> + 5: 89
> + !CFGChecksum: 69502983527
> +
> +sum_of_squares:23968:0
> + 2: 5993
> + 3: 1
> + 4: 5992
> + 5: 5992
> + 8: 5992
> + !CFGChecksum: 175862120757
>
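For reference, each section above is "name:total-samples:entry-samples", followed by lines mapping a pseudo-probe id to its sampled count; the !CFGChecksum must match the function's entry in the !llvm.pseudo_probe_desc metadata of the test module below (e.g., 4294967295 for test_1), otherwise the profile is not applied. Note that test_2 deliberately omits a count for probe 2, producing the "dangling" block that profi has to infer.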
> diff --git a/llvm/test/Transforms/SampleProfile/profile-inference.ll b/llvm/test/Transforms/SampleProfile/profile-inference.ll
> new file mode 100644
> index 0000000000000..7f40358e65268
> --- /dev/null
> +++ b/llvm/test/Transforms/SampleProfile/profile-inference.ll
> @@ -0,0 +1,245 @@
> +; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-use-profi -sample-profile-file=%S/Inputs/profile-inference.prof | opt -analyze -branch-prob -enable-new-pm=0 | FileCheck %s
> +; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-use-profi -sample-profile-file=%S/Inputs/profile-inference.prof | opt -analyze -block-freq -enable-new-pm=0 | FileCheck %s --check-prefix=CHECK2
> +
> +; The test verifies that profile inference correctly builds branch
> +; probabilities from sampling-based block counts.
> +;
> +; +---------+     +----------+
> +; | b3 [40] | <-- | b1 [100] |
> +; +---------+     +----------+
> +;                   |
> +;                   |
> +;                   v
> +;                 +----------+
> +;                 | b2 [60]  |
> +;                 +----------+
> +
> +@yydebug = dso_local global i32 0, align 4
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @test_1() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 7964825052912775246, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br i1 %cmp, label %b2, label %b3
> +; CHECK: edge b1 -> b2 probability is 0x4ccccccd / 0x80000000 = 60.00%
> +; CHECK: edge b1 -> b3 probability is 0x33333333 / 0x80000000 = 40.00%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 100
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 7964825052912775246, i64 2, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 60
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 7964825052912775246, i64 3, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 40
> +}
> +
> +
> +; The test verifies that profile inference correctly builds branch
> +; probabilities from sampling-based block counts in the presence of "dangling"
> +; probes (whose block counts are missing).
> +;
> +; +---------+     +----------+
> +; | b3 [10] | <-- | b1 [100] |
> +; +---------+     +----------+
> +;                   |
> +;                   |
> +;                   v
> +;                 +----------+
> +;                 | b2 [?]   |
> +;                 +----------+
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @test_2() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br i1 %cmp, label %b2, label %b3
> +; CHECK: edge b1 -> b2 probability is 0x73333333 / 0x80000000 = 90.00%
> +; CHECK: edge b1 -> b3 probability is 0x0ccccccd / 0x80000000 = 10.00%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 100
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 2, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 90
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 3, i32 0, i64 -1)
> +  ret i32 %0
> +}
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 10
> +
> +
> +; The test verifies that profi is able to infer block counts from hot
> +; subgraphs.
> +;
> +; +---------+     +---------+
> +; | b4 [?]  | <-- | b1 [?]  |
> +; +---------+     +---------+
> +;   |               |
> +;   |               |
> +;   v               v
> +; +---------+     +---------+
> +; | b5 [89] |     | b2 [?]  |
> +; +---------+     +---------+
> +;                   |
> +;                   |
> +;                   v
> +;                 +---------+
> +;                 | b3 [13] |
> +;                 +---------+
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @test_3() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br i1 %cmp, label %b2, label %b4
> +; CHECK: edge b1 -> b2 probability is 0x10505050 / 0x80000000 = 12.75%
> +; CHECK: edge b1 -> b4 probability is 0x6fafafb0 / 0x80000000 = 87.25%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 102
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 2, i32 0, i64 -1)
> +  br label %b3
> +; CHECK: edge b2 -> b3 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 13
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 3, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 13
> +
> +b4:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 4, i32 0, i64 -1)
> +  br label %b5
> +; CHECK: edge b4 -> b5 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b4: float = {{.*}}, int = {{.*}}, count = 89
> +
> +b5:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 5, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b5: float = {{.*}}, int = {{.*}}, count = 89
> +}
> +
> +
> +; A larger test to verify that profile inference correctly identifies hot
> +; parts of the control-flow graph.
> +;
> +;                  +-----------+
> +;                  | b1 [?]    |
> +;                  +-----------+
> +;                    |
> +;                    |
> +;                    v
> +; +--------+       +-----------+
> +; | b3 [1] | <--   | b2 [5993] |
> +; +--------+       +-----------+
> +;   |                |
> +;   |                |
> +;   |                v
> +;   |              +-----------+     +--------+
> +;   |              | b4 [5992] | --> | b6 [?] |
> +;   |              +-----------+     +--------+
> +;   |                |                 |
> +;   |                |                 |
> +;   |                v                 |
> +;   |              +-----------+       |
> +;   |              | b5 [5992] |       |
> +;   |              +-----------+       |
> +;   |                |                 |
> +;   |                |                 |
> +;   |                v                 |
> +;   |              +-----------+       |
> +;   |              | b7 [?]    |       |
> +;   |              +-----------+       |
> +;   |                |                 |
> +;   |                |                 |
> +;   |                v                 |
> +;   |              +-----------+       |
> +;   |              | b8 [5992] | <-----+
> +;   |              +-----------+
> +;   |                |
> +;   |                |
> +;   |                v
> +;   |              +-----------+
> +;   +------------> | b9 [?]    |
> +;                  +-----------+
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @sum_of_squares() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br label %b2
> +; CHECK: edge b1 -> b2 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 5993
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 2, i32 0, i64 -1)
> +  br i1 %cmp, label %b4, label %b3
> +; CHECK: edge b2 -> b4 probability is 0x7ffa8844 / 0x80000000 = 99.98%
> +; CHECK: edge b2 -> b3 probability is 0x000577bc / 0x80000000 = 0.02%
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 5993
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 3, i32 0, i64 -1)
> +  br label %b9
> +; CHECK: edge b3 -> b9 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 1
> +
> +b4:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 4, i32 0, i64 -1)
> +  br i1 %cmp, label %b5, label %b6
> +; CHECK: edge b4 -> b5 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK: edge b4 -> b6 probability is 0x00000000 / 0x80000000 = 0.00%
> +; CHECK2: - b4: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b5:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 5, i32 0, i64 -1)
> +  br label %b7
> +; CHECK: edge b5 -> b7 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b5: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b6:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 6, i32 0, i64 -1)
> +  br label %b8
> +; CHECK: edge b6 -> b8 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b6: float = {{.*}}, int = {{.*}}, count = 0
> +
> +b7:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 7, i32 0, i64 -1)
> +  br label %b8
> +; CHECK: edge b7 -> b8 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b7: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b8:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 8, i32 0, i64 -1)
> +  br label %b9
> +; CHECK: edge b8 -> b9 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b8: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b9:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 9, i32 0, i64 -1)
> +  ret i32 %0
> +}
> +; CHECK2: - b9: float = {{.*}}, int = {{.*}}, count = 5993
> +
> +declare void @llvm.pseudoprobe(i64, i64, i32, i64) #1
> +
> +attributes #0 = { noinline nounwind uwtable "use-sample-profile" }
> +attributes #1 = { nounwind }
> +
> +!llvm.pseudo_probe_desc = !{!6, !7, !8, !9}
> +
> +!6 = !{i64 7964825052912775246, i64 4294967295, !"test_1", null}
> +!7 = !{i64 -6216829535442445639, i64 37753817093, !"test_2", null}
> +!8 = !{i64 1649282507922421973, i64 69502983527, !"test_3", null}
> +!9 = !{i64 -907520326213521421, i64 175862120757, !"sum_of_squares", null}
>
>
_______________________________________________
llvm-commits mailing list
llvm-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits