<p class="MsoNormal">Hi Philip,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for the suggestion. The resubmission came with a fix for the first break as I commented here: https://reviews.llvm.org/rGb00fc198224e . Sorry for not updating the commit message. I will do that next time.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The second break was due to a different issue. It’s being addressed here:
<a href="https://reviews.llvm.org/D109860">https://reviews.llvm.org/D109860</a>. Once the fix is ready, I’ll update the commit message with both fixes before resubmission.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Hongtao<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:.5in">
<b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Philip Reames <listmail@philipreames.com><br>
<b>Date: </b>Tuesday, November 23, 2021 at 12:22 PM<br>
<b>To: </b>Hongtao Yu <hoy@fb.com>, Hongtao Yu <llvmlistbot@llvm.org>, llvm-commits@lists.llvm.org <llvm-commits@lists.llvm.org><br>
<b>Subject: </b>Re: [llvm] 884b6dd - profi - a flow-based profile inference algorithm: Part I (out of 3)<o:p></o:p></span></p>
</div>
This appears to have been resubmitted after a revert for breaking the build, while still breaking the build and without any discussion of fixes between the two versions. Please do not do this!

If you think you have fixed an issue causing a revert, you *must* describe in the submission comment what the problem was and how you fixed it.

Philip

On 11/23/21 11:04 AM, Hongtao Yu via llvm-commits wrote:
> Author: spupyrev
> Date: 2021-11-23T11:02:40-08:00
> New Revision: 884b6dd311422bbfac62b8a90fbfff8e77ba8121
>
> URL: https://github.com/llvm/llvm-project/commit/884b6dd311422bbfac62b8a90fbfff8e77ba8121
> DIFF: https://github.com/llvm/llvm-project/commit/884b6dd311422bbfac62b8a90fbfff8e77ba8121.diff
>
> LOG: profi - a flow-based profile inference algorithm: Part I (out of 3)
>
> The benefits of sampling-based PGO depend crucially on the quality of profile
> data. This diff implements a flow-based algorithm, called profi, that helps to
> overcome the inaccuracies in a profile after it is collected.
>
> Profi is an extended and significantly re-engineered version of the classic
> MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008,
> Complementing missing and inaccurate profiling using a minimum cost circulation
> algorithm]. It models profile inference as an optimization problem on a
> control-flow graph, with the objectives and constraints capturing the desired
> properties of profile data. Profi solves three important challenges:
> - "fixing" errors in profiles caused by sampling;
> - converting basic block counts to edge frequencies (branch probabilities);
> - dealing with "dangling" blocks having no samples in the profile.
>
> The main implementation (and required docs) are in SampleProfileInference.cpp.
> The worst-case time complexity is quadratic in the number of blocks in a
> function, O(|V|^2). However, careful engineering and extensive evaluation show
> that the running time is (slightly) super-linear. In particular, instances with
> 1000 blocks are solved within 0.1 seconds.
>
> The algorithm has been extensively tested internally on prod workloads,
> significantly improving the quality of generated profile data and providing
> speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
> generally improves the performance (with a few outliers), but extra work in
> the compiler might be needed to re-tune existing optimization passes relying on
> profile counts.
>
> Reviewed By: wenlei, hoy
>
> Differential Revision: https://reviews.llvm.org/D109860
>
> Added:
>     llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
>     llvm/lib/Transforms/Utils/SampleProfileInference.cpp
>     llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
>     llvm/test/Transforms/SampleProfile/profile-inference.ll
>
> Modified:
>     llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
>     llvm/lib/Transforms/IPO/SampleProfile.cpp
>     llvm/lib/Transforms/Utils/CMakeLists.txt
>     llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
>
> Removed:
>
>
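For orientation before diving into the diff: the new interface is small enough to drive directly on a hand-built CFG. The following is a minimal sketch (not part of the patch; it assumes only the FlowFunction/FlowBlock/FlowJump types and applyFlowInference declared in the SampleProfileInference.h added below), mirroring the test_1 case from the new test:

  #include "llvm/Transforms/Utils/SampleProfileInference.h"
  #include <cstdint>
  #include <cstdio>

  int main() {
    // A 3-block CFG: b1 -> b2 and b1 -> b3, with sampled counts 100/60/40.
    llvm::FlowFunction Func;
    Func.Blocks.resize(3);
    for (uint64_t I = 0; I < 3; I++)
      Func.Blocks[I].Index = I;
    Func.Blocks[0].Weight = 100; // b1
    Func.Blocks[1].Weight = 60;  // b2
    Func.Blocks[2].Weight = 40;  // b3
    Func.Jumps.push_back({/*Source=*/0, /*Target=*/1});
    Func.Jumps.push_back({/*Source=*/0, /*Target=*/2});
    // Wire up the adjacency lists after all push_backs, so the pointers
    // into Func.Jumps stay valid.
    for (auto &Jump : Func.Jumps) {
      Func.Blocks[Jump.Source].SuccJumps.push_back(&Jump);
      Func.Blocks[Jump.Target].PredJumps.push_back(&Jump);
    }
    Func.Entry = 0;

    // Reconstruct flow-conserving block and jump counts.
    llvm::applyFlowInference(Func);

    for (const auto &Jump : Func.Jumps)
      printf("jump %llu -> %llu: flow = %llu\n",
             (unsigned long long)Jump.Source, (unsigned long long)Jump.Target,
             (unsigned long long)Jump.Flow);
    return 0;
  }

With consistent input counts like these, the inferred jump flows should simply match the successor weights (60 and 40).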
> ################################################################################
> diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
> new file mode 100644
> index 0000000000000..e1f681bbd3677
> --- /dev/null
> +++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
> @@ -0,0 +1,284 @@
> +//===- Transforms/Utils/SampleProfileInference.h ----------*- C++ -*-===//
> +//
> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
> +// See https://llvm.org/LICENSE.txt for license information.
> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +/// \file
> +/// This file provides the interface for the profile inference algorithm, profi.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
> +#define LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
> +
> +#include "llvm/ADT/DenseMap.h"
> +#include "llvm/ADT/DepthFirstIterator.h"
> +#include "llvm/ADT/SmallVector.h"
> +
> +#include "llvm/IR/BasicBlock.h"
> +#include "llvm/IR/Instruction.h"
> +#include "llvm/IR/Instructions.h"
> +
> +namespace llvm {
> +
> +class BasicBlock;
> +class Function;
> +class MachineBasicBlock;
> +class MachineFunction;
> +
> +namespace afdo_detail {
> +
> +template <class BlockT> struct TypeMap {};
> +template <> struct TypeMap<BasicBlock> {
> +  using BasicBlockT = BasicBlock;
> +  using FunctionT = Function;
> +};
> +template <> struct TypeMap<MachineBasicBlock> {
> +  using BasicBlockT = MachineBasicBlock;
> +  using FunctionT = MachineFunction;
> +};
> +
> +} // end namespace afdo_detail
> +
> +struct FlowJump;
> +
> +/// A wrapper of a binary basic block.
> +struct FlowBlock {
> +  uint64_t Index;
> +  uint64_t Weight{0};
> +  bool UnknownWeight{false};
> +  uint64_t Flow{0};
> +  bool HasSelfEdge{false};
> +  std::vector<FlowJump *> SuccJumps;
> +  std::vector<FlowJump *> PredJumps;
> +
> +  /// Check if it is the entry block in the function.
> +  bool isEntry() const { return PredJumps.empty(); }
> +
> +  /// Check if it is an exit block in the function.
> +  bool isExit() const { return SuccJumps.empty(); }
> +};
> +
> +/// A wrapper of a jump between two basic blocks.
> +struct FlowJump {
> +  uint64_t Source;
> +  uint64_t Target;
> +  uint64_t Flow{0};
> +  bool IsUnlikely{false};
> +};
> +
> +/// A wrapper of a binary function with basic blocks and jumps.
> +struct FlowFunction {
> +  std::vector<FlowBlock> Blocks;
> +  std::vector<FlowJump> Jumps;
> +  /// The index of the entry block.
> +  uint64_t Entry;
> +};
> +
> +void applyFlowInference(FlowFunction &Func);
> +
> +/// Sample profile inference pass.
> +template <typename BT> class SampleProfileInference {
> +public:
> +  using BasicBlockT = typename afdo_detail::TypeMap<BT>::BasicBlockT;
> +  using FunctionT = typename afdo_detail::TypeMap<BT>::FunctionT;
> +  using Edge = std::pair<const BasicBlockT *, const BasicBlockT *>;
> +  using BlockWeightMap = DenseMap<const BasicBlockT *, uint64_t>;
> +  using EdgeWeightMap = DenseMap<Edge, uint64_t>;
> +  using BlockEdgeMap =
> +      DenseMap<const BasicBlockT *, SmallVector<const BasicBlockT *, 8>>;
> +
> +  SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
> +                         BlockWeightMap &SampleBlockWeights)
> +      : F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights) {}
> +
> +  /// Apply the profile inference algorithm for a given function.
> +  void apply(BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
> +
> +private:
> +  /// Try to infer branch probabilities mimicking the implementation of
> +  /// BranchProbabilityInfo. Unlikely-taken branches are marked so that the
> +  /// inference algorithm can avoid sending flow along corresponding edges.
> +  void findUnlikelyJumps(const std::vector<const BasicBlockT *> &BasicBlocks,
> +                         BlockEdgeMap &Successors, FlowFunction &Func);
> +
> +  /// Determine whether the block is an exit in the CFG.
> +  bool isExit(const BasicBlockT *BB);
> +
> +  /// Function.
> +  const FunctionT &F;
> +
> +  /// Successors for each basic block in the CFG.
> +  BlockEdgeMap &Successors;
> +
> +  /// Map basic blocks to their sampled weights.
> +  BlockWeightMap &SampleBlockWeights;
> +};
> +
> +template <typename BT>
> +void SampleProfileInference<BT>::apply(BlockWeightMap &BlockWeights,
> +                                       EdgeWeightMap &EdgeWeights) {
> +  // Find all forward-reachable blocks to which the inference algorithm will
> +  // be applied.
> +  df_iterator_default_set<const BasicBlockT *> Reachable;
> +  for (auto *BB : depth_first_ext(&F, Reachable))
> +    (void)BB /* Mark all reachable blocks */;
> +
> +  // Find all backward-reachable blocks to which the inference algorithm will
> +  // be applied.
> +  df_iterator_default_set<const BasicBlockT *> InverseReachable;
> +  for (const auto &BB : F) {
> +    // An exit block is a block without any successors.
> +    if (isExit(&BB)) {
> +      for (auto *RBB : inverse_depth_first_ext(&BB, InverseReachable))
> +        (void)RBB;
> +    }
> +  }
> +
> +  // Keep a stable order for the reachable blocks.
> +  DenseMap<const BasicBlockT *, uint64_t> BlockIndex;
> +  std::vector<const BasicBlockT *> BasicBlocks;
> +  BlockIndex.reserve(Reachable.size());
> +  BasicBlocks.reserve(Reachable.size());
> +  for (const auto &BB : F) {
> +    if (Reachable.count(&BB) && InverseReachable.count(&BB)) {
> +      BlockIndex[&BB] = BasicBlocks.size();
> +      BasicBlocks.push_back(&BB);
> +    }
> +  }
> +
> +  BlockWeights.clear();
> +  EdgeWeights.clear();
> +  bool HasSamples = false;
> +  for (const auto *BB : BasicBlocks) {
> +    auto It = SampleBlockWeights.find(BB);
> +    if (It != SampleBlockWeights.end() && It->second > 0) {
> +      HasSamples = true;
> +      BlockWeights[BB] = It->second;
> +    }
> +  }
> +  // Quit early for functions with a single block or ones w/o samples.
> +  if (BasicBlocks.size() <= 1 || !HasSamples) {
> +    return;
> +  }
> +
> +  // Create the necessary objects.
> +  FlowFunction Func;
> +  Func.Blocks.reserve(BasicBlocks.size());
> +  // Create FlowBlocks.
> +  for (const auto *BB : BasicBlocks) {
> +    FlowBlock Block;
> +    if (SampleBlockWeights.find(BB) != SampleBlockWeights.end()) {
> +      Block.UnknownWeight = false;
> +      Block.Weight = SampleBlockWeights[BB];
> +    } else {
> +      Block.UnknownWeight = true;
> +      Block.Weight = 0;
> +    }
> +    Block.Index = Func.Blocks.size();
> +    Func.Blocks.push_back(Block);
> +  }
> +  // Create FlowEdges.
> +  for (const auto *BB : BasicBlocks) {
> +    for (auto *Succ : Successors[BB]) {
> +      if (!BlockIndex.count(Succ))
> +        continue;
> +      FlowJump Jump;
> +      Jump.Source = BlockIndex[BB];
> +      Jump.Target = BlockIndex[Succ];
> +      Func.Jumps.push_back(Jump);
> +      if (BB == Succ) {
> +        Func.Blocks[BlockIndex[BB]].HasSelfEdge = true;
> +      }
> +    }
> +  }
> +  for (auto &Jump : Func.Jumps) {
> +    Func.Blocks[Jump.Source].SuccJumps.push_back(&Jump);
> +    Func.Blocks[Jump.Target].PredJumps.push_back(&Jump);
> +  }
> +
> +  // Try to infer probabilities of jumps based on the content of basic blocks.
> +  findUnlikelyJumps(BasicBlocks, Successors, Func);
> +
> +  // Find the entry block.
> +  for (size_t I = 0; I < Func.Blocks.size(); I++) {
> +    if (Func.Blocks[I].isEntry()) {
> +      Func.Entry = I;
> +      break;
> +    }
> +  }
> +
> +  // Create and apply the inference network model.
> +  applyFlowInference(Func);
> +
> +  // Extract the resulting weights from the control flow.
> +  // All weights are increased by one to avoid propagation errors introduced by
> +  // zero weights.
> +  for (const auto *BB : BasicBlocks) {
> +    BlockWeights[BB] = Func.Blocks[BlockIndex[BB]].Flow;
> +  }
> +  for (auto &Jump : Func.Jumps) {
> +    Edge E = std::make_pair(BasicBlocks[Jump.Source], BasicBlocks[Jump.Target]);
> +    EdgeWeights[E] = Jump.Flow;
> +  }
> +
> +#ifndef NDEBUG
> +  // Unreachable blocks and edges should not have a weight.
> +  for (auto &I : BlockWeights) {
> +    assert(Reachable.contains(I.first));
> +    assert(InverseReachable.contains(I.first));
> +  }
> +  for (auto &I : EdgeWeights) {
> +    assert(Reachable.contains(I.first.first) &&
> +           Reachable.contains(I.first.second));
> +    assert(InverseReachable.contains(I.first.first) &&
> +           InverseReachable.contains(I.first.second));
> +  }
> +#endif
> +}
> +
> +template <typename BT>
> +inline void SampleProfileInference<BT>::findUnlikelyJumps(
> +    const std::vector<const BasicBlockT *> &BasicBlocks,
> +    BlockEdgeMap &Successors, FlowFunction &Func) {}
> +
> +template <>
> +inline void SampleProfileInference<BasicBlock>::findUnlikelyJumps(
> +    const std::vector<const BasicBlockT *> &BasicBlocks,
> +    BlockEdgeMap &Successors, FlowFunction &Func) {
> +  for (auto &Jump : Func.Jumps) {
> +    const auto *BB = BasicBlocks[Jump.Source];
> +    const auto *Succ = BasicBlocks[Jump.Target];
> +    const Instruction *TI = BB->getTerminator();
> +    // Check if a block ends with an InvokeInst and mark the non-taken branch
> +    // as unlikely. In that case, block Succ should be a landing pad.
> +    if (Successors[BB].size() == 2 && Successors[BB].back() == Succ) {
> +      if (isa<InvokeInst>(TI)) {
> +        Jump.IsUnlikely = true;
> +      }
> +    }
> +    const Instruction *SuccTI = Succ->getTerminator();
> +    // Check if the target block contains an UnreachableInst and mark it
> +    // unlikely.
> +    if (SuccTI->getNumSuccessors() == 0) {
> +      if (isa<UnreachableInst>(SuccTI)) {
> +        Jump.IsUnlikely = true;
> +      }
> +    }
> +  }
> +}
> +
> +template <typename BT>
> +inline bool SampleProfileInference<BT>::isExit(const BasicBlockT *BB) {
> +  return BB->succ_empty();
> +}
> +
> +template <>
> +inline bool SampleProfileInference<BasicBlock>::isExit(const BasicBlock *BB) {
> +  return succ_empty(BB);
> +}
> +
> +} // end namespace llvm
> +#endif // LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
>
> diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h b/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
> index 6a2f0acf46f32..e9b3d5aef15fb 100644
> --- a/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
> +++ b/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
> @@ -38,6 +38,7 @@
> #include "llvm/Support/CommandLine.h"
> #include "llvm/Support/GenericDomTree.h"
> #include "llvm/Support/raw_ostream.h"
> +#include "llvm/Transforms/Utils/SampleProfileInference.h"
> #include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"
>
> namespace llvm {
> @@ -74,6 +75,8 @@ template <> struct IRTraits<BasicBlock> {
>
> } // end namespace afdo_detail
>
> +extern cl::opt<bool> SampleProfileUseProfi;
> +
> template <typename BT> class SampleProfileLoaderBaseImpl {
> public:
>   SampleProfileLoaderBaseImpl(std::string Name, std::string RemapName)
> @@ -142,6 +145,9 @@ template <typename BT> class SampleProfileLoaderBaseImpl {
>                           ArrayRef<BasicBlockT *> Descendants,
>                           PostDominatorTreeT *DomTree);
>   void propagateWeights(FunctionT &F);
> +  void applyProfi(FunctionT &F, BlockEdgeMap &Successors,
> +                  BlockWeightMap &SampleBlockWeights,
> +                  BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
>   uint64_t visitEdge(Edge E, unsigned *NumUnknownEdges, Edge *UnknownEdge);
>   void buildEdges(FunctionT &F);
>   bool propagateThroughEdges(FunctionT &F, bool UpdateBlockCount);
> @@ -150,6 +156,11 @@ template <typename BT> class SampleProfileLoaderBaseImpl {
>   bool
>   computeAndPropagateWeights(FunctionT &F,
>                              const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
> +  void initWeightPropagation(FunctionT &F,
> +                             const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
> +  void
> +  finalizeWeightPropagation(FunctionT &F,
> +                            const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
>   void emitCoverageRemarks(FunctionT &F);
>
>   /// Map basic blocks to their computed weights.
> @@ -741,50 +752,65 @@ void SampleProfileLoaderBaseImpl<BT>::buildEdges(FunctionT &F) {
> /// known).
> template <typename BT>
> void SampleProfileLoaderBaseImpl<BT>::propagateWeights(FunctionT &F) {
> -  bool Changed = true;
> -  unsigned I = 0;
> -
> -  // If BB weight is larger than its corresponding loop's header BB weight,
> -  // use the BB weight to replace the loop header BB weight.
> -  for (auto &BI : F) {
> -    BasicBlockT *BB = &BI;
> -    LoopT *L = LI->getLoopFor(BB);
> -    if (!L) {
> -      continue;
> +  // Flow-based profile inference is only usable with the BasicBlock
> +  // instantiation of SampleProfileLoaderBaseImpl.
> +  if (SampleProfileUseProfi) {
> +    // Prepare block sample counts for inference.
> +    BlockWeightMap SampleBlockWeights;
> +    for (const auto &BI : F) {
> +      ErrorOr<uint64_t> Weight = getBlockWeight(&BI);
> +      if (Weight)
> +        SampleBlockWeights[&BI] = Weight.get();
>     }
> -    BasicBlockT *Header = L->getHeader();
> -    if (Header && BlockWeights[BB] > BlockWeights[Header]) {
> -      BlockWeights[Header] = BlockWeights[BB];
> +    // Fill in BlockWeights and EdgeWeights using an inference algorithm.
> +    applyProfi(F, Successors, SampleBlockWeights, BlockWeights, EdgeWeights);
> +  } else {
> +    bool Changed = true;
> +    unsigned I = 0;
> +
> +    // If BB weight is larger than its corresponding loop's header BB weight,
> +    // use the BB weight to replace the loop header BB weight.
> +    for (auto &BI : F) {
> +      BasicBlockT *BB = &BI;
> +      LoopT *L = LI->getLoopFor(BB);
> +      if (!L) {
> +        continue;
> +      }
> +      BasicBlockT *Header = L->getHeader();
> +      if (Header && BlockWeights[BB] > BlockWeights[Header]) {
> +        BlockWeights[Header] = BlockWeights[BB];
> +      }
>     }
> -  }
>
> -  // Before propagation starts, build, for each block, a list of
> -  // unique predecessors and successors. This is necessary to handle
> -  // identical edges in multiway branches. Since we visit all blocks and all
> -  // edges of the CFG, it is cleaner to build these lists once at the start
> -  // of the pass.
> -  buildEdges(F);
> +    // Propagate until we converge or we go past the iteration limit.
> +    while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> +      Changed = propagateThroughEdges(F, false);
> +    }
>
> -  // Propagate until we converge or we go past the iteration limit.
> -  while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> -    Changed = propagateThroughEdges(F, false);
> -  }
> +    // The first propagation propagates BB counts from annotated BBs to unknown
> +    // BBs. The 2nd propagation pass resets edge weights and uses all BB
> +    // weights to propagate edge weights.
> +    VisitedEdges.clear();
> +    Changed = true;
> +    while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> +      Changed = propagateThroughEdges(F, false);
> +    }
>
> -  // The first propagation propagates BB counts from annotated BBs to unknown
> -  // BBs. The 2nd propagation pass resets edges weights, and use all BB weights
> -  // to propagate edge weights.
> -  VisitedEdges.clear();
> -  Changed = true;
> -  while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> -    Changed = propagateThroughEdges(F, false);
> +    // The 3rd propagation pass allows adjusting annotated BB weights that are
> +    // obviously wrong.
> +    Changed = true;
> +    while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> +      Changed = propagateThroughEdges(F, true);
> +    }
>   }
> +}
>
> -  // The 3rd propagation pass allows adjust annotated BB weights that are
> -  // obviously wrong.
> -  Changed = true;
> -  while (Changed && I++ < SampleProfileMaxPropagateIterations) {
> -    Changed = propagateThroughEdges(F, true);
> -  }
> +template <typename BT>
> +void SampleProfileLoaderBaseImpl<BT>::applyProfi(
> +    FunctionT &F, BlockEdgeMap &Successors, BlockWeightMap &SampleBlockWeights,
> +    BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights) {
> +  auto Infer = SampleProfileInference<BT>(F, Successors, SampleBlockWeights);
> +  Infer.apply(BlockWeights, EdgeWeights);
> }
>
> /// Generate branch weight metadata for all branches in \p F.
> @@ -842,26 +868,64 @@ bool SampleProfileLoaderBaseImpl<BT>::computeAndPropagateWeights(
>   Changed |= computeBlockWeights(F);
>
>   if (Changed) {
> -    // Add an entry count to the function using the samples gathered at the
> -    // function entry.
> -    // Sets the GUIDs that are inlined in the profiled binary. This is used
> -    // for ThinLink to make correct liveness analysis, and also make the IR
> -    // match the profiled binary before annotation.
> -    getFunction(F).setEntryCount(
> -        ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
> -        &InlinedGUIDs);
> +    // Initialize propagation.
> +    initWeightPropagation(F, InlinedGUIDs);
>
> +    // Propagate weights to all edges.
> +    propagateWeights(F);
> +
> +    // Post-process the propagated weights.
> +    finalizeWeightPropagation(F, InlinedGUIDs);
> +  }
> +
> +  return Changed;
> +}
> +
> +template <typename BT>
> +void SampleProfileLoaderBaseImpl<BT>::initWeightPropagation(
> +    FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
> +  // Add an entry count to the function using the samples gathered at the
> +  // function entry.
> +  // Sets the GUIDs that are inlined in the profiled binary. This is used
> +  // for ThinLink to make correct liveness analysis, and also make the IR
> +  // match the profiled binary before annotation.
> +  getFunction(F).setEntryCount(
> +      ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
> +      &InlinedGUIDs);
> +
> +  if (!SampleProfileUseProfi) {
>     // Compute dominance and loop info needed for propagation.
>     computeDominanceAndLoopInfo(F);
>
>     // Find equivalence classes.
>     findEquivalenceClasses(F);
> -
> -    // Propagate weights to all edges.
> -    propagateWeights(F);
>   }
>
> -  return Changed;
> +  // Before propagation starts, build, for each block, a list of
> +  // unique predecessors and successors. This is necessary to handle
> +  // identical edges in multiway branches. Since we visit all blocks and all
> +  // edges of the CFG, it is cleaner to build these lists once at the start
> +  // of the pass.
> +  buildEdges(F);
> +}
> +
> +template <typename BT>
> +void SampleProfileLoaderBaseImpl<BT>::finalizeWeightPropagation(
> +    FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
> +  // If we utilize a flow-based count inference, then we trust the computed
> +  // counts and set the entry count as computed by the algorithm. This is
> +  // primarily done to sync the counts produced by profi and BFI inference,
> +  // which uses the entry count for mass propagation.
> +  // If profi produces a zero value for the entry count, we fall back to
> +  // Samples->getHeadSamples() + 1 to avoid functions with zero count.
> +  if (SampleProfileUseProfi) {
> +    const BasicBlockT *EntryBB = getEntryBB(&F);
> +    if (BlockWeights[EntryBB] > 0) {
> +      getFunction(F).setEntryCount(
> +          ProfileCount(BlockWeights[EntryBB], Function::PCT_Real),
> +          &InlinedGUIDs);
> +    }
> +  }
> }
>
> template <typename BT>
>
> diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
> index a961c47a75013..3e01fd17f5260 100644
> --- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
> +++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
> @@ -84,6 +84,7 @@
> #include "llvm/Transforms/Instrumentation.h"
> #include "llvm/Transforms/Utils/CallPromotionUtils.h"
> #include "llvm/Transforms/Utils/Cloning.h"
> +#include "llvm/Transforms/Utils/SampleProfileInference.h"
> #include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"
> #include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"
> #include <algorithm>
> @@ -1648,6 +1649,19 @@ void SampleProfileLoader::generateMDProfMetadata(Function &F) {
>       SmallVector<uint32_t, 4> Weights;
>       uint32_t MaxWeight = 0;
>       Instruction *MaxDestInst;
> +      // Since profi treats multiple edges (multiway branches) as a single
> +      // edge, we need to distribute the computed weight among the branches.
> +      // We do this by evenly splitting the edge weight among the destinations.
> +      DenseMap<const BasicBlock *, uint64_t> EdgeMultiplicity;
> +      std::vector<uint64_t> EdgeIndex;
> +      if (SampleProfileUseProfi) {
> +        EdgeIndex.resize(TI->getNumSuccessors());
> +        for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
> +          const BasicBlock *Succ = TI->getSuccessor(I);
> +          EdgeIndex[I] = EdgeMultiplicity[Succ];
> +          EdgeMultiplicity[Succ]++;
> +        }
> +      }
>       for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
>         BasicBlock *Succ = TI->getSuccessor(I);
>         Edge E = std::make_pair(BB, Succ);
> @@ -1660,9 +1674,19 @@ void SampleProfileLoader::generateMDProfMetadata(Function &F) {
>           LLVM_DEBUG(dbgs() << " (saturated due to uint32_t overflow)");
>           Weight = std::numeric_limits<uint32_t>::max();
>         }
> -        // Weight is added by one to avoid propagation errors introduced by
> -        // 0 weights.
> -        Weights.push_back(static_cast<uint32_t>(Weight + 1));
> +        if (!SampleProfileUseProfi) {
> +          // Weight is increased by one to avoid propagation errors introduced
> +          // by 0 weights.
> +          Weights.push_back(static_cast<uint32_t>(Weight + 1));
> +        } else {
> +          // Profi creates proper weights that do not require "+1" adjustments,
> +          // but we evenly split the weight among branches with the same
> +          // destination.
> +          uint64_t W = Weight / EdgeMultiplicity[Succ];
> +          // Round up, if needed, so that the first branches are hotter.
> +          if (EdgeIndex[I] < Weight % EdgeMultiplicity[Succ])
> +            W++;
> +          Weights.push_back(static_cast<uint32_t>(W));
> +        }
>         if (Weight != 0) {
>           if (Weight > MaxWeight) {
>             MaxWeight = Weight;
>
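To make the splitting rule in this hunk concrete, here is a small standalone sketch (not part of the patch; the weight and multiplicity values are made up for illustration) of the arithmetic applied to one multiway branch:

  #include <cstdint>
  #include <cstdio>

  int main() {
    // One profi edge weight, shared by three branches to the same successor.
    uint64_t Weight = 10, Multiplicity = 3;
    for (uint64_t I = 0; I < Multiplicity; I++) {
      // Even split, rounding the earlier branches up, as in the loop above.
      uint64_t W = Weight / Multiplicity + (I < Weight % Multiplicity ? 1 : 0);
      printf("branch %llu gets weight %llu\n", (unsigned long long)I,
             (unsigned long long)W);
    }
    return 0; // prints 4, 3, 3; the parts sum back to the original 10
  }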
> diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
> index be4f7125eb853..22b9c0b19adab 100644
> --- a/llvm/lib/Transforms/Utils/CMakeLists.txt
> +++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
> @@ -60,6 +60,7 @@ add_llvm_component_library(LLVMTransformUtils
>   StripGCRelocates.cpp
>   SSAUpdater.cpp
>   SSAUpdaterBulk.cpp
> +  SampleProfileInference.cpp
>   SampleProfileLoaderBaseUtil.cpp
>   SanitizerStats.cpp
>   SimplifyCFG.cpp
>
> diff --git a/llvm/lib/Transforms/Utils/SampleProfileInference.cpp b/llvm/lib/Transforms/Utils/SampleProfileInference.cpp
> new file mode 100644
> index 0000000000000..412a724006aa2
> --- /dev/null
> +++ b/llvm/lib/Transforms/Utils/SampleProfileInference.cpp
> @@ -0,0 +1,461 @@
> +//===- SampleProfileInference.cpp - Adjust sample profiles in the IR ------===//
> +//
> +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
> +// See https://llvm.org/LICENSE.txt for license information.
> +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// This file implements a profile inference algorithm. Given incomplete and
> +// possibly imprecise block counts, the algorithm reconstructs realistic block
> +// and edge counts that satisfy flow conservation rules, while minimally
> +// modifying the input block counts.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "llvm/Transforms/Utils/SampleProfileInference.h"
> +#include "llvm/Support/Debug.h"
> +#include <queue>
> +#include <set>
> +
> +using namespace llvm;
> +#define DEBUG_TYPE "sample-profile-inference"
> +
> +namespace {
> +
> +/// A value indicating an infinite flow/capacity/weight of a block/edge.
> +/// Not using numeric_limits<int64_t>::max(), as the values can be summed up
> +/// during the execution.
> +static constexpr int64_t INF = ((int64_t)1) << 50;
> +
> +/// The minimum-cost maximum flow algorithm.
> +///
> +/// The algorithm finds the maximum flow of minimum cost on a given (directed)
> +/// network using a modified version of the classical Moore-Bellman-Ford
> +/// approach. The algorithm applies a number of augmentation iterations in
> +/// which flow is sent along paths of positive capacity from the source to the
> +/// sink. The worst-case time complexity of the implementation is O(v(f)*m*n),
> +/// where m is the number of edges, n is the number of vertices, and v(f) is
> +/// the value of the maximum flow. However, the observed running time on
> +/// typical instances is sub-quadratic, that is, o(n^2).
> +///
> +/// The input is a set of edges with specified costs and capacities, and a pair
> +/// of nodes (source and sink). The output is the flow along each edge of the
> +/// minimum total cost respecting the given edge capacities.
> +class MinCostMaxFlow {
> +public:
> +  // Initialize the algorithm's data structures for a network of a given size.
> +  void initialize(uint64_t NodeCount, uint64_t SourceNode, uint64_t SinkNode) {
> +    Source = SourceNode;
> +    Target = SinkNode;
> +
> +    Nodes = std::vector<Node>(NodeCount);
> +    Edges = std::vector<std::vector<Edge>>(NodeCount, std::vector<Edge>());
> +  }
> +
> +  // Run the algorithm.
> +  int64_t run() {
> +    // Find an augmenting path and update the flow along the path.
> +    size_t AugmentationIters = 0;
> +    while (findAugmentingPath()) {
> +      augmentFlowAlongPath();
> +      AugmentationIters++;
> +    }
> +
> +    // Compute the total flow and its cost.
> +    int64_t TotalCost = 0;
> +    int64_t TotalFlow = 0;
> +    for (uint64_t Src = 0; Src < Nodes.size(); Src++) {
> +      for (auto &Edge : Edges[Src]) {
> +        if (Edge.Flow > 0) {
> +          TotalCost += Edge.Cost * Edge.Flow;
> +          if (Src == Source)
> +            TotalFlow += Edge.Flow;
> +        }
> +      }
> +    }
> +    LLVM_DEBUG(dbgs() << "Completed profi after " << AugmentationIters
> +                      << " iterations with " << TotalFlow << " total flow"
> +                      << " of " << TotalCost << " cost\n");
> +    return TotalCost;
> +  }
> +
> +  /// Add an edge to the network with a specified capacity and a cost.
> +  /// Multiple edges between a pair of nodes are allowed, but self-edges
> +  /// are not supported.
> +  void addEdge(uint64_t Src, uint64_t Dst, int64_t Capacity, int64_t Cost) {
> +    assert(Capacity > 0 && "adding an edge of zero capacity");
> +    assert(Src != Dst && "loop edges are not supported");
> +
> +    Edge SrcEdge;
> +    SrcEdge.Dst = Dst;
> +    SrcEdge.Cost = Cost;
> +    SrcEdge.Capacity = Capacity;
> +    SrcEdge.Flow = 0;
> +    SrcEdge.RevEdgeIndex = Edges[Dst].size();
> +
> +    Edge DstEdge;
> +    DstEdge.Dst = Src;
> +    DstEdge.Cost = -Cost;
> +    DstEdge.Capacity = 0;
> +    DstEdge.Flow = 0;
> +    DstEdge.RevEdgeIndex = Edges[Src].size();
> +
> +    Edges[Src].push_back(SrcEdge);
> +    Edges[Dst].push_back(DstEdge);
> +  }
> +
> +  /// Add an edge to the network of infinite capacity and a given cost.
> +  void addEdge(uint64_t Src, uint64_t Dst, int64_t Cost) {
> +    addEdge(Src, Dst, INF, Cost);
> +  }
> +
> +  /// Get the total flow from a given source node.
> +  /// Returns a list of pairs (target node, amount of flow to the target).
> +  const std::vector<std::pair<uint64_t, int64_t>> getFlow(uint64_t Src) const {
> +    std::vector<std::pair<uint64_t, int64_t>> Flow;
> +    for (auto &Edge : Edges[Src]) {
> +      if (Edge.Flow > 0)
> +        Flow.push_back(std::make_pair(Edge.Dst, Edge.Flow));
> +    }
> +    return Flow;
> +  }
> +
> +  /// Get the total flow between a pair of nodes.
> +  int64_t getFlow(uint64_t Src, uint64_t Dst) const {
> +    int64_t Flow = 0;
> +    for (auto &Edge : Edges[Src]) {
> +      if (Edge.Dst == Dst) {
> +        Flow += Edge.Flow;
> +      }
> +    }
> +    return Flow;
> +  }
> +
> +  /// A cost of increasing a block's count by one.
> +  static constexpr int64_t AuxCostInc = 10;
> +  /// A cost of decreasing a block's count by one.
> +  static constexpr int64_t AuxCostDec = 20;
> +  /// A cost of increasing the count of a zero-weight block by one.
> +  static constexpr int64_t AuxCostIncZero = 11;
> +  /// A cost of increasing the entry block's count by one.
> +  static constexpr int64_t AuxCostIncEntry = 40;
> +  /// A cost of decreasing the entry block's count by one.
> +  static constexpr int64_t AuxCostDecEntry = 10;
> +  /// A cost of taking an unlikely jump.
> +  static constexpr int64_t AuxCostUnlikely = ((int64_t)1) << 20;
> +
> +private:
> +  /// Check for the existence of an augmenting path with a positive capacity.
> +  bool findAugmentingPath() {
> +    // Initialize data structures.
> +    for (auto &Node : Nodes) {
> +      Node.Distance = INF;
> +      Node.ParentNode = uint64_t(-1);
> +      Node.ParentEdgeIndex = uint64_t(-1);
> +      Node.Taken = false;
> +    }
> +
> +    std::queue<uint64_t> Queue;
> +    Queue.push(Source);
> +    Nodes[Source].Distance = 0;
> +    Nodes[Source].Taken = true;
> +    while (!Queue.empty()) {
> +      uint64_t Src = Queue.front();
> +      Queue.pop();
> +      Nodes[Src].Taken = false;
> +      // Although the residual network contains edges with negative costs
> +      // (in particular, backward edges), it can be shown that there are no
> +      // negative-weight cycles and the following two invariants are
> +      // maintained:
> +      // (i) Dist[Source, V] >= 0 and (ii) Dist[V, Target] >= 0 for all nodes
> +      // V, where Dist is the length of the shortest path between two nodes.
> +      // This allows pruning the search space of the path-finding algorithm
> +      // using the following early-stop criteria:
> +      // -- If we find a path with zero distance from Source to Target, stop
> +      //    the search, as the path is the shortest since
> +      //    Dist[Source, Target] >= 0;
> +      // -- If we have Dist[Source, V] > Dist[Source, Target], then do not
> +      //    process node V, as it is guaranteed _not_ to be on a shortest path
> +      //    from Source to Target; it follows from the inequalities
> +      //    Dist[Source, Target] >= Dist[Source, V] + Dist[V, Target]
> +      //                         >= Dist[Source, V]
> +      if (Nodes[Target].Distance == 0)
> +        break;
> +      if (Nodes[Src].Distance > Nodes[Target].Distance)
> +        continue;
> +
> +      // Process adjacent edges.
> +      for (uint64_t EdgeIdx = 0; EdgeIdx < Edges[Src].size(); EdgeIdx++) {
> +        auto &Edge = Edges[Src][EdgeIdx];
> +        if (Edge.Flow < Edge.Capacity) {
> +          uint64_t Dst = Edge.Dst;
> +          int64_t NewDistance = Nodes[Src].Distance + Edge.Cost;
> +          if (Nodes[Dst].Distance > NewDistance) {
> +            // Update the distance and the parent node/edge.
> +            Nodes[Dst].Distance = NewDistance;
> +            Nodes[Dst].ParentNode = Src;
> +            Nodes[Dst].ParentEdgeIndex = EdgeIdx;
> +            // Add the node to the queue, if it is not there yet.
> +            if (!Nodes[Dst].Taken) {
> +              Queue.push(Dst);
> +              Nodes[Dst].Taken = true;
> +            }
> +          }
> +        }
> +      }
> +    }
> +
> +    return Nodes[Target].Distance != INF;
> +  }
> +
> +  /// Update the current flow along the augmenting path.
> +  void augmentFlowAlongPath() {
> +    // Find the path capacity.
> +    int64_t PathCapacity = INF;
> +    uint64_t Now = Target;
> +    while (Now != Source) {
> +      uint64_t Pred = Nodes[Now].ParentNode;
> +      auto &Edge = Edges[Pred][Nodes[Now].ParentEdgeIndex];
> +      PathCapacity = std::min(PathCapacity, Edge.Capacity - Edge.Flow);
> +      Now = Pred;
> +    }
> +
> +    assert(PathCapacity > 0 && "found incorrect augmenting path");
> +
> +    // Update the flow along the path.
> +    Now = Target;
> +    while (Now != Source) {
> +      uint64_t Pred = Nodes[Now].ParentNode;
> +      auto &Edge = Edges[Pred][Nodes[Now].ParentEdgeIndex];
> +      auto &RevEdge = Edges[Now][Edge.RevEdgeIndex];
> +
> +      Edge.Flow += PathCapacity;
> +      RevEdge.Flow -= PathCapacity;
> +
> +      Now = Pred;
> +    }
> +  }
> +
> +  /// A node in a flow network.
> +  struct Node {
> +    /// The cost of the cheapest path from the source to the current node.
> +    int64_t Distance;
> +    /// The node preceding the current one in the path.
> +    uint64_t ParentNode;
> +    /// The index of the edge between ParentNode and the current node.
> +    uint64_t ParentEdgeIndex;
> +    /// An indicator of whether the current node is in a queue.
> +    bool Taken;
> +  };
> +  /// An edge in a flow network.
> +  struct Edge {
> +    /// The cost of the edge.
> +    int64_t Cost;
> +    /// The capacity of the edge.
> +    int64_t Capacity;
> +    /// The current flow on the edge.
> +    int64_t Flow;
> +    /// The destination node of the edge.
> +    uint64_t Dst;
> +    /// The index of the reverse edge between Dst and the current node.
> +    uint64_t RevEdgeIndex;
> +  };
> +
> +  /// The set of network nodes.
> +  std::vector<Node> Nodes;
> +  /// The set of network edges.
> +  std::vector<std::vector<Edge>> Edges;
> +  /// Source node of the flow.
> +  uint64_t Source;
> +  /// Target (sink) node of the flow.
> +  uint64_t Target;
> +};
> +
> +/// Initialize the flow network for a given function.
> +///
> +/// Every block is split into three nodes that are responsible for (i) an
> +/// incoming flow, (ii) an outgoing flow, and (iii) penalizing an increase or
> +/// reduction of the block weight.
> +void initializeNetwork(MinCostMaxFlow &Network, FlowFunction &Func) {
> +  uint64_t NumBlocks = Func.Blocks.size();
> +  assert(NumBlocks > 1 && "Too few blocks in a function");
> +  LLVM_DEBUG(dbgs() << "Initializing profi for " << NumBlocks << " blocks\n");
> +
> +  // Pre-process data: make sure the entry weight is at least 1.
> +  if (Func.Blocks[Func.Entry].Weight == 0) {
> +    Func.Blocks[Func.Entry].Weight = 1;
> +  }
> +  // Introduce dummy source/sink pairs to allow flow circulation.
> +  // The nodes corresponding to blocks of Func have indices in the range
> +  // [0..3 * NumBlocks); the dummy nodes are indexed by the next four values.
> +  uint64_t S = 3 * NumBlocks;
> +  uint64_t T = S + 1;
> +  uint64_t S1 = S + 2;
> +  uint64_t T1 = S + 3;
> +
> +  Network.initialize(3 * NumBlocks + 4, S1, T1);
> +
> +  // Create three nodes for every block of the function.
> +  for (uint64_t B = 0; B < NumBlocks; B++) {
> +    auto &Block = Func.Blocks[B];
> +    assert((!Block.UnknownWeight || Block.Weight == 0 || Block.isEntry()) &&
> +           "non-zero weight of a block w/o weight except for an entry");
> +
> +    // Split every block into three nodes.
> +    uint64_t Bin = 3 * B;
> +    uint64_t Bout = 3 * B + 1;
> +    uint64_t Baux = 3 * B + 2;
> +    if (Block.Weight > 0) {
> +      Network.addEdge(S1, Bout, Block.Weight, 0);
> +      Network.addEdge(Bin, T1, Block.Weight, 0);
> +    }
> +
> +    // Edges from S and to T.
> +    assert((!Block.isEntry() || !Block.isExit()) &&
> +           "a block cannot be an entry and an exit");
> +    if (Block.isEntry()) {
> +      Network.addEdge(S, Bin, 0);
> +    } else if (Block.isExit()) {
> +      Network.addEdge(Bout, T, 0);
> +    }
> +
> +    // An auxiliary node to allow increase/reduction of block counts:
> +    // We assume that decreasing block counts is more expensive than
> +    // increasing, and thus set separate costs here. In the future we may want
> +    // to tune the relative costs so as to maximize the quality of generated
> +    // profiles.
> +    int64_t AuxCostInc = MinCostMaxFlow::AuxCostInc;
> +    int64_t AuxCostDec = MinCostMaxFlow::AuxCostDec;
> +    if (Block.UnknownWeight) {
> +      // Do not penalize changing weights of blocks w/o a known profile count.
> +      AuxCostInc = 0;
> +      AuxCostDec = 0;
> +    } else {
> +      // Increasing the count for "cold" blocks with zero initial count is
> +      // more expensive than for "hot" ones.
> +      if (Block.Weight == 0) {
> +        AuxCostInc = MinCostMaxFlow::AuxCostIncZero;
> +      }
> +      // Modifying the count of the entry block is expensive.
> +      if (Block.isEntry()) {
> +        AuxCostInc = MinCostMaxFlow::AuxCostIncEntry;
> +        AuxCostDec = MinCostMaxFlow::AuxCostDecEntry;
> +      }
> +    }
> +    // For blocks with self-edges, do not penalize a reduction of the count,
> +    // as all of the increase can be attributed to the self-edge.
> +    if (Block.HasSelfEdge) {
> +      AuxCostDec = 0;
> +    }
> +
> +    Network.addEdge(Bin, Baux, AuxCostInc);
> +    Network.addEdge(Baux, Bout, AuxCostInc);
> +    if (Block.Weight > 0) {
> +      Network.addEdge(Bout, Baux, AuxCostDec);
> +      Network.addEdge(Baux, Bin, AuxCostDec);
> +    }
> +  }
> +
> +  // Create edges for every jump.
> +  for (auto &Jump : Func.Jumps) {
> +    uint64_t Src = Jump.Source;
> +    uint64_t Dst = Jump.Target;
> +    if (Src != Dst) {
> +      uint64_t SrcOut = 3 * Src + 1;
> +      uint64_t DstIn = 3 * Dst;
> +      uint64_t Cost = Jump.IsUnlikely ? MinCostMaxFlow::AuxCostUnlikely : 0;
> +      Network.addEdge(SrcOut, DstIn, Cost);
> +    }
> +  }
> +
> +  // Make sure we have a valid flow circulation.
> +  Network.addEdge(T, S, 0);
> +}
> +
> +/// Extract the resulting block and edge counts from the flow network.
> +void extractWeights(MinCostMaxFlow &Network, FlowFunction &Func) {
> +  uint64_t NumBlocks = Func.Blocks.size();
> +
> +  // Extract the resulting block counts.
> +  for (uint64_t Src = 0; Src < NumBlocks; Src++) {
> +    auto &Block = Func.Blocks[Src];
> +    uint64_t SrcOut = 3 * Src + 1;
> +    int64_t Flow = 0;
> +    for (auto &Adj : Network.getFlow(SrcOut)) {
> +      uint64_t DstIn = Adj.first;
> +      int64_t DstFlow = Adj.second;
> +      bool IsAuxNode = (DstIn < 3 * NumBlocks && DstIn % 3 == 2);
> +      if (!IsAuxNode || Block.HasSelfEdge) {
> +        Flow += DstFlow;
> +      }
> +    }
> +    Block.Flow = Flow;
> +    assert(Flow >= 0 && "negative block flow");
> +  }
> +
> +  // Extract the resulting jump counts.
> +  for (auto &Jump : Func.Jumps) {
> +    uint64_t Src = Jump.Source;
> +    uint64_t Dst = Jump.Target;
> +    int64_t Flow = 0;
> +    if (Src != Dst) {
> +      uint64_t SrcOut = 3 * Src + 1;
> +      uint64_t DstIn = 3 * Dst;
> +      Flow = Network.getFlow(SrcOut, DstIn);
> +    } else {
> +      uint64_t SrcOut = 3 * Src + 1;
> +      uint64_t SrcAux = 3 * Src + 2;
> +      int64_t AuxFlow = Network.getFlow(SrcOut, SrcAux);
> +      if (AuxFlow > 0)
> +        Flow = AuxFlow;
> +    }
> +    Jump.Flow = Flow;
> +    assert(Flow >= 0 && "negative jump flow");
> +  }
> +}
> +
> +#ifndef NDEBUG
> +/// Verify that the computed flow values satisfy flow conservation rules.
> +void verifyWeights(const FlowFunction &Func) {
> +  const uint64_t NumBlocks = Func.Blocks.size();
> +  auto InFlow = std::vector<uint64_t>(NumBlocks, 0);
> +  auto OutFlow = std::vector<uint64_t>(NumBlocks, 0);
> +  for (auto &Jump : Func.Jumps) {
> +    InFlow[Jump.Target] += Jump.Flow;
> +    OutFlow[Jump.Source] += Jump.Flow;
> +  }
> +
> +  uint64_t TotalInFlow = 0;
> +  uint64_t TotalOutFlow = 0;
> +  for (uint64_t I = 0; I < NumBlocks; I++) {
> +    auto &Block = Func.Blocks[I];
> +    if (Block.isEntry()) {
> +      TotalInFlow += Block.Flow;
> +      assert(Block.Flow == OutFlow[I] && "incorrectly computed control flow");
> +    } else if (Block.isExit()) {
> +      TotalOutFlow += Block.Flow;
> +      assert(Block.Flow == InFlow[I] && "incorrectly computed control flow");
> +    } else {
> +      assert(Block.Flow == OutFlow[I] && "incorrectly computed control flow");
> +      assert(Block.Flow == InFlow[I] && "incorrectly computed control flow");
> +    }
> +  }
> +  assert(TotalInFlow == TotalOutFlow && "incorrectly computed control flow");
> +}
> +#endif
> +
> +} // end of anonymous namespace
> +
> +/// Apply the profile inference algorithm for a given flow function.
> +void llvm::applyFlowInference(FlowFunction &Func) {
> +  // Create and apply an inference network model.
> +  auto InferenceNetwork = MinCostMaxFlow();
> +  initializeNetwork(InferenceNetwork, Func);
> +  InferenceNetwork.run();
> +
> +  // Extract flow values for every block and every edge.
> +  extractWeights(InferenceNetwork, Func);
> +
> +#ifndef NDEBUG
> +  // Verify the result.
> +  verifyWeights(Func);
> +#endif
> +}
>
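Since MinCostMaxFlow sits in an anonymous namespace, it is not callable from outside this file. Purely to illustrate the node-splitting and wiring that initializeNetwork performs, here is a hypothetical sketch (assuming a standalone copy of the class) for a minimal two-block function b0 -> b1, both with a known weight of 10; for brevity it uses the generic AuxCostInc/AuxCostDec rather than the entry-specific costs the real code picks for b0:

  MinCostMaxFlow Network;
  uint64_t NumBlocks = 2;
  // Dummy source/sink pairs follow the 3 * NumBlocks block nodes.
  uint64_t S = 3 * NumBlocks, T = S + 1, S1 = S + 2, T1 = S + 3;
  Network.initialize(3 * NumBlocks + 4, S1, T1);
  for (uint64_t B = 0; B < NumBlocks; B++) {
    uint64_t Bin = 3 * B, Bout = 3 * B + 1, Baux = 3 * B + 2;
    // Both blocks have a known weight of 10.
    Network.addEdge(S1, Bout, /*Capacity=*/10, /*Cost=*/0);
    Network.addEdge(Bin, T1, /*Capacity=*/10, /*Cost=*/0);
    // Auxiliary edges that penalize adjusting the block count.
    Network.addEdge(Bin, Baux, MinCostMaxFlow::AuxCostInc);
    Network.addEdge(Baux, Bout, MinCostMaxFlow::AuxCostInc);
    Network.addEdge(Bout, Baux, MinCostMaxFlow::AuxCostDec);
    Network.addEdge(Baux, Bin, MinCostMaxFlow::AuxCostDec);
  }
  Network.addEdge(S, /*Bin of b0=*/0, 0);  // b0 is the entry block
  Network.addEdge(/*Bout of b1=*/4, T, 0); // b1 is the exit block
  Network.addEdge(/*Bout of b0=*/1, /*Bin of b1=*/3, 0); // the jump b0 -> b1
  Network.addEdge(T, S, 0); // close the circulation
  Network.run();
  // All 10 units of flow should traverse the jump at zero cost.
  int64_t JumpFlow = Network.getFlow(/*Bout of b0=*/1, /*Bin of b1=*/3);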
> diff --git a/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp b/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
> index 6d995cf4c0481..ea0e8343eb887 100644
> --- a/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
> +++ b/llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
> @@ -34,6 +34,10 @@ cl::opt<bool> NoWarnSampleUnused(
>     cl::desc("Use this option to turn off/on warnings about function with "
>              "samples but without debug information to use those samples. "));
>
> +cl::opt<bool> SampleProfileUseProfi(
> +    "sample-profile-use-profi", cl::init(false), cl::Hidden, cl::ZeroOrMore,
> +    cl::desc("Use profi to infer block and edge counts."));
> +
> namespace sampleprofutil {
>
> /// Return true if the given callsite is hot wrt to hot cutoff threshold.
>
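With this flag in place, profi can be exercised from the command line; for example (adapted from the RUN lines of the new test below; the -S form printing the annotated IR is my adaptation, not part of the patch):

  opt < profile-inference.ll -passes=pseudo-probe,sample-profile \
      -sample-profile-use-profi \
      -sample-profile-file=Inputs/profile-inference.prof -S

The flag defaults to false, so existing sample-profile users see no behavior change unless they opt in.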
> diff --git a/llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof b/llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
> new file mode 100644
> index 0000000000000..e995a04c7fd44
> --- /dev/null
> +++ b/llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
> @@ -0,0 +1,23 @@
> +test_1:23968:0
> + 1: 100
> + 2: 60
> + 3: 40
> + !CFGChecksum: 4294967295
> +
> +test_2:23968:0
> + 1: 100
> + 3: 10
> + !CFGChecksum: 37753817093
> +
> +test_3:10000:0
> + 3: 13
> + 5: 89
> + !CFGChecksum: 69502983527
> +
> +sum_of_squares:23968:0
> + 2: 5993
> + 3: 1
> + 4: 5992
> + 5: 5992
> + 8: 5992
> + !CFGChecksum: 175862120757
>
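For reference, each section above is "name:total-samples:entry-samples", followed by lines mapping a pseudo-probe id to its sampled count; the !CFGChecksum must match the function's entry in the !llvm.pseudo_probe_desc metadata of the test module below (e.g., 4294967295 for test_1), otherwise the profile is not applied. Note that test_2 deliberately omits a count for probe 2, producing the "dangling" block that profi has to infer.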
> diff --git a/llvm/test/Transforms/SampleProfile/profile-inference.ll b/llvm/test/Transforms/SampleProfile/profile-inference.ll
> new file mode 100644
> index 0000000000000..7f40358e65268
> --- /dev/null
> +++ b/llvm/test/Transforms/SampleProfile/profile-inference.ll
> @@ -0,0 +1,245 @@
> +; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-use-profi -sample-profile-file=%S/Inputs/profile-inference.prof | opt -analyze -branch-prob -enable-new-pm=0 | FileCheck %s
> +; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-use-profi -sample-profile-file=%S/Inputs/profile-inference.prof | opt -analyze -block-freq -enable-new-pm=0 | FileCheck %s --check-prefix=CHECK2
> +
> +; The test verifies that profile inference correctly builds branch
> +; probabilities from sampling-based block counts.
> +;
> +; +---------+     +----------+
> +; | b3 [40] | <-- | b1 [100] |
> +; +---------+     +----------+
> +;                   |
> +;                   |
> +;                   v
> +;                 +----------+
> +;                 | b2 [60]  |
> +;                 +----------+
> +
> +@yydebug = dso_local global i32 0, align 4
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @test_1() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 7964825052912775246, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br i1 %cmp, label %b2, label %b3
> +; CHECK: edge b1 -> b2 probability is 0x4ccccccd / 0x80000000 = 60.00%
> +; CHECK: edge b1 -> b3 probability is 0x33333333 / 0x80000000 = 40.00%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 100
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 7964825052912775246, i64 2, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 60
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 7964825052912775246, i64 3, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 40
> +}
> +
> +
> +; The test verifies that profile inference correctly builds branch
> +; probabilities from sampling-based block counts in the presence of "dangling"
> +; probes (whose block counts are missing).
> +;
> +; +---------+     +----------+
> +; | b3 [10] | <-- | b1 [100] |
> +; +---------+     +----------+
> +;                   |
> +;                   |
> +;                   v
> +;                 +----------+
> +;                 | b2 [?]   |
> +;                 +----------+
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @test_2() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br i1 %cmp, label %b2, label %b3
> +; CHECK: edge b1 -> b2 probability is 0x73333333 / 0x80000000 = 90.00%
> +; CHECK: edge b1 -> b3 probability is 0x0ccccccd / 0x80000000 = 10.00%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 100
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 2, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 90
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 3, i32 0, i64 -1)
> +  ret i32 %0
> +}
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 10
> +
> +
> +; The test verifies that profi is able to infer block counts from hot
> +; subgraphs.
> +;
> +; +---------+     +---------+
> +; | b4 [?]  | <-- | b1 [?]  |
> +; +---------+     +---------+
> +;   |               |
> +;   |               |
> +;   v               v
> +; +---------+     +---------+
> +; | b5 [89] |     | b2 [?]  |
> +; +---------+     +---------+
> +;                   |
> +;                   |
> +;                   v
> +;                 +---------+
> +;                 | b3 [13] |
> +;                 +---------+
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @test_3() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br i1 %cmp, label %b2, label %b4
> +; CHECK: edge b1 -> b2 probability is 0x10505050 / 0x80000000 = 12.75%
> +; CHECK: edge b1 -> b4 probability is 0x6fafafb0 / 0x80000000 = 87.25%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 102
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 2, i32 0, i64 -1)
> +  br label %b3
> +; CHECK: edge b2 -> b3 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 13
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 3, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 13
> +
> +b4:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 4, i32 0, i64 -1)
> +  br label %b5
> +; CHECK: edge b4 -> b5 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b4: float = {{.*}}, int = {{.*}}, count = 89
> +
> +b5:
> +  call void @llvm.pseudoprobe(i64 1649282507922421973, i64 5, i32 0, i64 -1)
> +  ret i32 %0
> +; CHECK2: - b5: float = {{.*}}, int = {{.*}}, count = 89
> +}
> +
> +
> +; A larger test to verify that profile inference correctly identifies hot
> +; parts of the control-flow graph.
> +;
> +;                  +-----------+
> +;                  | b1 [?]    |
> +;                  +-----------+
> +;                    |
> +;                    |
> +;                    v
> +; +--------+       +-----------+
> +; | b3 [1] | <--   | b2 [5993] |
> +; +--------+       +-----------+
> +;   |                |
> +;   |                |
> +;   |                v
> +;   |              +-----------+     +--------+
> +;   |              | b4 [5992] | --> | b6 [?] |
> +;   |              +-----------+     +--------+
> +;   |                |                 |
> +;   |                |                 |
> +;   |                v                 |
> +;   |              +-----------+       |
> +;   |              | b5 [5992] |       |
> +;   |              +-----------+       |
> +;   |                |                 |
> +;   |                |                 |
> +;   |                v                 |
> +;   |              +-----------+       |
> +;   |              | b7 [?]    |       |
> +;   |              +-----------+       |
> +;   |                |                 |
> +;   |                |                 |
> +;   |                v                 |
> +;   |              +-----------+       |
> +;   |              | b8 [5992] | <-----+
> +;   |              +-----------+
> +;   |                |
> +;   |                |
> +;   |                v
> +;   |              +-----------+
> +;   +------------> | b9 [?]    |
> +;                  +-----------+
> +
> +; Function Attrs: nounwind uwtable
> +define dso_local i32 @sum_of_squares() #0 {
> +b1:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 1, i32 0, i64 -1)
> +  %0 = load i32, i32* @yydebug, align 4
> +  %cmp = icmp ne i32 %0, 0
> +  br label %b2
> +; CHECK: edge b1 -> b2 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b1: float = {{.*}}, int = {{.*}}, count = 5993
> +
> +b2:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 2, i32 0, i64 -1)
> +  br i1 %cmp, label %b4, label %b3
> +; CHECK: edge b2 -> b4 probability is 0x7ffa8844 / 0x80000000 = 99.98%
> +; CHECK: edge b2 -> b3 probability is 0x000577bc / 0x80000000 = 0.02%
> +; CHECK2: - b2: float = {{.*}}, int = {{.*}}, count = 5993
> +
> +b3:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 3, i32 0, i64 -1)
> +  br label %b9
> +; CHECK: edge b3 -> b9 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b3: float = {{.*}}, int = {{.*}}, count = 1
> +
> +b4:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 4, i32 0, i64 -1)
> +  br i1 %cmp, label %b5, label %b6
> +; CHECK: edge b4 -> b5 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK: edge b4 -> b6 probability is 0x00000000 / 0x80000000 = 0.00%
> +; CHECK2: - b4: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b5:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 5, i32 0, i64 -1)
> +  br label %b7
> +; CHECK: edge b5 -> b7 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b5: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b6:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 6, i32 0, i64 -1)
> +  br label %b8
> +; CHECK: edge b6 -> b8 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b6: float = {{.*}}, int = {{.*}}, count = 0
> +
> +b7:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 7, i32 0, i64 -1)
> +  br label %b8
> +; CHECK: edge b7 -> b8 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b7: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b8:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 8, i32 0, i64 -1)
> +  br label %b9
> +; CHECK: edge b8 -> b9 probability is 0x80000000 / 0x80000000 = 100.00%
> +; CHECK2: - b8: float = {{.*}}, int = {{.*}}, count = 5992
> +
> +b9:
> +  call void @llvm.pseudoprobe(i64 -907520326213521421, i64 9, i32 0, i64 -1)
> +  ret i32 %0
> +}
> +; CHECK2: - b9: float = {{.*}}, int = {{.*}}, count = 5993
> +
> +declare void @llvm.pseudoprobe(i64, i64, i32, i64) #1
> +
> +attributes #0 = { noinline nounwind uwtable "use-sample-profile" }
> +attributes #1 = { nounwind }
> +
> +!llvm.pseudo_probe_desc = !{!6, !7, !8, !9}
> +
> +!6 = !{i64 7964825052912775246, i64 4294967295, !"test_1", null}
> +!7 = !{i64 -6216829535442445639, i64 37753817093, !"test_2", null}
> +!8 = !{i64 1649282507922421973, i64 69502983527, !"test_3", null}
> +!9 = !{i64 -907520326213521421, i64 175862120757, !"sum_of_squares", null}
>
>
_______________________________________________
llvm-commits mailing list
llvm-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits