[llvm] r262636 - Infrastructure for PGO enhancements in inliner

Daniel Jasper via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 3 15:20:26 PST 2016


Do you have an ETA? If it is going to take longer than an hour, would you
mind reverting this for now?

On Thu, Mar 3, 2016 at 3:19 PM, Easwaran Raman <eraman at google.com> wrote:

> The instruction following the call has been deleted in
> InlineFunction.cpp:1998 and it's parent is being accessed. Will think of a
> fix and send a patch.
>
> On Thu, Mar 3, 2016 at 3:03 PM, Daniel Jasper <djasper at google.com> wrote:
>
>>
>>
>> On Thu, Mar 3, 2016 at 10:26 AM, Easwaran Raman via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: eraman
>>> Date: Thu Mar  3 12:26:33 2016
>>> New Revision: 262636
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=262636&view=rev
>>> Log:
>>> Infrastructure for PGO enhancements in inliner
>>>
>>> This patch provides the following infrastructure for PGO enhancements in
>>> inliner:
>>>
>>> Enable the use of block level profile information in inliner
>>> Incremental update of block frequency information during inlining
>>> Update the function entry counts of callees when they get inlined into
>>> callers.
>>>
>>> Differential Revision: http://reviews.llvm.org/D16381
>>>
>>>
>>> Added:
>>>     llvm/trunk/test/Transforms/Inline/function-count-update-2.ll
>>>     llvm/trunk/test/Transforms/Inline/function-count-update-3.ll
>>>     llvm/trunk/test/Transforms/Inline/function-count-update.ll
>>> Modified:
>>>     llvm/trunk/include/llvm/Analysis/InlineCost.h
>>>     llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h
>>>     llvm/trunk/include/llvm/Transforms/Utils/Cloning.h
>>>     llvm/trunk/lib/Analysis/InlineCost.cpp
>>>     llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp
>>>     llvm/trunk/lib/Transforms/IPO/Inliner.cpp
>>>     llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp
>>>     llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp
>>>
>>> Modified: llvm/trunk/include/llvm/Analysis/InlineCost.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/InlineCost.h?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Analysis/InlineCost.h (original)
>>> +++ llvm/trunk/include/llvm/Analysis/InlineCost.h Thu Mar  3 12:26:33
>>> 2016
>>> @@ -20,6 +20,7 @@
>>>
>>>  namespace llvm {
>>>  class AssumptionCacheTracker;
>>> +class BlockFrequencyInfo;
>>>  class CallSite;
>>>  class DataLayout;
>>>  class Function;
>>> @@ -38,6 +39,21 @@ namespace InlineConstants {
>>>    const unsigned TotalAllocaSizeRecursiveCaller = 1024;
>>>  }
>>>
>>> +/// \brief Block frequency analysis for multiple functions.
>>> +/// This class mimics block frequency analysis on CGSCC level. Block
>>> frequency
>>> +/// info is computed on demand and cached unless they are invalidated.
>>> +class BlockFrequencyAnalysis {
>>> +private:
>>> +  DenseMap<Function *, BlockFrequencyInfo *> BFM;
>>> +
>>> +public:
>>> +  ~BlockFrequencyAnalysis();
>>> +  /// \brief Returns BlockFrequencyInfo for a function.
>>> +  BlockFrequencyInfo *getBlockFrequencyInfo(Function *);
>>> +  /// \brief Invalidates block frequency info for a function.
>>> +  void invalidateBlockFrequencyInfo(Function *);
>>> +};
>>> +
>>>  /// \brief Represents the cost of inlining a function.
>>>  ///
>>>  /// This supports special values for functions which should "always" or
>>> @@ -111,7 +127,8 @@ public:
>>>  /// inlining the callsite. It is an expensive, heavyweight call.
>>>  InlineCost getInlineCost(CallSite CS, int DefaultThreshold,
>>>                           TargetTransformInfo &CalleeTTI,
>>> -                         AssumptionCacheTracker *ACT);
>>> +                         AssumptionCacheTracker *ACT,
>>> +                         BlockFrequencyAnalysis *BFA);
>>>
>>>  /// \brief Get an InlineCost with the callee explicitly specified.
>>>  /// This allows you to calculate the cost of inlining a function via a
>>> @@ -120,7 +137,8 @@ InlineCost getInlineCost(CallSite CS, in
>>>  //
>>>  InlineCost getInlineCost(CallSite CS, Function *Callee, int
>>> DefaultThreshold,
>>>                           TargetTransformInfo &CalleeTTI,
>>> -                         AssumptionCacheTracker *ACT);
>>> +                         AssumptionCacheTracker *ACT,
>>> +                         BlockFrequencyAnalysis *BFA);
>>>
>>>  int computeThresholdFromOptLevels(unsigned OptLevel, unsigned
>>> SizeOptLevel);
>>>
>>> @@ -129,6 +147,9 @@ int getDefaultInlineThreshold();
>>>
>>>  /// \brief Minimal filter to detect invalid constructs for inlining.
>>>  bool isInlineViable(Function &Callee);
>>> +
>>> +/// \brief Return estimated count of the block \p BB.
>>> +Optional<uint64_t> getBlockCount(BasicBlock *BB, BlockFrequencyAnalysis
>>> *BFA);
>>>  }
>>>
>>>  #endif
>>>
>>> Modified: llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h (original)
>>> +++ llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h Thu Mar  3
>>> 12:26:33 2016
>>> @@ -24,8 +24,18 @@ class AssumptionCacheTracker;
>>>  class CallSite;
>>>  class DataLayout;
>>>  class InlineCost;
>>> +class BlockFrequencyAnalysis;
>>>  template <class PtrType, unsigned SmallSize> class SmallPtrSet;
>>>
>>> +// Functor invoked when a block is cloned during inlining.
>>> +typedef std::function<void(const BasicBlock *, const BasicBlock *)>
>>> +    BlockCloningFunctor;
>>> +// Functor invoked when a function is inlined inside the basic block
>>> +// containing the call.
>>> +typedef std::function<void(BasicBlock *, Function *)>
>>> FunctionCloningFunctor;
>>> +// Functor invoked when a function gets deleted during inlining.
>>> +typedef std::function<void(Function *)> FunctionDeletedFunctor;
>>> +
>>>  /// Inliner - This class contains all of the helper code which is used
>>> to
>>>  /// perform the inlining operations that do not depend on the policy.
>>>  ///
>>> @@ -69,9 +79,28 @@ private:
>>>    /// shouldInline - Return true if the inliner should attempt to
>>>    /// inline at the given CallSite.
>>>    bool shouldInline(CallSite CS);
>>> +  /// Set the BFI of \p Dst to be the same as \p Src.
>>> +  void copyBlockFrequency(BasicBlock *Src, BasicBlock *Dst);
>>> +  /// Invalidates BFI for function \p F.
>>> +  void invalidateBFI(Function *F);
>>> +  /// Invalidates BFI for all functions in  \p SCC.
>>> +  void invalidateBFI(CallGraphSCC &SCC);
>>> +  /// Update function entry count for \p Callee which has been inlined
>>> into
>>> +  /// \p CallBB.
>>> +  void updateEntryCount(BasicBlock *CallBB, Function *Callee);
>>> +  /// \brief Update block frequency of an inlined block.
>>> +  /// This method updates the block frequency of \p NewBB which is a
>>> clone of
>>> +  /// \p OrigBB when the callsite \p CS gets inlined. The frequency of
>>> \p NewBB
>>> +  /// is computed as follows:
>>> +  /// Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq.
>>> +  void updateBlockFreq(CallSite &CS, const BasicBlock *OrigBB,
>>> +                       const BasicBlock *NewBB);
>>>
>>>  protected:
>>>    AssumptionCacheTracker *ACT;
>>> +  std::unique_ptr<BlockFrequencyAnalysis> BFA;
>>> +  /// Are we using profile guided optimization?
>>> +  bool HasProfileData;
>>>  };
>>>
>>>  } // End llvm namespace
>>>
>>> Modified: llvm/trunk/include/llvm/Transforms/Utils/Cloning.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Utils/Cloning.h?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Transforms/Utils/Cloning.h (original)
>>> +++ llvm/trunk/include/llvm/Transforms/Utils/Cloning.h Thu Mar  3
>>> 12:26:33 2016
>>> @@ -48,6 +48,9 @@ class AllocaInst;
>>>  class AssumptionCacheTracker;
>>>  class DominatorTree;
>>>
>>> +typedef std::function<void(const BasicBlock *, const BasicBlock *)>
>>> +    BlockCloningFunctor;
>>> +
>>>  /// Return an exact copy of the specified module
>>>  ///
>>>  std::unique_ptr<Module> CloneModule(const Module *M);
>>> @@ -157,7 +160,8 @@ void CloneAndPruneIntoFromInst(Function
>>>                                 ValueToValueMapTy &VMap, bool
>>> ModuleLevelChanges,
>>>                                 SmallVectorImpl<ReturnInst *> &Returns,
>>>                                 const char *NameSuffix = "",
>>> -                               ClonedCodeInfo *CodeInfo = nullptr);
>>> +                               ClonedCodeInfo *CodeInfo = nullptr,
>>> +                               BlockCloningFunctor Ftor = nullptr);
>>>
>>>  /// CloneAndPruneFunctionInto - This works exactly like
>>> CloneFunctionInto,
>>>  /// except that it does some simple constant prop and DCE on the fly.
>>> The
>>> @@ -172,23 +176,27 @@ void CloneAndPruneIntoFromInst(Function
>>>  ///
>>>  void CloneAndPruneFunctionInto(Function *NewFunc, const Function
>>> *OldFunc,
>>>                                 ValueToValueMapTy &VMap, bool
>>> ModuleLevelChanges,
>>> -                               SmallVectorImpl<ReturnInst*> &Returns,
>>> +                               SmallVectorImpl<ReturnInst *> &Returns,
>>>                                 const char *NameSuffix = "",
>>>                                 ClonedCodeInfo *CodeInfo = nullptr,
>>> -                               Instruction *TheCall = nullptr);
>>> +                               Instruction *TheCall = nullptr,
>>> +                               BlockCloningFunctor Ftor = nullptr);
>>>
>>>  /// InlineFunctionInfo - This class captures the data input to the
>>>  /// InlineFunction call, and records the auxiliary results produced by
>>> it.
>>>  class InlineFunctionInfo {
>>>  public:
>>>    explicit InlineFunctionInfo(CallGraph *cg = nullptr,
>>> -                              AssumptionCacheTracker *ACT = nullptr)
>>> -      : CG(cg), ACT(ACT) {}
>>> +                              AssumptionCacheTracker *ACT = nullptr,
>>> +                              BlockCloningFunctor Ftor = nullptr)
>>> +      : CG(cg), ACT(ACT), Ftor(Ftor) {}
>>>
>>>    /// CG - If non-null, InlineFunction will update the callgraph to
>>> reflect the
>>>    /// changes it makes.
>>>    CallGraph *CG;
>>>    AssumptionCacheTracker *ACT;
>>> +  // Functor that is invoked when a block is cloned into the new
>>> function.
>>> +  BlockCloningFunctor Ftor;
>>>
>>>    /// StaticAllocas - InlineFunction fills this in with all static
>>> allocas that
>>>    /// get copied into the caller.
>>>
>>> Modified: llvm/trunk/lib/Analysis/InlineCost.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InlineCost.cpp?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Analysis/InlineCost.cpp (original)
>>> +++ llvm/trunk/lib/Analysis/InlineCost.cpp Thu Mar  3 12:26:33 2016
>>> @@ -18,13 +18,18 @@
>>>  #include "llvm/ADT/SmallVector.h"
>>>  #include "llvm/ADT/Statistic.h"
>>>  #include "llvm/Analysis/AssumptionCache.h"
>>> +#include "llvm/Analysis/BlockFrequencyInfo.h"
>>> +#include "llvm/Analysis/BlockFrequencyInfoImpl.h"
>>> +#include "llvm/Analysis/BranchProbabilityInfo.h"
>>>  #include "llvm/Analysis/CodeMetrics.h"
>>>  #include "llvm/Analysis/ConstantFolding.h"
>>>  #include "llvm/Analysis/InstructionSimplify.h"
>>> +#include "llvm/Analysis/LoopInfo.h"
>>>  #include "llvm/Analysis/TargetTransformInfo.h"
>>>  #include "llvm/IR/CallSite.h"
>>>  #include "llvm/IR/CallingConv.h"
>>>  #include "llvm/IR/DataLayout.h"
>>> +#include "llvm/IR/Dominators.h"
>>>  #include "llvm/IR/GetElementPtrTypeIterator.h"
>>>  #include "llvm/IR/GlobalAlias.h"
>>>  #include "llvm/IR/InstVisitor.h"
>>> @@ -85,6 +90,7 @@ class CallAnalyzer : public InstVisitor<
>>>    // easily cacheable. Instead, use the cover function paramHasAttr.
>>>    CallSite CandidateCS;
>>>
>>> +  BlockFrequencyAnalysis *BFA;
>>>    int Threshold;
>>>    int Cost;
>>>
>>> @@ -153,6 +159,8 @@ class CallAnalyzer : public InstVisitor<
>>>    /// passed to support analyzing indirect calls whose target is
>>> inferred by
>>>    /// analysis.
>>>    void updateThreshold(CallSite CS, Function &Callee);
>>> +  /// Adjust Threshold based on CallSiteCount and return the adjusted
>>> threshold.
>>> +  int getAdjustedThreshold(int Threshold, Optional<uint64_t>
>>> CallSiteCount);
>>>
>>>    // Custom analysis routines.
>>>    bool analyzeBlock(BasicBlock *BB, SmallPtrSetImpl<const Value *>
>>> &EphValues);
>>> @@ -194,17 +202,19 @@ class CallAnalyzer : public InstVisitor<
>>>
>>>  public:
>>>    CallAnalyzer(const TargetTransformInfo &TTI, AssumptionCacheTracker
>>> *ACT,
>>> -               Function &Callee, int Threshold, CallSite CSArg)
>>> -    : TTI(TTI), ACT(ACT), F(Callee), CandidateCS(CSArg),
>>> Threshold(Threshold),
>>> -        Cost(0), IsCallerRecursive(false), IsRecursiveCall(false),
>>> -        ExposesReturnsTwice(false), HasDynamicAlloca(false),
>>> -        ContainsNoDuplicateCall(false), HasReturn(false),
>>> HasIndirectBr(false),
>>> -        HasFrameEscape(false), AllocatedSize(0), NumInstructions(0),
>>> -        NumVectorInstructions(0), FiftyPercentVectorBonus(0),
>>> -        TenPercentVectorBonus(0), VectorBonus(0), NumConstantArgs(0),
>>> -        NumConstantOffsetPtrArgs(0), NumAllocaArgs(0),
>>> NumConstantPtrCmps(0),
>>> -        NumConstantPtrDiffs(0), NumInstructionsSimplified(0),
>>> -        SROACostSavings(0), SROACostSavingsLost(0) {}
>>> +               Function &Callee, int Threshold, CallSite CSArg,
>>> +               BlockFrequencyAnalysis *BFA)
>>> +      : TTI(TTI), ACT(ACT), F(Callee), CandidateCS(CSArg), BFA(BFA),
>>> +        Threshold(Threshold), Cost(0), IsCallerRecursive(false),
>>> +        IsRecursiveCall(false), ExposesReturnsTwice(false),
>>> +        HasDynamicAlloca(false), ContainsNoDuplicateCall(false),
>>> +        HasReturn(false), HasIndirectBr(false), HasFrameEscape(false),
>>> +        AllocatedSize(0), NumInstructions(0), NumVectorInstructions(0),
>>> +        FiftyPercentVectorBonus(0), TenPercentVectorBonus(0),
>>> VectorBonus(0),
>>> +        NumConstantArgs(0), NumConstantOffsetPtrArgs(0),
>>> NumAllocaArgs(0),
>>> +        NumConstantPtrCmps(0), NumConstantPtrDiffs(0),
>>> +        NumInstructionsSimplified(0), SROACostSavings(0),
>>> +        SROACostSavingsLost(0) {}
>>>
>>>    bool analyzeCall(CallSite CS);
>>>
>>> @@ -572,6 +582,15 @@ bool CallAnalyzer::isKnownNonNullInCalle
>>>    return false;
>>>  }
>>>
>>> +// Adjust the threshold based on callsite hotness. Currently this is a
>>> nop.
>>> +int CallAnalyzer::getAdjustedThreshold(int Threshold,
>>> +                                       Optional<uint64_t> CallSiteCount
>>> +                                       __attribute__((unused))) {
>>> +  // FIXME: The new threshold should be computed from the given
>>> Threshold and
>>> +  // the callsite hotness.
>>> +  return Threshold;
>>> +}
>>> +
>>>  void CallAnalyzer::updateThreshold(CallSite CS, Function &Callee) {
>>>    // If -inline-threshold is not given, listen to the optsize and
>>> minsize
>>>    // attributes when they would decrease the threshold.
>>> @@ -596,6 +615,9 @@ void CallAnalyzer::updateThreshold(CallS
>>>      FunctionCount = Callee.getEntryCount().getValue();
>>>      MaxFunctionCount =
>>> Callee.getParent()->getMaximumFunctionCount().getValue();
>>>    }
>>> +  Optional<uint64_t> CallSiteCount =
>>> +      llvm::getBlockCount(CS.getInstruction()->getParent(), BFA);
>>> +  Threshold = getAdjustedThreshold(Threshold, CallSiteCount);
>>>
>>>    // Listen to the inlinehint attribute or profile based hotness
>>> information
>>>    // when it would increase the threshold and the caller does not need
>>> to
>>> @@ -912,7 +934,8 @@ bool CallAnalyzer::visitCallSite(CallSit
>>>    // during devirtualization and so we want to give it a hefty bonus for
>>>    // inlining, but cap that bonus in the event that inlining wouldn't
>>> pan
>>>    // out. Pretend to inline the function, with a custom threshold.
>>> -  CallAnalyzer CA(TTI, ACT, *F, InlineConstants::IndirectCallThreshold,
>>> CS);
>>> +  CallAnalyzer CA(TTI, ACT, *F, InlineConstants::IndirectCallThreshold,
>>> CS,
>>> +                  BFA);
>>>    if (CA.analyzeCall(CS)) {
>>>      // We were able to inline the indirect call! Subtract the cost from
>>> the
>>>      // threshold to get the bonus we want to apply, but don't go below
>>> zero.
>>> @@ -1433,9 +1456,10 @@ static bool functionsHaveCompatibleAttri
>>>
>>>  InlineCost llvm::getInlineCost(CallSite CS, int DefaultThreshold,
>>>                                 TargetTransformInfo &CalleeTTI,
>>> -                               AssumptionCacheTracker *ACT) {
>>> +                               AssumptionCacheTracker *ACT,
>>> +                               BlockFrequencyAnalysis *BFA) {
>>>    return getInlineCost(CS, CS.getCalledFunction(), DefaultThreshold,
>>> CalleeTTI,
>>> -                       ACT);
>>> +                       ACT, BFA);
>>>  }
>>>
>>>  int llvm::computeThresholdFromOptLevels(unsigned OptLevel,
>>> @@ -1454,7 +1478,8 @@ int llvm::getDefaultInlineThreshold() {
>>>  InlineCost llvm::getInlineCost(CallSite CS, Function *Callee,
>>>                                 int DefaultThreshold,
>>>                                 TargetTransformInfo &CalleeTTI,
>>> -                               AssumptionCacheTracker *ACT) {
>>> +                               AssumptionCacheTracker *ACT,
>>> +                               BlockFrequencyAnalysis *BFA) {
>>>
>>>    // Cannot inline indirect calls.
>>>    if (!Callee)
>>> @@ -1487,7 +1512,7 @@ InlineCost llvm::getInlineCost(CallSite
>>>    DEBUG(llvm::dbgs() << "      Analyzing call of " << Callee->getName()
>>>          << "...\n");
>>>
>>> -  CallAnalyzer CA(CalleeTTI, ACT, *Callee, DefaultThreshold, CS);
>>> +  CallAnalyzer CA(CalleeTTI, ACT, *Callee, DefaultThreshold, CS, BFA);
>>>    bool ShouldInline = CA.analyzeCall(CS);
>>>
>>>    DEBUG(CA.dump());
>>> @@ -1535,3 +1560,45 @@ bool llvm::isInlineViable(Function &F) {
>>>
>>>    return true;
>>>  }
>>> +
>>> +/// \brief Get estimated execution count for \p BB.
>>> +Optional<uint64_t> llvm::getBlockCount(BasicBlock *BB,
>>> +                                       BlockFrequencyAnalysis *BFA) {
>>> +  if (!BFA)
>>> +    return None;
>>> +  Function *F = BB->getParent();
>>> +  Optional<uint64_t> EntryCount = F->getEntryCount();
>>> +  if (!EntryCount)
>>> +    return None;
>>> +  BlockFrequencyInfo *BFI = BFA->getBlockFrequencyInfo(F);
>>> +  uint64_t BBFreq = BFI->getBlockFreq(BB).getFrequency();
>>> +  uint64_t FunctionEntryFreq = BFI->getEntryFreq();
>>> +  uint64_t BBCount = EntryCount.getValue() * BBFreq / FunctionEntryFreq;
>>> +  return BBCount;
>>> +}
>>> +
>>> +BlockFrequencyAnalysis::~BlockFrequencyAnalysis() {
>>> +  for (auto &Entry : BFM) {
>>> +    delete Entry.second;
>>> +  }
>>> +}
>>> +
>>> +/// \brief Get BlockFrequencyInfo for a function.
>>> +BlockFrequencyInfo
>>> *BlockFrequencyAnalysis::getBlockFrequencyInfo(Function *F) {
>>> +  auto Iter = BFM.find(F);
>>> +  if (Iter != BFM.end())
>>> +    return Iter->second;
>>> +  // We need to create a BlockFrequencyInfo object for F and store it.
>>> +  DominatorTree DT;
>>> +  DT.recalculate(*F);
>>> +  LoopInfo LI(DT);
>>> +  BranchProbabilityInfo BPI(*F, LI);
>>> +  BlockFrequencyInfo *BFI = new BlockFrequencyInfo(*F, BPI, LI);
>>> +  BFM[F] = BFI;
>>> +  return BFI;
>>> +}
>>> +
>>> +/// \brief Invalidate BlockFrequencyInfo for a function.
>>> +void BlockFrequencyAnalysis::invalidateBlockFrequencyInfo(Function *F) {
>>> +  BFM.erase(F);
>>> +}
>>>
>>> Modified: llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp Thu Mar  3 12:26:33
>>> 2016
>>> @@ -59,7 +59,8 @@ public:
>>>    InlineCost getInlineCost(CallSite CS) override {
>>>      Function *Callee = CS.getCalledFunction();
>>>      TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);
>>> -    return llvm::getInlineCost(CS, DefaultThreshold, TTI, ACT);
>>> +    return llvm::getInlineCost(CS, DefaultThreshold, TTI, ACT,
>>> +                               HasProfileData ? BFA.get() : nullptr);
>>>    }
>>>
>>>    bool runOnSCC(CallGraphSCC &SCC) override;
>>>
>>> Modified: llvm/trunk/lib/Transforms/IPO/Inliner.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/Inliner.cpp?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/IPO/Inliner.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/IPO/Inliner.cpp Thu Mar  3 12:26:33 2016
>>> @@ -19,6 +19,7 @@
>>>  #include "llvm/Analysis/AliasAnalysis.h"
>>>  #include "llvm/Analysis/AssumptionCache.h"
>>>  #include "llvm/Analysis/BasicAliasAnalysis.h"
>>> +#include "llvm/Analysis/BlockFrequencyInfo.h"
>>>  #include "llvm/Analysis/CallGraph.h"
>>>  #include "llvm/Analysis/InlineCost.h"
>>>  #include "llvm/Analysis/TargetLibraryInfo.h"
>>> @@ -47,10 +48,13 @@ STATISTIC(NumMergedAllocas, "Number of a
>>>  // if those would be more profitable and blocked inline steps.
>>>  STATISTIC(NumCallerCallersAnalyzed, "Number of caller-callers
>>> analyzed");
>>>
>>> -Inliner::Inliner(char &ID) : CallGraphSCCPass(ID), InsertLifetime(true)
>>> {}
>>> +Inliner::Inliner(char &ID)
>>> +    : CallGraphSCCPass(ID), InsertLifetime(true),
>>> +      BFA(new BlockFrequencyAnalysis()) {}
>>>
>>>  Inliner::Inliner(char &ID, bool InsertLifetime)
>>> -    : CallGraphSCCPass(ID), InsertLifetime(InsertLifetime) {}
>>> +    : CallGraphSCCPass(ID), InsertLifetime(InsertLifetime),
>>> +      BFA(new BlockFrequencyAnalysis()) {}
>>>
>>>  /// For this class, we declare that we require and preserve the call
>>> graph.
>>>  /// If the derived class implements this method, it should
>>> @@ -259,7 +263,7 @@ bool Inliner::shouldInline(CallSite CS)
>>>                           Twine(IC.getCostDelta() + IC.getCost()) + ")");
>>>      return false;
>>>    }
>>> -
>>> +
>>>    // Try to detect the case where the current inlining candidate caller
>>> (call
>>>    // it B) is a static or linkonce-ODR function and is an inlining
>>> candidate
>>>    // elsewhere, and the current candidate callee (call it C) is large
>>> enough
>>> @@ -356,8 +360,90 @@ static bool InlineHistoryIncludes(Functi
>>>    return false;
>>>  }
>>>
>>> +/// \brief Update the frequency of a block that is cloned into the
>>> caller.
>>> +/// This is invoked when \p OrigBB from the callee is cloned into \p
>>> NewBB in
>>> +/// the caller.
>>> +void Inliner::updateBlockFreq(CallSite &CS, const BasicBlock *OrigBB,
>>> +                              const BasicBlock *NewBB) {
>>> +  if (!HasProfileData)
>>> +    return;
>>> +  Instruction *Call = CS.getInstruction();
>>> +  BasicBlock *CallBB = Call->getParent();
>>> +  BlockFrequencyInfo *CalleeBFI =
>>> +      BFA->getBlockFrequencyInfo(CS.getCalledFunction());
>>> +  BlockFrequencyInfo *CallerBFI =
>>> +      BFA->getBlockFrequencyInfo(CallBB->getParent());
>>> +  // Find the number of times OrigBB is executed per invocation of the
>>> callee
>>> +  // and multiply by the number of times callee is executed in the
>>> caller.
>>> +  // Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq.
>>> +  uint64_t CallSiteFreq =
>>> CallerBFI->getBlockFreq(CallBB).getFrequency();
>>> +  uint64_t CalleeEntryFreq = CalleeBFI->getEntryFreq();
>>> +  // Frequency of OrigBB in the callee.
>>> +  BlockFrequency OrigBBFreq = CalleeBFI->getBlockFreq(OrigBB);
>>> +  CallerBFI->setBlockFreq(NewBB, (double)(OrigBBFreq.getFrequency()) /
>>> +                                     CalleeEntryFreq * CallSiteFreq);
>>> +}
>>> +
>>> +/// \brief Update entry count of \p Callee after it got inlined at a
>>> callsite
>>> +/// in block \p CallBB.
>>> +void Inliner::updateEntryCount(BasicBlock *CallBB, Function *Callee) {
>>> +  if (!HasProfileData)
>>> +    return;
>>> +  // If the callee has a original count of N, and the estimated count of
>>> +  // callsite is M, the new callee count is set to N - M. M is
>>> estimated from
>>> +  // the caller's entry count, its entry block frequency and the block
>>> frequency
>>> +  // of the callsite.
>>> +  Optional<uint64_t> CalleeCount = Callee->getEntryCount();
>>> +  if (!CalleeCount)
>>> +    return;
>>> +  Optional<uint64_t> CallSiteCount = llvm::getBlockCount(CallBB,
>>> BFA.get());
>>> +  if (!CallSiteCount)
>>> +    return;
>>> +  // Since CallSiteCount is an estimate, it could exceed the original
>>> callee
>>> +  // count and has to be set to 0.
>>> +  if (CallSiteCount.getValue() > CalleeCount.getValue()) {
>>> +    Callee->setEntryCount(0);
>>> +    DEBUG(llvm::dbgs() << "Estimated count of block " <<
>>> CallBB->getName()
>>> +                       << " is " << CallSiteCount.getValue()
>>> +                       << " which exceeds the entry count "
>>> +                       << CalleeCount.getValue() << " of the callee "
>>> +                       << Callee->getName() << "\n");
>>> +  } else
>>> +    Callee->setEntryCount(CalleeCount.getValue() -
>>> CallSiteCount.getValue());
>>> +}
>>> +
>>> +void Inliner::invalidateBFI(Function *F) {
>>> +  if (!HasProfileData)
>>> +    return;
>>> +  if (F)
>>> +    BFA->invalidateBlockFrequencyInfo(F);
>>> +}
>>> +void Inliner::invalidateBFI(CallGraphSCC &SCC) {
>>> +  if (!HasProfileData)
>>> +    return;
>>> +  for (CallGraphNode *Node : SCC) {
>>> +    Function *F = Node->getFunction();
>>> +    invalidateBFI(F);
>>> +  }
>>> +}
>>> +void Inliner::copyBlockFrequency(BasicBlock *Src, BasicBlock *Dst) {
>>> +  if (!HasProfileData)
>>> +    return;
>>> +  Function *F = Src->getParent();
>>> +  BlockFrequencyInfo *BFI = BFA->getBlockFrequencyInfo(F);
>>> +  BFI->setBlockFreq(Dst, BFI->getBlockFreq(Src).getFrequency());
>>> +}
>>> +
>>> +static bool hasProfileData(Module &M) {
>>> +  // We check for the presence of MaxFunctionCount in the module.
>>> +  // FIXME: This now only works for frontend based instrumentation.
>>> +  return M.getMaximumFunctionCount().hasValue();
>>> +}
>>> +
>>>  bool Inliner::runOnSCC(CallGraphSCC &SCC) {
>>> +  using namespace std::placeholders;
>>>    CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
>>> +  HasProfileData = hasProfileData(CG.getModule());
>>>    ACT = &getAnalysis<AssumptionCacheTracker>();
>>>    auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
>>>
>>> @@ -419,7 +505,6 @@ bool Inliner::runOnSCC(CallGraphSCC &SCC
>>>
>>>
>>>    InlinedArrayAllocasTy InlinedArrayAllocas;
>>> -  InlineFunctionInfo InlineInfo(&CG, ACT);
>>>
>>>    // Now that we have all of the call sites, loop over them and inline
>>> them if
>>>    // it looks profitable to do so.
>>> @@ -448,6 +533,10 @@ bool Inliner::runOnSCC(CallGraphSCC &SCC
>>>          CS.getInstruction()->eraseFromParent();
>>>          ++NumCallsDeleted;
>>>        } else {
>>> +        Instruction *TheCall = CS.getInstruction();
>>>
>>
>> In test/Transforms/Inline/inline-tail.ll, ASAN is telling me that there
>> is a use after free here. Presumably, CS has already been inlined/deleted?
>>
>>
>>> +        BasicBlock *CallSiteBlock = TheCall->getParent();
>>> +        Instruction *CallSuccessor =
>>> &*(++BasicBlock::iterator(TheCall));
>>
>> +
>>>          // We can only inline direct calls to non-declarations.
>>>          if (!Callee || Callee->isDeclaration()) continue;
>>>
>>> @@ -476,6 +565,11 @@ bool Inliner::runOnSCC(CallGraphSCC &SCC
>>>            continue;
>>>          }
>>>
>>> +        BlockCloningFunctor BCF = nullptr;
>>> +        if (HasProfileData)
>>> +          BCF = std::bind(&Inliner::updateBlockFreq, this, CS, _1, _2);
>>> +        InlineFunctionInfo InlineInfo(&CG, ACT, BCF);
>>> +
>>>          // Attempt to inline the function.
>>>          if (!InlineCallIfPossible(*this, CS, InlineInfo,
>>> InlinedArrayAllocas,
>>>                                    InlineHistoryID, InsertLifetime)) {
>>> @@ -485,6 +579,13 @@ bool Inliner::runOnSCC(CallGraphSCC &SCC
>>>                                               Caller->getName()));
>>>            continue;
>>>          }
>>> +        updateEntryCount(CallSiteBlock, Callee);
>>> +        // The instruction following the call is part of a new basic
>>> block
>>> +        // created during the inlining process. This does not have an
>>> entry in
>>> +        // the BFI. We create an entry by copying the frequency of the
>>> original
>>> +        // block containing the call.
>>> +        copyBlockFrequency(CallSiteBlock, CallSuccessor->getParent());
>>> +
>>>          ++NumInlined;
>>>
>>>          // Report the inline decision.
>>> @@ -523,7 +624,9 @@ bool Inliner::runOnSCC(CallGraphSCC &SCC
>>>          CalleeNode->removeAllCalledFunctions();
>>>
>>>          // Removing the node for callee from the call graph and delete
>>> it.
>>> -        delete CG.removeFunctionFromModule(CalleeNode);
>>> +        Function *F = CG.removeFunctionFromModule(CalleeNode);
>>> +        invalidateBFI(F);
>>> +        delete F;
>>>          ++NumDeleted;
>>>        }
>>>
>>> @@ -544,6 +647,7 @@ bool Inliner::runOnSCC(CallGraphSCC &SCC
>>>      }
>>>    } while (LocalChange);
>>>
>>> +  invalidateBFI(SCC);
>>>    return Changed;
>>>  }
>>>
>>> @@ -651,7 +755,9 @@ bool Inliner::removeDeadFunctions(CallGr
>>>                                        FunctionsToRemove.end()),
>>>                            FunctionsToRemove.end());
>>>    for (CallGraphNode *CGN : FunctionsToRemove) {
>>> -    delete CG.removeFunctionFromModule(CGN);
>>> +    Function *F = CG.removeFunctionFromModule(CGN);
>>> +    invalidateBFI(F);
>>> +    delete F;
>>>      ++NumDeleted;
>>>    }
>>>    return true;
>>>
>>> Modified: llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Utils/CloneFunction.cpp Thu Mar  3
>>> 12:26:33 2016
>>> @@ -277,9 +277,10 @@ namespace {
>>>
>>>      /// The specified block is found to be reachable, clone it and
>>>      /// anything that it can reach.
>>> -    void CloneBlock(const BasicBlock *BB,
>>> +    void CloneBlock(const BasicBlock *BB,
>>>                      BasicBlock::const_iterator StartingInst,
>>> -                    std::vector<const BasicBlock*> &ToClone);
>>> +                    std::vector<const BasicBlock *> &ToClone,
>>> +                    BlockCloningFunctor Ftor = nullptr);
>>>    };
>>>  }
>>>
>>> @@ -287,7 +288,8 @@ namespace {
>>>  /// anything that it can reach.
>>>  void PruningFunctionCloner::CloneBlock(const BasicBlock *BB,
>>>                                         BasicBlock::const_iterator
>>> StartingInst,
>>> -                                       std::vector<const BasicBlock*>
>>> &ToClone){
>>> +                                       std::vector<const BasicBlock *>
>>> &ToClone,
>>> +                                       BlockCloningFunctor Ftor) {
>>>    WeakVH &BBEntry = VMap[BB];
>>>
>>>    // Have we already cloned this block?
>>> @@ -424,18 +426,19 @@ void PruningFunctionCloner::CloneBlock(c
>>>      CodeInfo->ContainsDynamicAllocas |= hasStaticAllocas &&
>>>        BB != &BB->getParent()->front();
>>>    }
>>> +  // Call Ftor to tell BB has been cloned to NewBB
>>> +  if (Ftor)
>>> +    Ftor(BB, NewBB);
>>>  }
>>>
>>>  /// This works like CloneAndPruneFunctionInto, except that it does not
>>> clone the
>>>  /// entire function. Instead it starts at an instruction provided by
>>> the caller
>>>  /// and copies (and prunes) only the code reachable from that
>>> instruction.
>>> -void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function
>>> *OldFunc,
>>> -                                     const Instruction *StartingInst,
>>> -                                     ValueToValueMapTy &VMap,
>>> -                                     bool ModuleLevelChanges,
>>> -                                     SmallVectorImpl<ReturnInst *>
>>> &Returns,
>>> -                                     const char *NameSuffix,
>>> -                                     ClonedCodeInfo *CodeInfo) {
>>> +void llvm::CloneAndPruneIntoFromInst(
>>> +    Function *NewFunc, const Function *OldFunc, const Instruction
>>> *StartingInst,
>>> +    ValueToValueMapTy &VMap, bool ModuleLevelChanges,
>>> +    SmallVectorImpl<ReturnInst *> &Returns, const char *NameSuffix,
>>> +    ClonedCodeInfo *CodeInfo, BlockCloningFunctor Ftor) {
>>>    assert(NameSuffix && "NameSuffix cannot be null!");
>>>
>>>    ValueMapTypeRemapper *TypeMapper = nullptr;
>>> @@ -461,11 +464,11 @@ void llvm::CloneAndPruneIntoFromInst(Fun
>>>
>>>    // Clone the entry block, and anything recursively reachable from it.
>>>    std::vector<const BasicBlock*> CloneWorklist;
>>> -  PFC.CloneBlock(StartingBB, StartingInst->getIterator(),
>>> CloneWorklist);
>>> +  PFC.CloneBlock(StartingBB, StartingInst->getIterator(),
>>> CloneWorklist, Ftor);
>>>    while (!CloneWorklist.empty()) {
>>>      const BasicBlock *BB = CloneWorklist.back();
>>>      CloneWorklist.pop_back();
>>> -    PFC.CloneBlock(BB, BB->begin(), CloneWorklist);
>>> +    PFC.CloneBlock(BB, BB->begin(), CloneWorklist, Ftor);
>>>    }
>>>
>>>    // Loop over all of the basic blocks in the old function.  If the
>>> block was
>>> @@ -667,15 +670,14 @@ void llvm::CloneAndPruneIntoFromInst(Fun
>>>  /// constant arguments cause a significant amount of code in the callee
>>> to be
>>>  /// dead.  Since this doesn't produce an exact copy of the input, it
>>> can't be
>>>  /// used for things like CloneFunction or CloneModule.
>>> -void llvm::CloneAndPruneFunctionInto(Function *NewFunc, const Function
>>> *OldFunc,
>>> -                                     ValueToValueMapTy &VMap,
>>> -                                     bool ModuleLevelChanges,
>>> -                                     SmallVectorImpl<ReturnInst*>
>>> &Returns,
>>> -                                     const char *NameSuffix,
>>> -                                     ClonedCodeInfo *CodeInfo,
>>> -                                     Instruction *TheCall) {
>>> +void llvm::CloneAndPruneFunctionInto(
>>> +    Function *NewFunc, const Function *OldFunc, ValueToValueMapTy &VMap,
>>> +    bool ModuleLevelChanges, SmallVectorImpl<ReturnInst *> &Returns,
>>> +    const char *NameSuffix, ClonedCodeInfo *CodeInfo, Instruction
>>> *TheCall,
>>> +    BlockCloningFunctor Ftor) {
>>>    CloneAndPruneIntoFromInst(NewFunc, OldFunc,
>>> &OldFunc->front().front(), VMap,
>>> -                            ModuleLevelChanges, Returns, NameSuffix,
>>> CodeInfo);
>>> +                            ModuleLevelChanges, Returns, NameSuffix,
>>> CodeInfo,
>>> +                            Ftor);
>>>  }
>>>
>>>  /// \brief Remaps instructions in \p Blocks using the mapping in \p
>>> VMap.
>>>
>>> Modified: llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp?rev=262636&r1=262635&r2=262636&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Utils/InlineFunction.cpp Thu Mar  3
>>> 12:26:33 2016
>>> @@ -1319,7 +1319,7 @@ bool llvm::InlineFunction(CallSite CS, I
>>>
>>>    // If IFI has any state in it, zap it before we fill it in.
>>>    IFI.reset();
>>> -
>>> +
>>>    const Function *CalledFunc = CS.getCalledFunction();
>>>    if (!CalledFunc ||              // Can't inline external function or
>>> indirect
>>>        CalledFunc->isDeclaration() || // call, or call to a vararg
>>> function!
>>> @@ -1486,7 +1486,7 @@ bool llvm::InlineFunction(CallSite CS, I
>>>      // happy with whatever the cloner can do.
>>>      CloneAndPruneFunctionInto(Caller, CalledFunc, VMap,
>>>                                /*ModuleLevelChanges=*/false, Returns,
>>> ".i",
>>> -                              &InlinedFunctionInfo, TheCall);
>>> +                              &InlinedFunctionInfo, TheCall, IFI.Ftor);
>>>
>>>      // Remember the first block that is newly cloned over.
>>>      FirstNewBlock = LastBlock; ++FirstNewBlock;
>>>
>>> Added: llvm/trunk/test/Transforms/Inline/function-count-update-2.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/function-count-update-2.ll?rev=262636&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/Inline/function-count-update-2.ll (added)
>>> +++ llvm/trunk/test/Transforms/Inline/function-count-update-2.ll Thu
>>> Mar  3 12:26:33 2016
>>> @@ -0,0 +1,27 @@
>>> +; RUN: opt < %s -inline -S | FileCheck %s
>>> +
>>> +; This tests that the function count of a callee gets correctly updated
>>> after it
>>> +; has been inlined into a two callsites.
>>> +
>>> +; CHECK: @callee() !prof [[COUNT:![0-9]+]]
>>> +define i32 @callee() !prof !1 {
>>> +  ret i32 0
>>> +}
>>> +
>>> +define i32 @caller1() !prof !2 {
>>> +  %i = call i32 @callee()
>>> +  ret i32 %i
>>> +}
>>> +
>>> +define i32 @caller2() !prof !3 {
>>> +  %i = call i32 @callee()
>>> +  ret i32 %i
>>> +}
>>> +
>>> +!llvm.module.flags = !{!0}
>>> +; CHECK: [[COUNT]] = !{!"function_entry_count", i64 0}
>>> +!0 = !{i32 1, !"MaxFunctionCount", i32 1000}
>>> +!1 = !{!"function_entry_count", i64 1000}
>>> +!2 = !{!"function_entry_count", i64 600}
>>> +!3 = !{!"function_entry_count", i64 400}
>>> +
>>>
>>> Added: llvm/trunk/test/Transforms/Inline/function-count-update-3.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/function-count-update-3.ll?rev=262636&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/Inline/function-count-update-3.ll (added)
>>> +++ llvm/trunk/test/Transforms/Inline/function-count-update-3.ll Thu
>>> Mar  3 12:26:33 2016
>>> @@ -0,0 +1,69 @@
>>> +; RUN: opt < %s -inline -S -inline-threshold=50 | FileCheck %s
>>> +
>>> +; This tests that the function count of a function gets properly scaled
>>> after
>>> +; inlining a call chain leading to the function.
>>> +; Function a calls c with count 200 (C1)
>>> +; Function b calls c with count 300
>>> +; Function c calls e with count 250 (C2)
>>> +; Entry count of e is 500 (C3)
>>> +; c->e inlining does not happen since the cost exceeds threshold.
>>> +; c then inlined into a.
>>> +; e now gets inlined into a (through c) since the branch condition in e
>>> is now
>>> +; known and hence the cost gets reduced.
>>> +; Estimated count of a->e callsite = C2 * (C1 / C3)
>>> +; Estimated count of a->e callsite = 250 * (200 / 500) = 100
>>> +; Remaining count of e = C3 - 100 = 500 - 100 = 400
>>> +
>>> + at data = external global i32
>>> +
>>> +define i32 @a(i32 %a1) !prof !1 {
>>> +  %a2 = call i32 @c(i32 %a1, i32 1)
>>> +  ret i32 %a2
>>> +}
>>> +
>>> +define i32 @b(i32 %b1) !prof !2 {
>>> +  %b2 = call i32 @c(i32 %b1, i32 %b1)
>>> +  ret i32 %b2
>>> +}
>>> +
>>> +define i32 @c(i32 %c1, i32 %c100) !prof !3 {
>>> +  %cond = icmp sle i32 %c1, 1
>>> +  br i1 %cond, label %cond_true, label %cond_false
>>> +
>>> +cond_false:
>>> +  ret i32 0
>>> +
>>> +cond_true:
>>> +  %c11 = call i32 @e(i32 %c100)
>>> +  ret i32 %c11
>>> +}
>>> +
>>> +; CHECK: @e(i32 %c1) !prof [[COUNT:![0-9]+]]
>>> +define i32 @e(i32 %c1) !prof !4 {
>>> +  %cond = icmp sle i32 %c1, 1
>>> +  br i1 %cond, label %cond_true, label %cond_false
>>> +
>>> +cond_false:
>>> +  %c2 = load i32, i32* @data, align 4
>>> +  %c3 = add i32 %c1, %c2
>>> +  %c4 = mul i32 %c3, %c2
>>> +  %c5 = add i32 %c4, %c2
>>> +  %c6 = mul i32 %c5, %c2
>>> +  %c7 = add i32 %c6, %c2
>>> +  %c8 = mul i32 %c7, %c2
>>> +  %c9 = add i32 %c8, %c2
>>> +  %c10 = mul i32 %c9, %c2
>>> +  ret i32 %c10
>>> +
>>> +cond_true:
>>> +  ret i32 0
>>> +}
>>> +
>>> +!llvm.module.flags = !{!0}
>>> +; CHECK: [[COUNT]] = !{!"function_entry_count", i64 400}
>>> +!0 = !{i32 1, !"MaxFunctionCount", i32 5000}
>>> +!1 = !{!"function_entry_count", i64 200}
>>> +!2 = !{!"function_entry_count", i64 300}
>>> +!3 = !{!"function_entry_count", i64 500}
>>> +!4 = !{!"function_entry_count", i64 500}
>>> +
>>>
>>> Added: llvm/trunk/test/Transforms/Inline/function-count-update.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/function-count-update.ll?rev=262636&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/Inline/function-count-update.ll (added)
>>> +++ llvm/trunk/test/Transforms/Inline/function-count-update.ll Thu Mar
>>> 3 12:26:33 2016
>>> @@ -0,0 +1,51 @@
>>> +; RUN: opt < %s -inline -S | FileCheck %s
>>> +; RUN: opt < %s -always-inline -S | FileCheck %s
>>> +
>>> +; This tests that the function count of two callees get correctly
>>> updated after
>>> +; they have been inlined into two back-to-back callsites in a single
>>> basic block
>>> +; in the caller. The callees have the alwaysinline attribute and so
>>> they get
>>> +; inlined both with the regular inliner pass and the always inline
>>> pass. In
>>> +; both cases, the new count of each callee is the original count minus
>>> callsite
>>> +; count which is 200 (since the caller's entry count is 400 and the
>>> block
>>> +; containing the calls have a relative block frequency of 0.5).
>>> +
>>> +; CHECK: @callee1(i32 %n) #0 !prof [[COUNT1:![0-9]+]]
>>> +define i32 @callee1(i32 %n) #0 !prof !1 {
>>> +  %cond = icmp sle i32 %n, 10
>>> +  br i1 %cond, label %cond_true, label %cond_false
>>> +
>>> +cond_true:
>>> +  %r1 = add i32 %n, 1
>>> +  ret i32 %r1
>>> +cond_false:
>>> +  %r2 = add i32 %n, 2
>>> +  ret i32 %r2
>>> +}
>>> +
>>> +; CHECK: @callee2(i32 %n) #0 !prof [[COUNT2:![0-9]+]]
>>> +define i32 @callee2(i32 %n) #0 !prof !2 {
>>> +  %r1 = add i32 %n, 1
>>> +  ret i32 %r1
>>> +}
>>> +
>>> +define i32 @caller(i32 %n) !prof !3 {
>>> +  %cond = icmp sle i32 %n, 100
>>> +  br i1 %cond, label %cond_true, label %cond_false
>>> +
>>> +cond_true:
>>> +  %i = call i32 @callee1(i32 %n)
>>> +  %j = call i32 @callee2(i32 %i)
>>> +  ret i32 %j
>>> +cond_false:
>>> +  ret i32 0
>>> +}
>>> +
>>> +!llvm.module.flags = !{!0}
>>> +; CHECK: [[COUNT1]] = !{!"function_entry_count", i64 800}
>>> +; CHECK: [[COUNT2]] = !{!"function_entry_count", i64 1800}
>>> +!0 = !{i32 1, !"MaxFunctionCount", i32 1000}
>>> +!1 = !{!"function_entry_count", i64 1000}
>>> +!2 = !{!"function_entry_count", i64 2000}
>>> +!3 = !{!"function_entry_count", i64 400}
>>> +attributes #0 = { alwaysinline }
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160303/4097f921/attachment.html>


More information about the llvm-commits mailing list