[llvm-commits] [llvm] r153812 - in /llvm/trunk: include/llvm/Analysis/ include/llvm/Transforms/IPO/ lib/Analysis/ lib/Transforms/IPO/ test/Transforms/Inline/
David Dean
david_dean at apple.com
Tue Apr 10 13:50:35 PDT 2012
Chandler,
we're seeing a 9.92% compile time regression in MultiSource/Applications/sqlite3/sqlite3 on ARMv7 -mthumb -O3. Can you please take a look?
On 31 Mar 2012, at 5:42 AM, Chandler Carruth wrote:
> Author: chandlerc
> Date: Sat Mar 31 07:42:41 2012
> New Revision: 153812
>
> URL: http://llvm.org/viewvc/llvm-project?rev=153812&view=rev
> Log:
> Initial commit for the rewrite of the inline cost analysis to operate
> on a per-callsite walk of the called function's instructions, in
> breadth-first order over the potentially reachable set of basic blocks.
>
> This is a major shift in how inline cost analysis works to improve the
> accuracy and rationality of inlining decisions. A brief outline of the
> algorithm this moves to:
>
> - Build a simplification mapping based on the callsite arguments to the
> function arguments.
> - Push the entry block onto a worklist of potentially-live basic blocks.
> - Pop the first block off of the *front* of the worklist (for
> breadth-first ordering) and walk its instructions using a custom
> InstVisitor.
> - For each instruction's operands, re-map them based on the
> simplification mappings available for the given callsite.
> - Compute any simplification possible of the instruction after
> re-mapping, and store that back into the simplification mapping.
> - Compute any bonuses, costs, or other impacts of the instruction on the
> cost metric.
> - When the terminator is reached, replace any conditional value in the
> terminator with any simplifications from the mapping we have, and add
> any successors which are not proven to be dead from these
> simplifications to the worklist.
> - Pop the next block off of the front of the worklist, and repeat.
> - As soon as the cost of inlining exceeds the threshold for the
> callsite, stop analyzing the function in order to bound cost.
>
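To make the shape of that walk concrete, here is a minimal, self-contained sketch. It is not the code from this patch; Block, Inst, and shouldInline are hypothetical stand-ins that only show how dead successors are skipped and how the analysis stops once the accumulated cost crosses the threshold.

// Minimal hypothetical sketch of the walk described above; Block and Inst
// are toy stand-ins, not LLVM's classes.
#include <deque>
#include <set>
#include <vector>

struct Inst {
  int Cost;                   // modeled per-instruction cost
  bool FoldsWithConstantArg;  // simplifies away given the callsite arguments
};

struct Block {
  std::vector<Inst> Insts;
  std::vector<int> Succs;          // successor block indices
  int DeadSuccIfArgConstant = -1;  // index into Succs proven dead, or -1
};

// Returns true if the (toy) callee stays under Threshold when analyzed at a
// callsite whose argument is known to be constant.
bool shouldInline(const std::vector<Block> &Blocks, bool ArgIsConstant,
                  int Threshold) {
  int Cost = 0;
  std::deque<int> Worklist = {0};  // seed with the entry block
  std::set<int> Queued = {0};
  while (!Worklist.empty()) {
    int BB = Worklist.front();     // pop from the *front*: breadth-first
    Worklist.pop_front();
    for (const Inst &I : Blocks[BB].Insts) {
      if (ArgIsConstant && I.FoldsWithConstantArg)
        continue;                  // simplified away: contributes no cost
      Cost += I.Cost;
      if (Cost > Threshold)        // bound the analysis as soon as the
        return false;              // threshold is exceeded
    }
    for (int S = 0, NS = (int)Blocks[BB].Succs.size(); S != NS; ++S) {
      if (ArgIsConstant && S == Blocks[BB].DeadSuccIfArgConstant)
        continue;                  // provably dead successor: never visited
      if (Queued.insert(Blocks[BB].Succs[S]).second)
        Worklist.push_back(Blocks[BB].Succs[S]);
    }
  }
  return true;
}
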
> The primary goal of this algorithm is to perfectly handle dead code
> paths. We do not want any code in trivially dead code paths to impact
> inlining decisions. The previous metric was *extremely* flawed here, and
> would always subtract the average cost of two successors of
> a conditional branch when it was proven to become an unconditional
> branch at the callsite. There was no handling of wildly different costs
> between the two successors, which would cause inlining when the path
> actually taken was too large, and no inlining when the path actually
> taken was trivially simple. There was also no handling of the code
> *path*, only the immediate successors. These problems vanish completely
> now. See the added regression tests for the shiny new features -- we
> skip recursive function calls, SROA-killing instructions, and high cost
> complex CFG structures when dead at the callsite being analyzed.
>
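To illustrate the old metric's failure mode with a hypothetical callee (this is not one of the added regression tests):

// Hypothetical example, not from the patch's test suite.
int expensive_path();  // stands in for a large, costly body

int callee(bool cheap) {
  if (cheap)
    return 42;              // trivially small successor
  return expensive_path();  // very large successor
}

int use() {
  // Here the branch is known-taken, so only the trivial path survives
  // inlining; averaging the two successors' costs would still charge
  // half of the expensive path and could block the inline.
  return callee(true);
}
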
> Switching to this algorithm required refactoring the inline cost
> interface to accept the actual threshold rather than simply returning
> a single cost. The resulting interface is pretty bad, and I'm planning
> to do lots of interface cleanup after this patch.
>
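In rough terms, a caller of the reworked interface looks like the fragment below. This is only illustrative; CS, the threshold computation, and the actual inlining step are assumed to come from the surrounding inliner code and are elided.

// Illustrative fragment only: CS and Threshold are assumed to be provided by
// the surrounding inliner, and the inlining step itself is elided.
InlineCostAnalyzer CA;
InlineCost IC = CA.getInlineCost(CS, Threshold);
if (IC) {
  // Cost is below the callsite-adjusted threshold (or the callee is
  // always-inline); for the variable kind, IC.getCostDelta() reports the
  // remaining headroom.
  // ... perform the inlining ...
}
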
> Several other refactorings fell out of this, but I've tried to minimize
> them for this patch. =/ There is still more cleanup that can be done
> here. Please point out anything that you see in review.
>
> I've worked really hard to try to mirror at least the spirit of all of
> the previous heuristics in the new model. It's not clear that they are
> all correct any more, but I wanted to minimize the change in this single
> patch; it's already a bit ridiculous. One heuristic that is *not* yet
> mirrored is to allow inlining of functions with a dynamic alloca *if*
> the caller has a dynamic alloca. I will add this back, but I think the
> most reasonable way requires changes to the inliner itself rather than
> just the cost metric, and so I've deferred this for a subsequent patch.
> The test case is XFAIL-ed until then.
>
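The deferred heuristic concerns hypothetical cases like the following, where the caller already performs a dynamic alloca of its own (an illustration only, not the XFAIL-ed test):

#include <alloca.h>
#include <cstring>

// Hypothetical illustration of the deferred heuristic, not a test from the
// patch: the caller already has a dynamic alloca, so inlining callee does
// not introduce one into a frame that was previously fixed-size.
void callee(const char *S) {
  char *Buf = (char *)alloca(strlen(S) + 1);  // dynamic alloca in the callee
  strcpy(Buf, S);
}

void caller(const char *A, const char *B) {
  char *Tmp = (char *)alloca(strlen(A) + 1);  // caller's own dynamic alloca
  strcpy(Tmp, A);
  callee(B);  // under the old heuristic this call could still be inlined
}
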
> As mentioned in the review mail, this seems to make Clang run about 1%
> to 2% faster in -O0, but makes its binary size grow by just under 4%.
> I've looked into the 4% growth, and it can be fixed, but requires
> changes to other parts of the inliner.
>
> Modified:
> llvm/trunk/include/llvm/Analysis/CodeMetrics.h
> llvm/trunk/include/llvm/Analysis/InlineCost.h
> llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h
> llvm/trunk/lib/Analysis/CodeMetrics.cpp
> llvm/trunk/lib/Analysis/InlineCost.cpp
> llvm/trunk/lib/Transforms/IPO/InlineAlways.cpp
> llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp
> llvm/trunk/lib/Transforms/IPO/Inliner.cpp
> llvm/trunk/test/Transforms/Inline/alloca-bonus.ll
> llvm/trunk/test/Transforms/Inline/dynamic_alloca_test.ll
> llvm/trunk/test/Transforms/Inline/inline_constprop.ll
> llvm/trunk/test/Transforms/Inline/noinline-recursive-fn.ll
> llvm/trunk/test/Transforms/Inline/ptr-diff.ll
>
> Modified: llvm/trunk/include/llvm/Analysis/CodeMetrics.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/CodeMetrics.h?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Analysis/CodeMetrics.h (original)
> +++ llvm/trunk/include/llvm/Analysis/CodeMetrics.h Sat Mar 31 07:42:41 2012
> @@ -20,9 +20,13 @@
> namespace llvm {
> class BasicBlock;
> class Function;
> + class Instruction;
> class TargetData;
> class Value;
>
> + /// \brief Check whether an instruction is likely to be "free" when lowered.
> + bool isInstructionFree(const Instruction *I, const TargetData *TD = 0);
> +
> /// \brief Check whether a call will lower to something small.
> ///
> /// This test checks whether calls to this function will lower to something
>
> Modified: llvm/trunk/include/llvm/Analysis/InlineCost.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/InlineCost.h?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Analysis/InlineCost.h (original)
> +++ llvm/trunk/include/llvm/Analysis/InlineCost.h Sat Mar 31 07:42:41 2012
> @@ -16,6 +16,7 @@
>
> #include "llvm/Function.h"
> #include "llvm/ADT/DenseMap.h"
> +#include "llvm/ADT/SmallPtrSet.h"
> #include "llvm/ADT/ValueMap.h"
> #include "llvm/Analysis/CodeMetrics.h"
> #include <cassert>
> @@ -25,162 +26,105 @@
> namespace llvm {
>
> class CallSite;
> - template<class PtrType, unsigned SmallSize>
> - class SmallPtrSet;
> class TargetData;
>
> namespace InlineConstants {
> // Various magic constants used to adjust heuristics.
> const int InstrCost = 5;
> - const int IndirectCallBonus = -100;
> + const int IndirectCallThreshold = 100;
> const int CallPenalty = 25;
> const int LastCallToStaticBonus = -15000;
> const int ColdccPenalty = 2000;
> const int NoreturnPenalty = 10000;
> }
>
> - /// InlineCost - Represent the cost of inlining a function. This
> - /// supports special values for functions which should "always" or
> - /// "never" be inlined. Otherwise, the cost represents a unitless
> - /// amount; smaller values increase the likelihood of the function
> - /// being inlined.
> + /// \brief Represents the cost of inlining a function.
> + ///
> + /// This supports special values for functions which should "always" or
> + /// "never" be inlined. Otherwise, the cost represents a unitless amount;
> + /// smaller values increase the likelihood of the function being inlined.
> + ///
> + /// Objects of this type also provide the adjusted threshold for inlining
> + /// based on the information available for a particular callsite. They can be
> + /// directly tested to determine if inlining should occur given the cost and
> + /// threshold for this cost metric.
> class InlineCost {
> - enum Kind {
> - Value,
> - Always,
> - Never
> + enum CostKind {
> + CK_Variable,
> + CK_Always,
> + CK_Never
> };
>
> - // This is a do-it-yourself implementation of
> - // int Cost : 30;
> - // unsigned Type : 2;
> - // We used to use bitfields, but they were sometimes miscompiled (PR3822).
> - enum { TYPE_BITS = 2 };
> - enum { COST_BITS = unsigned(sizeof(unsigned)) * CHAR_BIT - TYPE_BITS };
> - unsigned TypedCost; // int Cost : COST_BITS; unsigned Type : TYPE_BITS;
> + const int Cost : 30; // The inlining cost if neither always nor never.
> + const unsigned Kind : 2; // The type of cost, one of CostKind above.
>
> - Kind getType() const {
> - return Kind(TypedCost >> COST_BITS);
> - }
> + /// \brief The adjusted threshold against which this cost should be tested.
> + const int Threshold;
>
> - int getCost() const {
> - // Sign-extend the bottom COST_BITS bits.
> - return (int(TypedCost << TYPE_BITS)) >> TYPE_BITS;
> + // Trivial constructor, interesting logic in the factory functions below.
> + InlineCost(int Cost, CostKind Kind, int Threshold)
> + : Cost(Cost), Kind(Kind), Threshold(Threshold) {}
> +
> + public:
> + static InlineCost get(int Cost, int Threshold) {
> + InlineCost Result(Cost, CK_Variable, Threshold);
> + assert(Result.Cost == Cost && "Cost exceeds InlineCost precision");
> + return Result;
> + }
> + static InlineCost getAlways() {
> + return InlineCost(0, CK_Always, 0);
> + }
> + static InlineCost getNever() {
> + return InlineCost(0, CK_Never, 0);
> }
>
> - InlineCost(int C, int T) {
> - TypedCost = (unsigned(C << TYPE_BITS) >> TYPE_BITS) | (T << COST_BITS);
> - assert(getCost() == C && "Cost exceeds InlineCost precision");
> + /// \brief Test whether the inline cost is low enough for inlining.
> + operator bool() const {
> + if (isAlways()) return true;
> + if (isNever()) return false;
> + return Cost < Threshold;
> }
> - public:
> - static InlineCost get(int Cost) { return InlineCost(Cost, Value); }
> - static InlineCost getAlways() { return InlineCost(0, Always); }
> - static InlineCost getNever() { return InlineCost(0, Never); }
> -
> - bool isVariable() const { return getType() == Value; }
> - bool isAlways() const { return getType() == Always; }
> - bool isNever() const { return getType() == Never; }
>
> - /// getValue() - Return a "variable" inline cost's amount. It is
> + bool isVariable() const { return Kind == CK_Variable; }
> + bool isAlways() const { return Kind == CK_Always; }
> + bool isNever() const { return Kind == CK_Never; }
> +
> + /// getCost() - Return a "variable" inline cost's amount. It is
> /// an error to call this on an "always" or "never" InlineCost.
> - int getValue() const {
> - assert(getType() == Value && "Invalid access of InlineCost");
> - return getCost();
> + int getCost() const {
> + assert(Kind == CK_Variable && "Invalid access of InlineCost");
> + return Cost;
> + }
> +
> + /// \brief Get the cost delta from the threshold for inlining.
> + /// Only valid if the cost is of the variable kind. Returns a negative
> + /// value if the cost is too high to inline.
> + int getCostDelta() const {
> + return Threshold - getCost();
> }
> };
>
> /// InlineCostAnalyzer - Cost analyzer used by inliner.
> class InlineCostAnalyzer {
> - struct ArgInfo {
> - public:
> - unsigned ConstantWeight;
> - unsigned AllocaWeight;
> -
> - ArgInfo(unsigned CWeight, unsigned AWeight)
> - : ConstantWeight(CWeight), AllocaWeight(AWeight)
> - {}
> - };
> -
> - struct FunctionInfo {
> - CodeMetrics Metrics;
> -
> - /// ArgumentWeights - Each formal argument of the function is inspected to
> - /// see if it is used in any contexts where making it a constant or alloca
> - /// would reduce the code size. If so, we add some value to the argument
> - /// entry here.
> - std::vector<ArgInfo> ArgumentWeights;
> -
> - /// PointerArgPairWeights - Weights to use when giving an inline bonus to
> - /// a call site due to correlated pairs of pointers.
> - DenseMap<std::pair<unsigned, unsigned>, unsigned> PointerArgPairWeights;
> -
> - /// countCodeReductionForConstant - Figure out an approximation for how
> - /// many instructions will be constant folded if the specified value is
> - /// constant.
> - unsigned countCodeReductionForConstant(const CodeMetrics &Metrics,
> - Value *V);
> -
> - /// countCodeReductionForAlloca - Figure out an approximation of how much
> - /// smaller the function will be if it is inlined into a context where an
> - /// argument becomes an alloca.
> - unsigned countCodeReductionForAlloca(const CodeMetrics &Metrics,
> - Value *V);
> -
> - /// countCodeReductionForPointerPair - Count the bonus to apply to an
> - /// inline call site where a pair of arguments are pointers and one
> - /// argument is a constant offset from the other. The idea is to
> - /// recognize a common C++ idiom where a begin and end iterator are
> - /// actually pointers, and many operations on the pair of them will be
> - /// constants if the function is called with arguments that have
> - /// a constant offset.
> - void countCodeReductionForPointerPair(
> - const CodeMetrics &Metrics,
> - DenseMap<Value *, unsigned> &PointerArgs,
> - Value *V, unsigned ArgIdx);
> -
> - /// analyzeFunction - Add information about the specified function
> - /// to the current structure.
> - void analyzeFunction(Function *F, const TargetData *TD);
> -
> - /// NeverInline - Returns true if the function should never be
> - /// inlined into any caller.
> - bool NeverInline();
> - };
> -
> - // The Function* for a function can be changed (by ArgumentPromotion);
> - // the ValueMap will update itself when this happens.
> - ValueMap<const Function *, FunctionInfo> CachedFunctionInfo;
> -
> // TargetData if available, or null.
> const TargetData *TD;
>
> - int CountBonusForConstant(Value *V, Constant *C = NULL);
> - int ConstantFunctionBonus(CallSite CS, Constant *C);
> - int getInlineSize(CallSite CS, Function *Callee);
> - int getInlineBonuses(CallSite CS, Function *Callee);
> public:
> InlineCostAnalyzer(): TD(0) {}
>
> void setTargetData(const TargetData *TData) { TD = TData; }
>
> - /// getInlineCost - The heuristic used to determine if we should inline the
> - /// function call or not.
> + /// \brief Get an InlineCost object representing the cost of inlining this
> + /// callsite.
> ///
> - InlineCost getInlineCost(CallSite CS);
> - /// getCalledFunction - The heuristic used to determine if we should inline
> - /// the function call or not. The callee is explicitly specified, to allow
> - /// you to calculate the cost of inlining a function via a pointer. The
> - /// result assumes that the inlined version will always be used. You should
> - /// weight it yourself in cases where this callee will not always be called.
> - InlineCost getInlineCost(CallSite CS, Function *Callee);
> -
> - /// getInlineFudgeFactor - Return a > 1.0 factor if the inliner should use a
> - /// higher threshold to determine if the function call should be inlined.
> - float getInlineFudgeFactor(CallSite CS);
> + /// Note that threshold is passed into this function. Only costs below the
> + /// threshold are computed with any accuracy. The threshold can be used to
> + /// bound the computation necessary to determine whether the cost is
> + /// sufficiently low to warrant inlining.
> + InlineCost getInlineCost(CallSite CS, int Threshold);
>
> /// resetCachedFunctionInfo - erase any cached cost info for this function.
> void resetCachedCostInfo(Function* Caller) {
> - CachedFunctionInfo[Caller] = FunctionInfo();
> }
>
> /// growCachedCostInfo - update the cached cost info for Caller after Callee
>
> Modified: llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h (original)
> +++ llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h Sat Mar 31 07:42:41 2012
> @@ -65,11 +65,6 @@
> ///
> virtual InlineCost getInlineCost(CallSite CS) = 0;
>
> - // getInlineFudgeFactor - Return a > 1.0 factor if the inliner should use a
> - // higher threshold to determine if the function call should be inlined.
> - ///
> - virtual float getInlineFudgeFactor(CallSite CS) = 0;
> -
> /// resetCachedCostInfo - erase any cached cost data from the derived class.
> /// If the derived class has no such data this can be empty.
> ///
>
> Modified: llvm/trunk/lib/Analysis/CodeMetrics.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/CodeMetrics.cpp?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/CodeMetrics.cpp (original)
> +++ llvm/trunk/lib/Analysis/CodeMetrics.cpp Sat Mar 31 07:42:41 2012
> @@ -50,6 +50,52 @@
> return false;
> }
>
> +bool llvm::isInstructionFree(const Instruction *I, const TargetData *TD) {
> + if (isa<PHINode>(I))
> + return true;
> +
> + // If a GEP has all constant indices, it will probably be folded with
> + // a load/store.
> + if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
> + return GEP->hasAllConstantIndices();
> +
> + if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
> + switch (II->getIntrinsicID()) {
> + default:
> + return false;
> + case Intrinsic::dbg_declare:
> + case Intrinsic::dbg_value:
> + case Intrinsic::invariant_start:
> + case Intrinsic::invariant_end:
> + case Intrinsic::lifetime_start:
> + case Intrinsic::lifetime_end:
> + case Intrinsic::objectsize:
> + case Intrinsic::ptr_annotation:
> + case Intrinsic::var_annotation:
> + // These intrinsics don't count as size.
> + return true;
> + }
> + }
> +
> + if (const CastInst *CI = dyn_cast<CastInst>(I)) {
> + // Noop casts, including ptr <-> int, don't count.
> + if (CI->isLosslessCast() || isa<IntToPtrInst>(CI) || isa<PtrToIntInst>(CI))
> + return true;
> + // trunc to a native type is free (assuming the target has compare and
> + // shift-right of the same width).
> + if (TD && isa<TruncInst>(CI) &&
> + TD->isLegalInteger(TD->getTypeSizeInBits(CI->getType())))
> + return true;
> + // Result of a cmp instruction is often extended (to be used by other
> + // cmp instructions, logical or return instructions). These are usually
> + // nop on most sane targets.
> + if (isa<CmpInst>(CI->getOperand(0)))
> + return true;
> + }
> +
> + return false;
> +}
> +
> /// analyzeBasicBlock - Fill in the current structure with information gleaned
> /// from the specified block.
> void CodeMetrics::analyzeBasicBlock(const BasicBlock *BB,
> @@ -58,27 +104,11 @@
> unsigned NumInstsBeforeThisBB = NumInsts;
> for (BasicBlock::const_iterator II = BB->begin(), E = BB->end();
> II != E; ++II) {
> - if (isa<PHINode>(II)) continue; // PHI nodes don't count.
> + if (isInstructionFree(II, TD))
> + continue;
>
> // Special handling for calls.
> if (isa<CallInst>(II) || isa<InvokeInst>(II)) {
> - if (const IntrinsicInst *IntrinsicI = dyn_cast<IntrinsicInst>(II)) {
> - switch (IntrinsicI->getIntrinsicID()) {
> - default: break;
> - case Intrinsic::dbg_declare:
> - case Intrinsic::dbg_value:
> - case Intrinsic::invariant_start:
> - case Intrinsic::invariant_end:
> - case Intrinsic::lifetime_start:
> - case Intrinsic::lifetime_end:
> - case Intrinsic::objectsize:
> - case Intrinsic::ptr_annotation:
> - case Intrinsic::var_annotation:
> - // These intrinsics don't count as size.
> - continue;
> - }
> - }
> -
> ImmutableCallSite CS(cast<Instruction>(II));
>
> if (const Function *F = CS.getCalledFunction()) {
> @@ -115,28 +145,6 @@
> if (isa<ExtractElementInst>(II) || II->getType()->isVectorTy())
> ++NumVectorInsts;
>
> - if (const CastInst *CI = dyn_cast<CastInst>(II)) {
> - // Noop casts, including ptr <-> int, don't count.
> - if (CI->isLosslessCast() || isa<IntToPtrInst>(CI) ||
> - isa<PtrToIntInst>(CI))
> - continue;
> - // trunc to a native type is free (assuming the target has compare and
> - // shift-right of the same width).
> - if (isa<TruncInst>(CI) && TD &&
> - TD->isLegalInteger(TD->getTypeSizeInBits(CI->getType())))
> - continue;
> - // Result of a cmp instruction is often extended (to be used by other
> - // cmp instructions, logical or return instructions). These are usually
> - // nop on most sane targets.
> - if (isa<CmpInst>(CI->getOperand(0)))
> - continue;
> - } else if (const GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(II)){
> - // If a GEP has all constant indices, it will probably be folded with
> - // a load/store.
> - if (GEPI->hasAllConstantIndices())
> - continue;
> - }
> -
> ++NumInsts;
> }
>
>
> Modified: llvm/trunk/lib/Analysis/InlineCost.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InlineCost.cpp?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Analysis/InlineCost.cpp (original)
> +++ llvm/trunk/lib/Analysis/InlineCost.cpp Sat Mar 31 07:42:41 2012
> @@ -11,659 +11,1014 @@
> //
> //===----------------------------------------------------------------------===//
>
> +#define DEBUG_TYPE "inline-cost"
> #include "llvm/Analysis/InlineCost.h"
> +#include "llvm/Analysis/ConstantFolding.h"
> +#include "llvm/Analysis/InstructionSimplify.h"
> #include "llvm/Support/CallSite.h"
> +#include "llvm/Support/Debug.h"
> +#include "llvm/Support/InstVisitor.h"
> +#include "llvm/Support/GetElementPtrTypeIterator.h"
> +#include "llvm/Support/raw_ostream.h"
> #include "llvm/CallingConv.h"
> #include "llvm/IntrinsicInst.h"
> +#include "llvm/Operator.h"
> +#include "llvm/GlobalAlias.h"
> #include "llvm/Target/TargetData.h"
> +#include "llvm/ADT/STLExtras.h"
> +#include "llvm/ADT/SetVector.h"
> +#include "llvm/ADT/SmallVector.h"
> #include "llvm/ADT/SmallPtrSet.h"
>
> using namespace llvm;
>
> -unsigned InlineCostAnalyzer::FunctionInfo::countCodeReductionForConstant(
> - const CodeMetrics &Metrics, Value *V) {
> - unsigned Reduction = 0;
> - SmallVector<Value *, 4> Worklist;
> - Worklist.push_back(V);
> - do {
> - Value *V = Worklist.pop_back_val();
> - for (Value::use_iterator UI = V->use_begin(), E = V->use_end(); UI != E;++UI){
> - User *U = *UI;
> - if (isa<BranchInst>(U) || isa<SwitchInst>(U)) {
> - // We will be able to eliminate all but one of the successors.
> - const TerminatorInst &TI = cast<TerminatorInst>(*U);
> - const unsigned NumSucc = TI.getNumSuccessors();
> - unsigned Instrs = 0;
> - for (unsigned I = 0; I != NumSucc; ++I)
> - Instrs += Metrics.NumBBInsts.lookup(TI.getSuccessor(I));
> - // We don't know which blocks will be eliminated, so use the average size.
> - Reduction += InlineConstants::InstrCost*Instrs*(NumSucc-1)/NumSucc;
> - continue;
> +namespace {
> +
> +class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
> + typedef InstVisitor<CallAnalyzer, bool> Base;
> + friend class InstVisitor<CallAnalyzer, bool>;
> +
> + // TargetData if available, or null.
> + const TargetData *const TD;
> +
> + // The called function.
> + Function &F;
> +
> + int Threshold;
> + int Cost;
> + const bool AlwaysInline;
> +
> + bool IsRecursive;
> + bool ExposesReturnsTwice;
> + bool HasDynamicAlloca;
> + unsigned NumInstructions, NumVectorInstructions;
> + int FiftyPercentVectorBonus, TenPercentVectorBonus;
> + int VectorBonus;
> +
> + // While we walk the potentially-inlined instructions, we build up and
> + // maintain a mapping of simplified values specific to this callsite. The
> + // idea is to propagate any special information we have about arguments to
> + // this call through the inlinable section of the function, and account for
> + // likely simplifications post-inlining. The most important aspect we track
> + // is CFG altering simplifications -- when we prove a basic block dead, that
> + // can cause dramatic shifts in the cost of inlining a function.
> + DenseMap<Value *, Constant *> SimplifiedValues;
> +
> + // Keep track of the values which map back (through function arguments) to
> + // allocas on the caller stack which could be simplified through SROA.
> + DenseMap<Value *, Value *> SROAArgValues;
> +
> + // The mapping of caller Alloca values to their accumulated cost savings. If
> + // we have to disable SROA for one of the allocas, this tells us how much
> + // cost must be added.
> + DenseMap<Value *, int> SROAArgCosts;
> +
> + // Keep track of values which map to a pointer base and constant offset.
> + DenseMap<Value *, std::pair<Value *, APInt> > ConstantOffsetPtrs;
> +
> + // Custom simplification helper routines.
> + bool isAllocaDerivedArg(Value *V);
> + bool lookupSROAArgAndCost(Value *V, Value *&Arg,
> + DenseMap<Value *, int>::iterator &CostIt);
> + void disableSROA(DenseMap<Value *, int>::iterator CostIt);
> + void disableSROA(Value *V);
> + void accumulateSROACost(DenseMap<Value *, int>::iterator CostIt,
> + int InstructionCost);
> + bool handleSROACandidate(bool IsSROAValid,
> + DenseMap<Value *, int>::iterator CostIt,
> + int InstructionCost);
> + bool isGEPOffsetConstant(GetElementPtrInst &GEP);
> + bool accumulateGEPOffset(GEPOperator &GEP, APInt &Offset);
> + ConstantInt *stripAndComputeInBoundsConstantOffsets(Value *&V);
> +
> + // Custom analysis routines.
> + bool analyzeBlock(BasicBlock *BB);
> +
> + // Disable several entry points to the visitor so we don't accidentally use
> + // them by declaring but not defining them here.
> + void visit(Module *); void visit(Module &);
> + void visit(Function *); void visit(Function &);
> + void visit(BasicBlock *); void visit(BasicBlock &);
> +
> + // Provide base case for our instruction visit.
> + bool visitInstruction(Instruction &I);
> +
> + // Our visit overrides.
> + bool visitAlloca(AllocaInst &I);
> + bool visitPHI(PHINode &I);
> + bool visitGetElementPtr(GetElementPtrInst &I);
> + bool visitBitCast(BitCastInst &I);
> + bool visitPtrToInt(PtrToIntInst &I);
> + bool visitIntToPtr(IntToPtrInst &I);
> + bool visitCastInst(CastInst &I);
> + bool visitUnaryInstruction(UnaryInstruction &I);
> + bool visitICmp(ICmpInst &I);
> + bool visitSub(BinaryOperator &I);
> + bool visitBinaryOperator(BinaryOperator &I);
> + bool visitLoad(LoadInst &I);
> + bool visitStore(StoreInst &I);
> + bool visitCallSite(CallSite CS);
> +
> +public:
> + CallAnalyzer(const TargetData *TD, Function &Callee, int Threshold)
> + : TD(TD), F(Callee), Threshold(Threshold), Cost(0),
> + AlwaysInline(F.hasFnAttr(Attribute::AlwaysInline)),
> + IsRecursive(false), ExposesReturnsTwice(false), HasDynamicAlloca(false),
> + NumInstructions(0), NumVectorInstructions(0),
> + FiftyPercentVectorBonus(0), TenPercentVectorBonus(0), VectorBonus(0),
> + NumConstantArgs(0), NumConstantOffsetPtrArgs(0), NumAllocaArgs(0),
> + NumConstantPtrCmps(0), NumConstantPtrDiffs(0),
> + NumInstructionsSimplified(0), SROACostSavings(0), SROACostSavingsLost(0) {
> + }
> +
> + bool analyzeCall(CallSite CS);
> +
> + int getThreshold() { return Threshold; }
> + int getCost() { return Cost; }
> +
> + // Keep a bunch of stats about the cost savings found so we can print them
> + // out when debugging.
> + unsigned NumConstantArgs;
> + unsigned NumConstantOffsetPtrArgs;
> + unsigned NumAllocaArgs;
> + unsigned NumConstantPtrCmps;
> + unsigned NumConstantPtrDiffs;
> + unsigned NumInstructionsSimplified;
> + unsigned SROACostSavings;
> + unsigned SROACostSavingsLost;
> +
> + void dump();
> +};
> +
> +} // namespace
> +
> +/// \brief Test whether the given value is an Alloca-derived function argument.
> +bool CallAnalyzer::isAllocaDerivedArg(Value *V) {
> + return SROAArgValues.count(V);
> +}
> +
> +/// \brief Lookup the SROA-candidate argument and cost iterator which V maps to.
> +/// Returns false if V does not map to a SROA-candidate.
> +bool CallAnalyzer::lookupSROAArgAndCost(
> + Value *V, Value *&Arg, DenseMap<Value *, int>::iterator &CostIt) {
> + if (SROAArgValues.empty() || SROAArgCosts.empty())
> + return false;
> +
> + DenseMap<Value *, Value *>::iterator ArgIt = SROAArgValues.find(V);
> + if (ArgIt == SROAArgValues.end())
> + return false;
> +
> + Arg = ArgIt->second;
> + CostIt = SROAArgCosts.find(Arg);
> + return CostIt != SROAArgCosts.end();
> +}
> +
> +/// \brief Disable SROA for the candidate marked by this cost iterator.
> +///
> +/// This marks the candidate as no longer viable for SROA, and adds the cost
> +/// savings associated with it back into the inline cost measurement.
> +void CallAnalyzer::disableSROA(DenseMap<Value *, int>::iterator CostIt) {
> + // If we're no longer able to perform SROA we need to undo its cost savings
> + // and prevent subsequent analysis.
> + Cost += CostIt->second;
> + SROACostSavings -= CostIt->second;
> + SROACostSavingsLost += CostIt->second;
> + SROAArgCosts.erase(CostIt);
> +}
> +
> +/// \brief If 'V' maps to a SROA candidate, disable SROA for it.
> +void CallAnalyzer::disableSROA(Value *V) {
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(V, SROAArg, CostIt))
> + disableSROA(CostIt);
> +}
> +
> +/// \brief Accumulate the given cost for a particular SROA candidate.
> +void CallAnalyzer::accumulateSROACost(DenseMap<Value *, int>::iterator CostIt,
> + int InstructionCost) {
> + CostIt->second += InstructionCost;
> + SROACostSavings += InstructionCost;
> +}
> +
> +/// \brief Helper for the common pattern of handling a SROA candidate.
> +/// Either accumulates the cost savings if the SROA remains valid, or disables
> +/// SROA for the candidate.
> +bool CallAnalyzer::handleSROACandidate(bool IsSROAValid,
> + DenseMap<Value *, int>::iterator CostIt,
> + int InstructionCost) {
> + if (IsSROAValid) {
> + accumulateSROACost(CostIt, InstructionCost);
> + return true;
> + }
> +
> + disableSROA(CostIt);
> + return false;
> +}
> +
> +/// \brief Check whether a GEP's indices are all constant.
> +///
> +/// Respects any simplified values known during the analysis of this callsite.
> +bool CallAnalyzer::isGEPOffsetConstant(GetElementPtrInst &GEP) {
> + for (User::op_iterator I = GEP.idx_begin(), E = GEP.idx_end(); I != E; ++I)
> + if (!isa<Constant>(*I) && !SimplifiedValues.lookup(*I))
> + return false;
> +
> + return true;
> +}
> +
> +/// \brief Accumulate a constant GEP offset into an APInt if possible.
> +///
> +/// Returns false if unable to compute the offset for any reason. Respects any
> +/// simplified values known during the analysis of this callsite.
> +bool CallAnalyzer::accumulateGEPOffset(GEPOperator &GEP, APInt &Offset) {
> + if (!TD)
> + return false;
> +
> + unsigned IntPtrWidth = TD->getPointerSizeInBits();
> + assert(IntPtrWidth == Offset.getBitWidth());
> +
> + for (gep_type_iterator GTI = gep_type_begin(GEP), GTE = gep_type_end(GEP);
> + GTI != GTE; ++GTI) {
> + ConstantInt *OpC = dyn_cast<ConstantInt>(GTI.getOperand());
> + if (!OpC)
> + if (Constant *SimpleOp = SimplifiedValues.lookup(GTI.getOperand()))
> + OpC = dyn_cast<ConstantInt>(SimpleOp);
> + if (!OpC)
> + return false;
> + if (OpC->isZero()) continue;
> +
> + // Handle a struct index, which adds its field offset to the pointer.
> + if (StructType *STy = dyn_cast<StructType>(*GTI)) {
> + unsigned ElementIdx = OpC->getZExtValue();
> + const StructLayout *SL = TD->getStructLayout(STy);
> + Offset += APInt(IntPtrWidth, SL->getElementOffset(ElementIdx));
> + continue;
> + }
> +
> + APInt TypeSize(IntPtrWidth, TD->getTypeAllocSize(GTI.getIndexedType()));
> + Offset += OpC->getValue().sextOrTrunc(IntPtrWidth) * TypeSize;
> + }
> + return true;
> +}
> +
> +bool CallAnalyzer::visitAlloca(AllocaInst &I) {
> + // FIXME: Check whether inlining will turn a dynamic alloca into a static
> + // alloca, and handle that case.
> +
> + // We will happily inline static alloca instructions or dynamic alloca
> + // instructions in always-inline situations.
> + if (AlwaysInline || I.isStaticAlloca())
> + return Base::visitAlloca(I);
> +
> + // FIXME: This is overly conservative. Dynamic allocas are inefficient for
> + // a variety of reasons, and so we would like to not inline them into
> + // functions which don't currently have a dynamic alloca. This simply
> + // disables inlining altogether in the presence of a dynamic alloca.
> + HasDynamicAlloca = true;
> + return false;
> +}
> +
> +bool CallAnalyzer::visitPHI(PHINode &I) {
> + // FIXME: We should potentially be tracking values through phi nodes,
> + // especially when they collapse to a single value due to deleted CFG edges
> + // during inlining.
> +
> + // FIXME: We need to propagate SROA *disabling* through phi nodes, even
> + // though we don't want to propagate its bonuses. The idea is to disable
> + // SROA if it *might* be used in an inappropriate manner.
> +
> + // Phi nodes are always zero-cost.
> + return true;
> +}
> +
> +bool CallAnalyzer::visitGetElementPtr(GetElementPtrInst &I) {
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + bool SROACandidate = lookupSROAArgAndCost(I.getPointerOperand(),
> + SROAArg, CostIt);
> +
> + // Try to fold GEPs of constant-offset call site argument pointers. This
> + // requires target data and inbounds GEPs.
> + if (TD && I.isInBounds()) {
> + // Check if we have a base + offset for the pointer.
> + Value *Ptr = I.getPointerOperand();
> + std::pair<Value *, APInt> BaseAndOffset = ConstantOffsetPtrs.lookup(Ptr);
> + if (BaseAndOffset.first) {
> + // Check if the offset of this GEP is constant, and if so accumulate it
> + // into Offset.
> + if (!accumulateGEPOffset(cast<GEPOperator>(I), BaseAndOffset.second)) {
> + // Non-constant GEPs aren't folded, and disable SROA.
> + if (SROACandidate)
> + disableSROA(CostIt);
> + return false;
> }
>
> - // Figure out if this instruction will be removed due to simple constant
> - // propagation.
> - Instruction &Inst = cast<Instruction>(*U);
> -
> - // We can't constant propagate instructions which have effects or
> - // read memory.
> - //
> - // FIXME: It would be nice to capture the fact that a load from a
> - // pointer-to-constant-global is actually a *really* good thing to zap.
> - // Unfortunately, we don't know the pointer that may get propagated here,
> - // so we can't make this decision.
> - if (Inst.mayReadFromMemory() || Inst.mayHaveSideEffects() ||
> - isa<AllocaInst>(Inst))
> - continue;
> + // Add the result as a new mapping to Base + Offset.
> + ConstantOffsetPtrs[&I] = BaseAndOffset;
>
> - bool AllOperandsConstant = true;
> - for (unsigned i = 0, e = Inst.getNumOperands(); i != e; ++i)
> - if (!isa<Constant>(Inst.getOperand(i)) && Inst.getOperand(i) != V) {
> - AllOperandsConstant = false;
> - break;
> - }
> - if (!AllOperandsConstant)
> - continue;
> + // Also handle SROA candidates here, we already know that the GEP is
> + // all-constant indexed.
> + if (SROACandidate)
> + SROAArgValues[&I] = SROAArg;
>
> - // We will get to remove this instruction...
> - Reduction += InlineConstants::InstrCost;
> + return true;
> + }
> + }
> +
> + if (isGEPOffsetConstant(I)) {
> + if (SROACandidate)
> + SROAArgValues[&I] = SROAArg;
> +
> + // Constant GEPs are modeled as free.
> + return true;
> + }
> +
> + // Variable GEPs will require math and will disable SROA.
> + if (SROACandidate)
> + disableSROA(CostIt);
> + return false;
> +}
>
> - // And any other instructions that use it which become constants
> - // themselves.
> - Worklist.push_back(&Inst);
> +bool CallAnalyzer::visitBitCast(BitCastInst &I) {
> + // Propagate constants through bitcasts.
> + if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
> + if (Constant *C = ConstantExpr::getBitCast(COp, I.getType())) {
> + SimplifiedValues[&I] = C;
> + return true;
> + }
> +
> + // Track base/offsets through casts
> + std::pair<Value *, APInt> BaseAndOffset
> + = ConstantOffsetPtrs.lookup(I.getOperand(0));
> + // Casts don't change the offset, just wrap it up.
> + if (BaseAndOffset.first)
> + ConstantOffsetPtrs[&I] = BaseAndOffset;
> +
> + // Also look for SROA candidates here.
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt))
> + SROAArgValues[&I] = SROAArg;
> +
> + // Bitcasts are always zero cost.
> + return true;
> +}
> +
> +bool CallAnalyzer::visitPtrToInt(PtrToIntInst &I) {
> + // Propagate constants through ptrtoint.
> + if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
> + if (Constant *C = ConstantExpr::getPtrToInt(COp, I.getType())) {
> + SimplifiedValues[&I] = C;
> + return true;
> }
> - } while (!Worklist.empty());
> - return Reduction;
> +
> + // Track base/offset pairs when converted to a plain integer provided the
> + // integer is large enough to represent the pointer.
> + unsigned IntegerSize = I.getType()->getScalarSizeInBits();
> + if (TD && IntegerSize >= TD->getPointerSizeInBits()) {
> + std::pair<Value *, APInt> BaseAndOffset
> + = ConstantOffsetPtrs.lookup(I.getOperand(0));
> + if (BaseAndOffset.first)
> + ConstantOffsetPtrs[&I] = BaseAndOffset;
> + }
> +
> + // This is really weird. Technically, ptrtoint will disable SROA. However,
> + // unless that ptrtoint is *used* somewhere in the live basic blocks after
> + // inlining, it will be nuked, and SROA should proceed. All of the uses which
> + // would block SROA would also block SROA if applied directly to a pointer,
> + // and so we can just add the integer in here. The only places where SROA is
> + // preserved either cannot fire on an integer, or won't in-and-of themselves
> + // disable SROA (ext) w/o some later use that we would see and disable.
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt))
> + SROAArgValues[&I] = SROAArg;
> +
> + // A ptrtoint cast is free so long as the result is large enough to store the
> + // pointer, and a legal integer type.
> + return TD && TD->isLegalInteger(IntegerSize) &&
> + IntegerSize >= TD->getPointerSizeInBits();
> +}
> +
> +bool CallAnalyzer::visitIntToPtr(IntToPtrInst &I) {
> + // Propagate constants through inttoptr.
> + if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
> + if (Constant *C = ConstantExpr::getIntToPtr(COp, I.getType())) {
> + SimplifiedValues[&I] = C;
> + return true;
> + }
> +
> + // Track base/offset pairs when round-tripped through a pointer without
> + // modifications provided the integer is not too large.
> + Value *Op = I.getOperand(0);
> + unsigned IntegerSize = Op->getType()->getScalarSizeInBits();
> + if (TD && IntegerSize <= TD->getPointerSizeInBits()) {
> + std::pair<Value *, APInt> BaseAndOffset = ConstantOffsetPtrs.lookup(Op);
> + if (BaseAndOffset.first)
> + ConstantOffsetPtrs[&I] = BaseAndOffset;
> + }
> +
> + // "Propagate" SROA here in the same manner as we do for ptrtoint above.
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(Op, SROAArg, CostIt))
> + SROAArgValues[&I] = SROAArg;
> +
> + // An inttoptr cast is free so long as the input is a legal integer type
> + // which doesn't contain values outside the range of a pointer.
> + return TD && TD->isLegalInteger(IntegerSize) &&
> + IntegerSize <= TD->getPointerSizeInBits();
> +}
> +
> +bool CallAnalyzer::visitCastInst(CastInst &I) {
> + // Propagate constants through casts.
> + if (Constant *COp = dyn_cast<Constant>(I.getOperand(0)))
> + if (Constant *C = ConstantExpr::getCast(I.getOpcode(), COp, I.getType())) {
> + SimplifiedValues[&I] = C;
> + return true;
> + }
> +
> + // Disable SROA in the face of arbitrary casts we don't whitelist elsewhere.
> + disableSROA(I.getOperand(0));
> +
> + // No-op casts don't have any cost.
> + if (I.isLosslessCast())
> + return true;
> +
> + // trunc to a native type is free (assuming the target has compare and
> + // shift-right of the same width).
> + if (TD && isa<TruncInst>(I) &&
> + TD->isLegalInteger(TD->getTypeSizeInBits(I.getType())))
> + return true;
> +
> + // Result of a cmp instruction is often extended (to be used by other
> + // cmp instructions, logical or return instructions). These are usually
> + // no-ops on most sane targets.
> + if (isa<CmpInst>(I.getOperand(0)))
> + return true;
> +
> + // Assume the rest of the casts require work.
> + return false;
> }
>
> -static unsigned countCodeReductionForAllocaICmp(const CodeMetrics &Metrics,
> - ICmpInst *ICI) {
> - unsigned Reduction = 0;
> +bool CallAnalyzer::visitUnaryInstruction(UnaryInstruction &I) {
> + Value *Operand = I.getOperand(0);
> + Constant *Ops[1] = { dyn_cast<Constant>(Operand) };
> + if (Ops[0] || (Ops[0] = SimplifiedValues.lookup(Operand)))
> + if (Constant *C = ConstantFoldInstOperands(I.getOpcode(), I.getType(),
> + Ops, TD)) {
> + SimplifiedValues[&I] = C;
> + return true;
> + }
>
> - // Bail if this is comparing against a non-constant; there is nothing we can
> - // do there.
> - if (!isa<Constant>(ICI->getOperand(1)))
> - return Reduction;
> + // Disable any SROA on the argument to arbitrary unary operators.
> + disableSROA(Operand);
>
> - // An icmp pred (alloca, C) becomes true if the predicate is true when
> - // equal and false otherwise.
> - bool Result = ICI->isTrueWhenEqual();
> + return false;
> +}
>
> - SmallVector<Instruction *, 4> Worklist;
> - Worklist.push_back(ICI);
> - do {
> - Instruction *U = Worklist.pop_back_val();
> - Reduction += InlineConstants::InstrCost;
> - for (Value::use_iterator UI = U->use_begin(), UE = U->use_end();
> - UI != UE; ++UI) {
> - Instruction *I = dyn_cast<Instruction>(*UI);
> - if (!I || I->mayHaveSideEffects()) continue;
> - if (I->getNumOperands() == 1)
> - Worklist.push_back(I);
> - if (BinaryOperator *BO = dyn_cast<BinaryOperator>(I)) {
> - // If BO produces the same value as U, then the other operand is
> - // irrelevant and we can put it into the Worklist to continue
> - // deleting dead instructions. If BO produces the same value as the
> - // other operand, we can delete BO but that's it.
> - if (Result == true) {
> - if (BO->getOpcode() == Instruction::Or)
> - Worklist.push_back(I);
> - if (BO->getOpcode() == Instruction::And)
> - Reduction += InlineConstants::InstrCost;
> - } else {
> - if (BO->getOpcode() == Instruction::Or ||
> - BO->getOpcode() == Instruction::Xor)
> - Reduction += InlineConstants::InstrCost;
> - if (BO->getOpcode() == Instruction::And)
> - Worklist.push_back(I);
> - }
> +bool CallAnalyzer::visitICmp(ICmpInst &I) {
> + Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
> + // First try to handle simplified comparisons.
> + if (!isa<Constant>(LHS))
> + if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
> + LHS = SimpleLHS;
> + if (!isa<Constant>(RHS))
> + if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
> + RHS = SimpleRHS;
> + if (Constant *CLHS = dyn_cast<Constant>(LHS))
> + if (Constant *CRHS = dyn_cast<Constant>(RHS))
> + if (Constant *C = ConstantExpr::getICmp(I.getPredicate(), CLHS, CRHS)) {
> + SimplifiedValues[&I] = C;
> + return true;
> }
> - if (BranchInst *BI = dyn_cast<BranchInst>(I)) {
> - BasicBlock *BB = BI->getSuccessor(Result ? 0 : 1);
> - if (BB->getSinglePredecessor())
> - Reduction
> - += InlineConstants::InstrCost * Metrics.NumBBInsts.lookup(BB);
> +
> + // Otherwise look for a comparison between constant offset pointers with
> + // a common base.
> + Value *LHSBase, *RHSBase;
> + APInt LHSOffset, RHSOffset;
> + llvm::tie(LHSBase, LHSOffset) = ConstantOffsetPtrs.lookup(LHS);
> + if (LHSBase) {
> + llvm::tie(RHSBase, RHSOffset) = ConstantOffsetPtrs.lookup(RHS);
> + if (RHSBase && LHSBase == RHSBase) {
> + // We have common bases, fold the icmp to a constant based on the
> + // offsets.
> + Constant *CLHS = ConstantInt::get(LHS->getContext(), LHSOffset);
> + Constant *CRHS = ConstantInt::get(RHS->getContext(), RHSOffset);
> + if (Constant *C = ConstantExpr::getICmp(I.getPredicate(), CLHS, CRHS)) {
> + SimplifiedValues[&I] = C;
> + ++NumConstantPtrCmps;
> + return true;
> }
> }
> - } while (!Worklist.empty());
> + }
>
> - return Reduction;
> -}
> + // If the comparison is an equality comparison with null, we can simplify it
> + // for any alloca-derived argument.
> + if (I.isEquality() && isa<ConstantPointerNull>(I.getOperand(1)))
> + if (isAllocaDerivedArg(I.getOperand(0))) {
> + // We can actually predict the result of comparisons between an
> + // alloca-derived value and null. Note that this fires regardless of
> + // SROA firing.
> + bool IsNotEqual = I.getPredicate() == CmpInst::ICMP_NE;
> + SimplifiedValues[&I] = IsNotEqual ? ConstantInt::getTrue(I.getType())
> + : ConstantInt::getFalse(I.getType());
> + return true;
> + }
>
> -/// \brief Compute the reduction possible for a given instruction if we are able
> -/// to SROA an alloca.
> -///
> -/// The reduction for this instruction is added to the SROAReduction output
> -/// parameter. Returns false if this instruction is expected to defeat SROA in
> -/// general.
> -static bool countCodeReductionForSROAInst(Instruction *I,
> - SmallVectorImpl<Value *> &Worklist,
> - unsigned &SROAReduction) {
> - if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
> - if (!LI->isSimple())
> - return false;
> - SROAReduction += InlineConstants::InstrCost;
> - return true;
> + // Finally check for SROA candidates in comparisons.
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt)) {
> + if (isa<ConstantPointerNull>(I.getOperand(1))) {
> + accumulateSROACost(CostIt, InlineConstants::InstrCost);
> + return true;
> + }
> +
> + disableSROA(CostIt);
> }
>
> - if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
> - if (!SI->isSimple())
> - return false;
> - SROAReduction += InlineConstants::InstrCost;
> - return true;
> + return false;
> +}
> +
> +bool CallAnalyzer::visitSub(BinaryOperator &I) {
> + // Try to handle a special case: we can fold computing the difference of two
> + // constant-related pointers.
> + Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
> + Value *LHSBase, *RHSBase;
> + APInt LHSOffset, RHSOffset;
> + llvm::tie(LHSBase, LHSOffset) = ConstantOffsetPtrs.lookup(LHS);
> + if (LHSBase) {
> + llvm::tie(RHSBase, RHSOffset) = ConstantOffsetPtrs.lookup(RHS);
> + if (RHSBase && LHSBase == RHSBase) {
> + // We have common bases, fold the subtract to a constant based on the
> + // offsets.
> + Constant *CLHS = ConstantInt::get(LHS->getContext(), LHSOffset);
> + Constant *CRHS = ConstantInt::get(RHS->getContext(), RHSOffset);
> + if (Constant *C = ConstantExpr::getSub(CLHS, CRHS)) {
> + SimplifiedValues[&I] = C;
> + ++NumConstantPtrDiffs;
> + return true;
> + }
> + }
> }
>
> - if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
> - // If the GEP has variable indices, we won't be able to do much with it.
> - if (!GEP->hasAllConstantIndices())
> - return false;
> - // A non-zero GEP will likely become a mask operation after SROA.
> - if (GEP->hasAllZeroIndices())
> - SROAReduction += InlineConstants::InstrCost;
> - Worklist.push_back(GEP);
> + // Otherwise, fall back to the generic logic for simplifying and handling
> + // instructions.
> + return Base::visitSub(I);
> +}
> +
> +bool CallAnalyzer::visitBinaryOperator(BinaryOperator &I) {
> + Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
> + if (!isa<Constant>(LHS))
> + if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
> + LHS = SimpleLHS;
> + if (!isa<Constant>(RHS))
> + if (Constant *SimpleRHS = SimplifiedValues.lookup(RHS))
> + RHS = SimpleRHS;
> + Value *SimpleV = SimplifyBinOp(I.getOpcode(), LHS, RHS, TD);
> + if (Constant *C = dyn_cast_or_null<Constant>(SimpleV)) {
> + SimplifiedValues[&I] = C;
> return true;
> }
>
> - if (BitCastInst *BCI = dyn_cast<BitCastInst>(I)) {
> - // Track pointer through bitcasts.
> - Worklist.push_back(BCI);
> - SROAReduction += InlineConstants::InstrCost;
> - return true;
> + // Disable any SROA on arguments to arbitrary, unsimplified binary operators.
> + disableSROA(LHS);
> + disableSROA(RHS);
> +
> + return false;
> +}
> +
> +bool CallAnalyzer::visitLoad(LoadInst &I) {
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt)) {
> + if (I.isSimple()) {
> + accumulateSROACost(CostIt, InlineConstants::InstrCost);
> + return true;
> + }
> +
> + disableSROA(CostIt);
> }
>
> - // We just look for non-constant operands to ICmp instructions as those will
> - // defeat SROA. The actual reduction for these happens even without SROA.
> - if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
> - return isa<Constant>(ICI->getOperand(1));
> -
> - if (SelectInst *SI = dyn_cast<SelectInst>(I)) {
> - // SROA can handle a select of alloca iff all uses of the alloca are
> - // loads, and dereferenceable. We assume it's dereferenceable since
> - // we're told the input is an alloca.
> - for (Value::use_iterator UI = SI->use_begin(), UE = SI->use_end();
> - UI != UE; ++UI) {
> - LoadInst *LI = dyn_cast<LoadInst>(*UI);
> - if (LI == 0 || !LI->isSimple())
> - return false;
> + return false;
> +}
> +
> +bool CallAnalyzer::visitStore(StoreInst &I) {
> + Value *SROAArg;
> + DenseMap<Value *, int>::iterator CostIt;
> + if (lookupSROAArgAndCost(I.getOperand(0), SROAArg, CostIt)) {
> + if (I.isSimple()) {
> + accumulateSROACost(CostIt, InlineConstants::InstrCost);
> + return true;
> }
> - // We don't know whether we'll be deleting the rest of the chain of
> - // instructions from the SelectInst on, because we don't know whether
> - // the other side of the select is also an alloca or not.
> - return true;
> +
> + disableSROA(CostIt);
> + }
> +
> + return false;
> +}
> +
> +bool CallAnalyzer::visitCallSite(CallSite CS) {
> + if (CS.isCall() && cast<CallInst>(CS.getInstruction())->canReturnTwice() &&
> + !F.hasFnAttr(Attribute::ReturnsTwice)) {
> + // This aborts the entire analysis.
> + ExposesReturnsTwice = true;
> + return false;
> }
>
> - if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
> + if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CS.getInstruction())) {
> switch (II->getIntrinsicID()) {
> default:
> - return false;
> + return Base::visitCallSite(CS);
> +
> + case Intrinsic::dbg_declare:
> + case Intrinsic::dbg_value:
> + case Intrinsic::invariant_start:
> + case Intrinsic::invariant_end:
> + case Intrinsic::lifetime_start:
> + case Intrinsic::lifetime_end:
> case Intrinsic::memset:
> case Intrinsic::memcpy:
> case Intrinsic::memmove:
> - case Intrinsic::lifetime_start:
> - case Intrinsic::lifetime_end:
> - // SROA can usually chew through these intrinsics.
> - SROAReduction += InlineConstants::InstrCost;
> + case Intrinsic::objectsize:
> + case Intrinsic::ptr_annotation:
> + case Intrinsic::var_annotation:
> + // SROA can usually chew through these intrinsics and they have no cost
> + // so don't pay the price of analyzing them in detail.
> return true;
> }
> }
>
> - // If there is some other strange instruction, we're not going to be
> - // able to do much if we inline this.
> + if (Function *F = CS.getCalledFunction()) {
> + if (F == CS.getInstruction()->getParent()->getParent()) {
> + // This flag will fully abort the analysis, so don't bother with anything
> + // else.
> + IsRecursive = true;
> + return false;
> + }
> +
> + if (!callIsSmall(F)) {
> + // We account for the average 1 instruction per call argument setup
> + // here.
> + Cost += CS.arg_size() * InlineConstants::InstrCost;
> +
> + // Everything other than inline ASM will also have a significant cost
> + // merely from making the call.
> + if (!isa<InlineAsm>(CS.getCalledValue()))
> + Cost += InlineConstants::CallPenalty;
> + }
> +
> + return Base::visitCallSite(CS);
> + }
> +
> + // Otherwise we're in a very special case -- an indirect function call. See
> + // if we can be particularly clever about this.
> + Value *Callee = CS.getCalledValue();
> +
> + // First, pay the price of the argument setup. We account for the average
> + // 1 instruction per call argument setup here.
> + Cost += CS.arg_size() * InlineConstants::InstrCost;
> +
> + // Next, check if this happens to be an indirect function call to a known
> + // function in this inline context. If not, we've done all we can.
> + Function *F = dyn_cast_or_null<Function>(SimplifiedValues.lookup(Callee));
> + if (!F)
> + return Base::visitCallSite(CS);
> +
> + // If we have a constant that we are calling as a function, we can peer
> + // through it and see the function target. This happens not infrequently
> + // during devirtualization and so we want to give it a hefty bonus for
> + // inlining, but cap that bonus in the event that inlining wouldn't pan
> + // out. Pretend to inline the function, with a custom threshold.
> + CallAnalyzer CA(TD, *F, InlineConstants::IndirectCallThreshold);
> + if (CA.analyzeCall(CS)) {
> + // We were able to inline the indirect call! Subtract the cost from the
> + // bonus we want to apply, but don't go below zero.
> + Cost -= std::max(0, InlineConstants::IndirectCallThreshold - CA.getCost());
> + }
> +
> + return Base::visitCallSite(CS);
> +}
> +
> +bool CallAnalyzer::visitInstruction(Instruction &I) {
> + // We found something we don't understand or can't handle. Mark any SROA-able
> + // values in the operand list as no longer viable.
> + for (User::op_iterator OI = I.op_begin(), OE = I.op_end(); OI != OE; ++OI)
> + disableSROA(*OI);
> +
> return false;
> }
>
> -unsigned InlineCostAnalyzer::FunctionInfo::countCodeReductionForAlloca(
> - const CodeMetrics &Metrics, Value *V) {
> - if (!V->getType()->isPointerTy()) return 0; // Not a pointer
> - unsigned Reduction = 0;
> - unsigned SROAReduction = 0;
> - bool CanSROAAlloca = true;
>
> - SmallVector<Value *, 4> Worklist;
> - Worklist.push_back(V);
> - do {
> - Value *V = Worklist.pop_back_val();
> - for (Value::use_iterator UI = V->use_begin(), E = V->use_end();
> - UI != E; ++UI){
> - Instruction *I = cast<Instruction>(*UI);
> +/// \brief Analyze a basic block for its contribution to the inline cost.
> +///
> +/// This method walks the analyzer over every instruction in the given basic
> +/// block and accounts for their cost during inlining at this callsite. It
> +/// aborts early if the threshold has been exceeded or an impossible to inline
> +/// construct has been detected. It returns false if inlining is no longer
> +/// viable, and true if inlining remains viable.
> +bool CallAnalyzer::analyzeBlock(BasicBlock *BB) {
> + for (BasicBlock::iterator I = BB->begin(), E = llvm::prior(BB->end());
> + I != E; ++I) {
> + ++NumInstructions;
> + if (isa<ExtractElementInst>(I) || I->getType()->isVectorTy())
> + ++NumVectorInstructions;
> +
> + // If the instruction simplified to a constant, there is no cost to this
> + // instruction. Visit the instructions using our InstVisitor to account for
> + // all of the per-instruction logic. The visit tree returns true if we
> + // consumed the instruction in any way, and false if the instruction's base
> + // cost should count against inlining.
> + if (Base::visit(I))
> + ++NumInstructionsSimplified;
> + else
> + Cost += InlineConstants::InstrCost;
>
> - if (ICmpInst *ICI = dyn_cast<ICmpInst>(I))
> - Reduction += countCodeReductionForAllocaICmp(Metrics, ICI);
> + // If visiting this instruction detected an uninlinable pattern, abort.
> + if (IsRecursive || ExposesReturnsTwice || HasDynamicAlloca)
> + return false;
>
> - if (CanSROAAlloca)
> - CanSROAAlloca = countCodeReductionForSROAInst(I, Worklist,
> - SROAReduction);
> - }
> - } while (!Worklist.empty());
> + if (NumVectorInstructions > NumInstructions/2)
> + VectorBonus = FiftyPercentVectorBonus;
> + else if (NumVectorInstructions > NumInstructions/10)
> + VectorBonus = TenPercentVectorBonus;
> + else
> + VectorBonus = 0;
> +
> + // Check if we've passed the threshold so we don't spin in huge basic
> + // blocks that will never inline.
> + if (!AlwaysInline && Cost > (Threshold + VectorBonus))
> + return false;
> + }
>
> - return Reduction + (CanSROAAlloca ? SROAReduction : 0);
> + return true;
> }
>
> -void InlineCostAnalyzer::FunctionInfo::countCodeReductionForPointerPair(
> - const CodeMetrics &Metrics, DenseMap<Value *, unsigned> &PointerArgs,
> - Value *V, unsigned ArgIdx) {
> - SmallVector<Value *, 4> Worklist;
> - Worklist.push_back(V);
> +/// \brief Compute the base pointer and cumulative constant offsets for V.
> +///
> +/// This strips all constant offsets off of V, leaving it the base pointer, and
> +/// accumulates the total constant offset applied in the returned constant. It
> +/// returns 0 if V is not a pointer, and returns the constant '0' if there are
> +/// no constant offsets applied.
> +ConstantInt *CallAnalyzer::stripAndComputeInBoundsConstantOffsets(Value *&V) {
> + if (!TD || !V->getType()->isPointerTy())
> + return 0;
> +
> + unsigned IntPtrWidth = TD->getPointerSizeInBits();
> + APInt Offset = APInt::getNullValue(IntPtrWidth);
> +
> + // Even though we don't look through PHI nodes, we could be called on an
> + // instruction in an unreachable block, which may be on a cycle.
> + SmallPtrSet<Value *, 4> Visited;
> + Visited.insert(V);
> do {
> - Value *V = Worklist.pop_back_val();
> - for (Value::use_iterator UI = V->use_begin(), E = V->use_end();
> - UI != E; ++UI){
> - Instruction *I = cast<Instruction>(*UI);
> -
> - if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I)) {
> - // If the GEP has variable indices, we won't be able to do much with it.
> - if (!GEP->hasAllConstantIndices())
> - continue;
> - // Unless the GEP is in-bounds, some comparisons will be non-constant.
> - // Fortunately, the real-world cases where this occurs use in-bounds
> - // GEPs, and so we restrict the optimization to them here.
> - if (!GEP->isInBounds())
> - continue;
> + if (GEPOperator *GEP = dyn_cast<GEPOperator>(V)) {
> + if (!GEP->isInBounds() || !accumulateGEPOffset(*GEP, Offset))
> + return 0;
> + V = GEP->getPointerOperand();
> + } else if (Operator::getOpcode(V) == Instruction::BitCast) {
> + V = cast<Operator>(V)->getOperand(0);
> + } else if (GlobalAlias *GA = dyn_cast<GlobalAlias>(V)) {
> + if (GA->mayBeOverridden())
> + break;
> + V = GA->getAliasee();
> + } else {
> + break;
> + }
> + assert(V->getType()->isPointerTy() && "Unexpected operand type!");
> + } while (Visited.insert(V));
>
> - // Constant indices just change the constant offset. Add the resulting
> - // value both to our worklist for this argument, and to the set of
> - // viable paired values with future arguments.
> - PointerArgs[GEP] = ArgIdx;
> - Worklist.push_back(GEP);
> - continue;
> - }
> + Type *IntPtrTy = TD->getIntPtrType(V->getContext());
> + return cast<ConstantInt>(ConstantInt::get(IntPtrTy, Offset));
> +}
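
In effect, this routine peels constant in-bounds adjustments off a pointer until it reaches the underlying base, summing the adjustments as it goes. A rough stand-alone model of that computation follows; the PtrExpr type is invented purely for illustration, whereas the real code walks GEP, bitcast, and alias operators:

    #include <cstdint>
    #include <utility>

    // Toy "pointer expression": either an underlying allocation (Base == nullptr)
    // or a constant adjustment applied to another pointer expression.
    struct PtrExpr {
      const PtrExpr *Base = nullptr;
      int64_t ConstOffset = 0;
    };

    // Peel constant adjustments off P, returning the base expression and the
    // accumulated offset, analogous to the (pointer, offset) pair the analyzer
    // records in ConstantOffsetPtrs for each argument.
    std::pair<const PtrExpr *, int64_t> stripConstantOffsets(const PtrExpr *P) {
      int64_t Offset = 0;
      while (P->Base) {
        Offset += P->ConstOffset;
        P = P->Base;
      }
      return {P, Offset};
    }
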
>
> - // Track pointer through casts. Even when the result is not a pointer, it
> - // remains a constant relative to constants derived from other constant
> - // pointers.
> - if (CastInst *CI = dyn_cast<CastInst>(I)) {
> - PointerArgs[CI] = ArgIdx;
> - Worklist.push_back(CI);
> - continue;
> - }
> +/// \brief Analyze a call site for potential inlining.
> +///
> +/// Returns true if inlining this call is viable, and false if it is not
> +/// viable. It computes the cost and adjusts the threshold based on numerous
> +/// factors and heuristics. If this method returns false but the computed cost
> +/// is below the computed threshold, then inlining was forcibly disabled by
> +/// some artifact of the routine.
> +bool CallAnalyzer::analyzeCall(CallSite CS) {
> + // Track whether the post-inlining function would have more than one basic
> + // block. Single-basic-block functions are often written to be inlined. Balloon the
> + // threshold by 50% until we pass the single-BB phase.
> + bool SingleBB = true;
> + int SingleBBBonus = Threshold / 2;
> + Threshold += SingleBBBonus;
> +
> + // Unless we are always-inlining, perform some tweaks to the cost and
> + // threshold based on the direct callsite information.
> + if (!AlwaysInline) {
> + // We want to more aggressively inline vector-dense kernels, so up the
> + // threshold, and we'll lower it if the % of vector instructions gets too
> + // low.
> + assert(NumInstructions == 0);
> + assert(NumVectorInstructions == 0);
> + FiftyPercentVectorBonus = Threshold;
> + TenPercentVectorBonus = Threshold / 2;
> +
> + // Subtract off one instruction per call argument as those will be free after
> + // inlining.
> + Cost -= CS.arg_size() * InlineConstants::InstrCost;
> +
> + // If there is only one call of the function, and it has internal linkage,
> + // the cost of inlining it drops dramatically.
> + if (F.hasLocalLinkage() && F.hasOneUse() && &F == CS.getCalledFunction())
> + Cost += InlineConstants::LastCallToStaticBonus;
> +
> + // If the instruction after the call, or the normal destination of the
> + // invoke, is an unreachable instruction, the function is noreturn. As such,
> + // there is little point in inlining this unless there is literally zero cost.
> + if (InvokeInst *II = dyn_cast<InvokeInst>(CS.getInstruction())) {
> + if (isa<UnreachableInst>(II->getNormalDest()->begin()))
> + Threshold = 1;
> + } else if (isa<UnreachableInst>(++BasicBlock::iterator(CS.getInstruction())))
> + Threshold = 1;
> +
> + // If this function uses the coldcc calling convention, prefer not to inline
> + // it.
> + if (F.getCallingConv() == CallingConv::Cold)
> + Cost += InlineConstants::ColdccPenalty;
>
> - // There are two instructions which produce a strict constant value when
> - // applied to two related pointer values. Ignore everything else.
> - if (!isa<ICmpInst>(I) && I->getOpcode() != Instruction::Sub)
> - continue;
> - assert(I->getNumOperands() == 2);
> + // Check if we're done. This can happen due to bonuses and penalties.
> + if (Cost > Threshold)
> + return false;
> + }
>
> - // Ensure that the two operands are in our set of potentially paired
> - // pointers (or are derived from them).
> - Value *OtherArg = I->getOperand(0);
> - if (OtherArg == V)
> - OtherArg = I->getOperand(1);
> - DenseMap<Value *, unsigned>::const_iterator ArgIt
> - = PointerArgs.find(OtherArg);
> - if (ArgIt == PointerArgs.end())
> - continue;
> - std::pair<unsigned, unsigned> ArgPair(ArgIt->second, ArgIdx);
> - if (ArgPair.first > ArgPair.second)
> - std::swap(ArgPair.first, ArgPair.second);
> -
> - PointerArgPairWeights[ArgPair]
> - += countCodeReductionForConstant(Metrics, I);
> - }
> - } while (!Worklist.empty());
> -}
> -
> -/// analyzeFunction - Fill in the current structure with information gleaned
> -/// from the specified function.
> -void InlineCostAnalyzer::FunctionInfo::analyzeFunction(Function *F,
> - const TargetData *TD) {
> - Metrics.analyzeFunction(F, TD);
> -
> - // A function with exactly one return has it removed during the inlining
> - // process (see InlineFunction), so don't count it.
> - // FIXME: This knowledge should really be encoded outside of FunctionInfo.
> - if (Metrics.NumRets==1)
> - --Metrics.NumInsts;
> -
> - ArgumentWeights.reserve(F->arg_size());
> - DenseMap<Value *, unsigned> PointerArgs;
> - unsigned ArgIdx = 0;
> - for (Function::arg_iterator I = F->arg_begin(), E = F->arg_end(); I != E;
> - ++I, ++ArgIdx) {
> - // Count how much code can be eliminated if one of the arguments is
> - // a constant or an alloca.
> - ArgumentWeights.push_back(ArgInfo(countCodeReductionForConstant(Metrics, I),
> - countCodeReductionForAlloca(Metrics, I)));
> -
> - // If the argument is a pointer, also check for pairs of pointers where
> - // knowing a fixed offset between them allows simplification. This pattern
> - // arises mostly due to STL algorithm patterns where pointers are used as
> - // random access iterators.
> - if (!I->getType()->isPointerTy())
> - continue;
> - PointerArgs[I] = ArgIdx;
> - countCodeReductionForPointerPair(Metrics, PointerArgs, I, ArgIdx);
> + if (F.empty())
> + return true;
> +
> + // Track whether we've seen a return instruction. The first return
> + // instruction is free, as at least one will usually disappear in inlining.
> + bool HasReturn = false;
> +
> + // Populate our simplified values by mapping from function arguments to call
> + // arguments with known important simplifications.
> + CallSite::arg_iterator CAI = CS.arg_begin();
> + for (Function::arg_iterator FAI = F.arg_begin(), FAE = F.arg_end();
> + FAI != FAE; ++FAI, ++CAI) {
> + assert(CAI != CS.arg_end());
> + if (Constant *C = dyn_cast<Constant>(CAI))
> + SimplifiedValues[FAI] = C;
> +
> + Value *PtrArg = *CAI;
> + if (ConstantInt *C = stripAndComputeInBoundsConstantOffsets(PtrArg)) {
> + ConstantOffsetPtrs[FAI] = std::make_pair(PtrArg, C->getValue());
> +
> + // We can SROA any pointer arguments derived from alloca instructions.
> + if (isa<AllocaInst>(PtrArg)) {
> + SROAArgValues[FAI] = PtrArg;
> + SROAArgCosts[PtrArg] = 0;
> + }
> + }
> }
> -}
> + NumConstantArgs = SimplifiedValues.size();
> + NumConstantOffsetPtrArgs = ConstantOffsetPtrs.size();
> + NumAllocaArgs = SROAArgValues.size();
> +
> + // The worklist of live basic blocks in the callee *after* inlining. We avoid
> + // adding basic blocks of the callee which can be proven to be dead for this
> + // particular call site in order to get more accurate cost estimates. This
> + // requires a somewhat heavyweight iteration pattern: we need to walk the
> + // basic blocks in a breadth-first order as we insert live successors. To
> + // accomplish this, while keeping the common small-iteration case cheap (we
> + // exit as soon as we cross the threshold), we use a small-size optimized
> + // SetVector.
> + typedef SetVector<BasicBlock *, SmallVector<BasicBlock *, 16>,
> + SmallPtrSet<BasicBlock *, 16> > BBSetVector;
> + BBSetVector BBWorklist;
> + BBWorklist.insert(&F.getEntryBlock());
> + // Note that we *must not* cache the size: this loop grows the worklist.
> + for (unsigned Idx = 0; Idx != BBWorklist.size(); ++Idx) {
> + // Bail out the moment we cross the threshold. This means we'll under-count
> + // the cost, but only when undercounting doesn't matter.
> + if (!AlwaysInline && Cost > (Threshold + VectorBonus))
> + break;
>
> -/// NeverInline - returns true if the function should never be inlined into
> -/// any caller
> -bool InlineCostAnalyzer::FunctionInfo::NeverInline() {
> - return (Metrics.exposesReturnsTwice || Metrics.isRecursive ||
> - Metrics.containsIndirectBr);
> -}
> -
> -// ConstantFunctionBonus - Figure out how much of a bonus we can get for
> -// possibly devirtualizing a function. We'll subtract the size of the function
> -// we may wish to inline from the indirect call bonus providing a limit on
> -// growth. Leave an upper limit of 0 for the bonus - we don't want to penalize
> -// inlining because we decide we don't want to give a bonus for
> -// devirtualizing.
> -int InlineCostAnalyzer::ConstantFunctionBonus(CallSite CS, Constant *C) {
> -
> - // This could just be NULL.
> - if (!C) return 0;
> -
> - Function *F = dyn_cast<Function>(C);
> - if (!F) return 0;
> -
> - int Bonus = InlineConstants::IndirectCallBonus + getInlineSize(CS, F);
> - return (Bonus > 0) ? 0 : Bonus;
> -}
> -
> -// CountBonusForConstant - Figure out an approximation for how much per-call
> -// performance boost we can expect if the specified value is constant.
> -int InlineCostAnalyzer::CountBonusForConstant(Value *V, Constant *C) {
> - unsigned Bonus = 0;
> - for (Value::use_iterator UI = V->use_begin(), E = V->use_end(); UI != E;++UI){
> - User *U = *UI;
> - if (CallInst *CI = dyn_cast<CallInst>(U)) {
> - // Turning an indirect call into a direct call is a BIG win
> - if (CI->getCalledValue() == V)
> - Bonus += ConstantFunctionBonus(CallSite(CI), C);
> - } else if (InvokeInst *II = dyn_cast<InvokeInst>(U)) {
> - // Turning an indirect call into a direct call is a BIG win
> - if (II->getCalledValue() == V)
> - Bonus += ConstantFunctionBonus(CallSite(II), C);
> - }
> - // FIXME: Eliminating conditional branches and switches should
> - // also yield a per-call performance boost.
> - else {
> - // Figure out the bonuses that will accrue due to simple constant
> - // propagation.
> - Instruction &Inst = cast<Instruction>(*U);
> -
> - // We can't constant propagate instructions which have effects or
> - // read memory.
> - //
> - // FIXME: It would be nice to capture the fact that a load from a
> - // pointer-to-constant-global is actually a *really* good thing to zap.
> - // Unfortunately, we don't know the pointer that may get propagated here,
> - // so we can't make this decision.
> - if (Inst.mayReadFromMemory() || Inst.mayHaveSideEffects() ||
> - isa<AllocaInst>(Inst))
> - continue;
> + BasicBlock *BB = BBWorklist[Idx];
> + if (BB->empty())
> + continue;
>
> - bool AllOperandsConstant = true;
> - for (unsigned i = 0, e = Inst.getNumOperands(); i != e; ++i)
> - if (!isa<Constant>(Inst.getOperand(i)) && Inst.getOperand(i) != V) {
> - AllOperandsConstant = false;
> - break;
> + // Handle the terminator cost here where we can track returns and other
> + // function-wide constructs.
> + TerminatorInst *TI = BB->getTerminator();
> +
> + // We never want to inline functions that contain an indirectbr. Inlining
> + // such a function would be incorrect because all the blockaddresses (in
> + // static global initializers, for example) would still refer to the original
> + // function, so the indirect jump would branch from the inlined copy of the
> + // function into the original function, which is undefined behavior.
> + // FIXME: This logic isn't really right; we can safely inline functions
> + // with indirectbr's as long as no other function or global references the
> + // blockaddress of a block within the current function. And as a QOI issue,
> + // if someone is using a blockaddress without an indirectbr, and that
> + // reference somehow ends up in another function or global, we probably
> + // don't want to inline this function.
> + if (isa<IndirectBrInst>(TI))
> + return false;
> +
> + if (!HasReturn && isa<ReturnInst>(TI))
> + HasReturn = true;
> + else
> + Cost += InlineConstants::InstrCost;
> +
> + // Analyze the cost of this block. If we blow through the threshold, this
> + // returns false, and we can bail out.
> + if (!analyzeBlock(BB)) {
> + if (IsRecursive || ExposesReturnsTwice || HasDynamicAlloca)
> + return false;
> + break;
> + }
> +
> + // Add in the live successors by first checking whether we have a terminator
> + // that may be simplified based on the values simplified by this call.
> + if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
> + if (BI->isConditional()) {
> + Value *Cond = BI->getCondition();
> + if (ConstantInt *SimpleCond
> + = dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
> + BBWorklist.insert(BI->getSuccessor(SimpleCond->isZero() ? 1 : 0));
> + continue;
> }
> + }
> + } else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
> + Value *Cond = SI->getCondition();
> + if (ConstantInt *SimpleCond
> + = dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
> + BBWorklist.insert(SI->findCaseValue(SimpleCond).getCaseSuccessor());
> + continue;
> + }
> + }
>
> - if (AllOperandsConstant)
> - Bonus += CountBonusForConstant(&Inst);
> + // If we're unable to select a particular successor, just count all of
> + // them.
> + for (unsigned TIdx = 0, TSize = TI->getNumSuccessors(); TIdx != TSize; ++TIdx)
> + BBWorklist.insert(TI->getSuccessor(TIdx));
> +
> + // If we had any successors at this point, then post-inlining is likely to
> + // have them as well. Note that we assume any basic blocks which existed
> + // due to branches or switches which folded above will also fold after
> + // inlining.
> + if (SingleBB && TI->getNumSuccessors() > 1) {
> + // Take off the bonus we applied to the threshold.
> + Threshold -= SingleBBBonus;
> + SingleBB = false;
> }
> }
>
> - return Bonus;
> -}
> + Threshold += VectorBonus;
>
> -int InlineCostAnalyzer::getInlineSize(CallSite CS, Function *Callee) {
> - // Get information about the callee.
> - FunctionInfo *CalleeFI = &CachedFunctionInfo[Callee];
> -
> - // If we haven't calculated this information yet, do so now.
> - if (CalleeFI->Metrics.NumBlocks == 0)
> - CalleeFI->analyzeFunction(Callee, TD);
> -
> - // InlineCost - This value measures how good of an inline candidate this call
> - // site is to inline. A lower inline cost makes it more likely for the call to
> - // be inlined. This value may go negative.
> - //
> - int InlineCost = 0;
> -
> - // Compute any size reductions we can expect due to arguments being passed into
> - // the function.
> - //
> - unsigned ArgNo = 0;
> - CallSite::arg_iterator I = CS.arg_begin();
> - for (Function::arg_iterator FI = Callee->arg_begin(), FE = Callee->arg_end();
> - FI != FE; ++I, ++FI, ++ArgNo) {
> -
> - // If an alloca is passed in, inlining this function is likely to allow
> - // significant future optimization possibilities (like scalar promotion, and
> - // scalarization), so encourage the inlining of the function.
> - //
> - if (isa<AllocaInst>(I))
> - InlineCost -= CalleeFI->ArgumentWeights[ArgNo].AllocaWeight;
> -
> - // If this is a constant being passed into the function, use the argument
> - // weights calculated for the callee to determine how much will be folded
> - // away with this information.
> - else if (isa<Constant>(I))
> - InlineCost -= CalleeFI->ArgumentWeights[ArgNo].ConstantWeight;
> - }
> -
> - const DenseMap<std::pair<unsigned, unsigned>, unsigned> &ArgPairWeights
> - = CalleeFI->PointerArgPairWeights;
> - for (DenseMap<std::pair<unsigned, unsigned>, unsigned>::const_iterator I
> - = ArgPairWeights.begin(), E = ArgPairWeights.end();
> - I != E; ++I)
> - if (CS.getArgument(I->first.first)->stripInBoundsConstantOffsets() ==
> - CS.getArgument(I->first.second)->stripInBoundsConstantOffsets())
> - InlineCost -= I->second;
> -
> - // Each argument passed in has a cost at both the caller and the callee
> - // sides. Measurements show that each argument costs about the same as an
> - // instruction.
> - InlineCost -= (CS.arg_size() * InlineConstants::InstrCost);
> -
> - // Now that we have considered all of the factors that make the call site more
> - // likely to be inlined, look at factors that make us not want to inline it.
> -
> - // Calls usually take a long time, so they make the inlining gain smaller.
> - InlineCost += CalleeFI->Metrics.NumCalls * InlineConstants::CallPenalty;
> -
> - // Look at the size of the callee. Each instruction counts as 5.
> - InlineCost += CalleeFI->Metrics.NumInsts * InlineConstants::InstrCost;
> -
> - return InlineCost;
> -}
> -
> -int InlineCostAnalyzer::getInlineBonuses(CallSite CS, Function *Callee) {
> - // Get information about the callee.
> - FunctionInfo *CalleeFI = &CachedFunctionInfo[Callee];
> -
> - // If we haven't calculated this information yet, do so now.
> - if (CalleeFI->Metrics.NumBlocks == 0)
> - CalleeFI->analyzeFunction(Callee, TD);
> -
> - bool isDirectCall = CS.getCalledFunction() == Callee;
> - Instruction *TheCall = CS.getInstruction();
> - int Bonus = 0;
> -
> - // If there is only one call of the function, and it has internal linkage,
> - // make it almost guaranteed to be inlined.
> - //
> - if (Callee->hasLocalLinkage() && Callee->hasOneUse() && isDirectCall)
> - Bonus += InlineConstants::LastCallToStaticBonus;
> -
> - // If the instruction after the call, or if the normal destination of the
> - // invoke is an unreachable instruction, the function is noreturn. As such,
> - // there is little point in inlining this.
> - if (InvokeInst *II = dyn_cast<InvokeInst>(TheCall)) {
> - if (isa<UnreachableInst>(II->getNormalDest()->begin()))
> - Bonus += InlineConstants::NoreturnPenalty;
> - } else if (isa<UnreachableInst>(++BasicBlock::iterator(TheCall)))
> - Bonus += InlineConstants::NoreturnPenalty;
> -
> - // If this function uses the coldcc calling convention, prefer not to inline
> - // it.
> - if (Callee->getCallingConv() == CallingConv::Cold)
> - Bonus += InlineConstants::ColdccPenalty;
> -
> - // Add to the inline quality for properties that make the call valuable to
> - // inline. This includes factors that indicate that the result of inlining
> - // the function will be optimizable. Currently this just looks at arguments
> - // passed into the function.
> - //
> - CallSite::arg_iterator I = CS.arg_begin();
> - for (Function::arg_iterator FI = Callee->arg_begin(), FE = Callee->arg_end();
> - FI != FE; ++I, ++FI)
> - // Compute any constant bonus due to inlining we want to give here.
> - if (isa<Constant>(I))
> - Bonus += CountBonusForConstant(FI, cast<Constant>(I));
> -
> - return Bonus;
> + return AlwaysInline || Cost < Threshold;
> }
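
The heart of analyzeCall is the breadth-first worklist over *live* blocks: when a branch or switch condition is already known from the argument simplifications, only the taken successor is enqueued, so dead paths never contribute cost. Below is a stripped-down sketch of just that control flow, with a toy block type and invented names standing in for the LLVM IR classes:

    #include <set>
    #include <vector>

    // Toy basic block: a list of successors, plus the index of the single
    // successor that survives if the terminator folds (-1 if it does not fold).
    struct Block {
      std::vector<Block *> Succs;
      int KnownSucc = -1;
    };

    // Collect the blocks reachable from Entry in breadth-first order, following
    // only the taken edge out of blocks whose terminator folded.
    std::vector<Block *> liveBlocks(Block *Entry) {
      std::vector<Block *> Worklist{Entry};  // doubles as the visit order
      std::set<Block *> Seen{Entry};
      // Deliberately re-read size() each iteration: the loop grows the worklist.
      for (size_t Idx = 0; Idx != Worklist.size(); ++Idx) {
        Block *BB = Worklist[Idx];
        if (BB->KnownSucc >= 0) {            // folded branch/switch: one live edge
          if (Seen.insert(BB->Succs[BB->KnownSucc]).second)
            Worklist.push_back(BB->Succs[BB->KnownSucc]);
          continue;
        }
        for (Block *S : BB->Succs)           // unknown condition: all edges live
          if (Seen.insert(S).second)
            Worklist.push_back(S);
      }
      return Worklist;
    }
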
>
> -// getInlineCost - The heuristic used to determine if we should inline the
> -// function call or not.
> -//
> -InlineCost InlineCostAnalyzer::getInlineCost(CallSite CS) {
> - return getInlineCost(CS, CS.getCalledFunction());
> +/// \brief Dump stats about this call's analysis.
> +void CallAnalyzer::dump() {
> +#define DEBUG_PRINT_STAT(x) llvm::dbgs() << " " #x ": " << x << "\n"
> + DEBUG_PRINT_STAT(NumConstantArgs);
> + DEBUG_PRINT_STAT(NumConstantOffsetPtrArgs);
> + DEBUG_PRINT_STAT(NumAllocaArgs);
> + DEBUG_PRINT_STAT(NumConstantPtrCmps);
> + DEBUG_PRINT_STAT(NumConstantPtrDiffs);
> + DEBUG_PRINT_STAT(NumInstructionsSimplified);
> + DEBUG_PRINT_STAT(SROACostSavings);
> + DEBUG_PRINT_STAT(SROACostSavingsLost);
> +#undef DEBUG_PRINT_STAT
> }
>
> -InlineCost InlineCostAnalyzer::getInlineCost(CallSite CS, Function *Callee) {
> - Instruction *TheCall = CS.getInstruction();
> - Function *Caller = TheCall->getParent()->getParent();
> +InlineCost InlineCostAnalyzer::getInlineCost(CallSite CS, int Threshold) {
> + Function *Callee = CS.getCalledFunction();
>
> // Don't inline functions which can be redefined at link-time to mean
> // something else. Don't inline functions marked noinline or call sites
> // marked noinline.
> - if (Callee->mayBeOverridden() || Callee->hasFnAttr(Attribute::NoInline) ||
> - CS.isNoInline())
> + if (!Callee || Callee->mayBeOverridden() ||
> + Callee->hasFnAttr(Attribute::NoInline) || CS.isNoInline())
> return llvm::InlineCost::getNever();
>
> - // Get information about the callee.
> - FunctionInfo *CalleeFI = &CachedFunctionInfo[Callee];
> + DEBUG(llvm::dbgs() << " Analyzing call of " << Callee->getName() << "...\n");
>
> - // If we haven't calculated this information yet, do so now.
> - if (CalleeFI->Metrics.NumBlocks == 0)
> - CalleeFI->analyzeFunction(Callee, TD);
> + CallAnalyzer CA(TD, *Callee, Threshold);
> + bool ShouldInline = CA.analyzeCall(CS);
>
> - // If we should never inline this, return a huge cost.
> - if (CalleeFI->NeverInline())
> - return InlineCost::getNever();
> + DEBUG(CA.dump());
>
> - // FIXME: It would be nice to kill off CalleeFI->NeverInline. Then we
> - // could move this up and avoid computing the FunctionInfo for
> - // things we are going to just return always inline for. This
> - // requires handling setjmp somewhere else, however.
> - if (!Callee->isDeclaration() && Callee->hasFnAttr(Attribute::AlwaysInline))
> + // Check if there was a reason to force inlining or no inlining.
> + if (!ShouldInline && CA.getCost() < CA.getThreshold())
> + return InlineCost::getNever();
> + if (ShouldInline && CA.getCost() >= CA.getThreshold())
> return InlineCost::getAlways();
>
> - if (CalleeFI->Metrics.usesDynamicAlloca) {
> - // Get information about the caller.
> - FunctionInfo &CallerFI = CachedFunctionInfo[Caller];
> -
> - // If we haven't calculated this information yet, do so now.
> - if (CallerFI.Metrics.NumBlocks == 0) {
> - CallerFI.analyzeFunction(Caller, TD);
> -
> - // Recompute the CalleeFI pointer, getting Caller could have invalidated
> - // it.
> - CalleeFI = &CachedFunctionInfo[Callee];
> - }
> -
> - // Don't inline a callee with dynamic alloca into a caller without them.
> - // Functions containing dynamic alloca's are inefficient in various ways;
> - // don't create more inefficiency.
> - if (!CallerFI.Metrics.usesDynamicAlloca)
> - return InlineCost::getNever();
> - }
> -
> - // InlineCost - This value measures how good of an inline candidate this call
> - // site is to inline. A lower inline cost makes it more likely for the call to
> - // be inlined. This value may go negative due to the fact that bonuses
> - // are negative numbers.
> - //
> - int InlineCost = getInlineSize(CS, Callee) + getInlineBonuses(CS, Callee);
> - return llvm::InlineCost::get(InlineCost);
> -}
> -
> -// getInlineFudgeFactor - Return a > 1.0 factor if the inliner should use a
> -// higher threshold to determine if the function call should be inlined.
> -float InlineCostAnalyzer::getInlineFudgeFactor(CallSite CS) {
> - Function *Callee = CS.getCalledFunction();
> -
> - // Get information about the callee.
> - FunctionInfo &CalleeFI = CachedFunctionInfo[Callee];
> -
> - // If we haven't calculated this information yet, do so now.
> - if (CalleeFI.Metrics.NumBlocks == 0)
> - CalleeFI.analyzeFunction(Callee, TD);
> -
> - float Factor = 1.0f;
> - // Single BB functions are often written to be inlined.
> - if (CalleeFI.Metrics.NumBlocks == 1)
> - Factor += 0.5f;
> -
> - // Be more aggressive if the function contains a good chunk (if it makes up
> - // at least 10% of the instructions) of vector instructions.
> - if (CalleeFI.Metrics.NumVectorInsts > CalleeFI.Metrics.NumInsts/2)
> - Factor += 2.0f;
> - else if (CalleeFI.Metrics.NumVectorInsts > CalleeFI.Metrics.NumInsts/10)
> - Factor += 1.5f;
> - return Factor;
> + return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());
> }
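
From the caller's side, the new interface hands back a cost/threshold pair rather than a bare number. The toy class below is not the real llvm::InlineCost, only a model inferred from the calls visible in this diff (getCost, getCostDelta, and the boolean test), to show how those pieces relate:

    #include <cassert>

    // Not the real llvm::InlineCost; just a sketch of the cost/threshold pair
    // suggested by the calls in this patch.
    class ToyInlineCost {
      int Cost, Threshold;
      ToyInlineCost(int C, int T) : Cost(C), Threshold(T) {}
    public:
      static ToyInlineCost get(int Cost, int Threshold) {
        return ToyInlineCost(Cost, Threshold);
      }
      // The callsite is worth inlining when the cost stays under the threshold.
      explicit operator bool() const { return Cost < Threshold; }
      int getCost() const { return Cost; }
      // Remaining slack before inlining becomes unprofitable, so that
      // getCostDelta() + getCost() reconstructs the threshold, as the Inliner
      // debug output further down in the patch does.
      int getCostDelta() const { return Threshold - Cost; }
    };

    int main() {
      ToyInlineCost IC = ToyInlineCost::get(/*Cost=*/40, /*Threshold=*/225);
      assert(IC && IC.getCostDelta() + IC.getCost() == 225);
      return 0;
    }
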
>
> /// growCachedCostInfo - update the cached cost info for Caller after Callee has
> /// been inlined.
> void
> InlineCostAnalyzer::growCachedCostInfo(Function *Caller, Function *Callee) {
> - CodeMetrics &CallerMetrics = CachedFunctionInfo[Caller].Metrics;
> -
> - // For small functions we prefer to recalculate the cost for better accuracy.
> - if (CallerMetrics.NumBlocks < 10 && CallerMetrics.NumInsts < 1000) {
> - resetCachedCostInfo(Caller);
> - return;
> - }
> -
> - // For large functions, we can save a lot of computation time by skipping
> - // recalculations.
> - if (CallerMetrics.NumCalls > 0)
> - --CallerMetrics.NumCalls;
> -
> - if (Callee == 0) return;
> -
> - CodeMetrics &CalleeMetrics = CachedFunctionInfo[Callee].Metrics;
> -
> - // If we don't have metrics for the callee, don't recalculate them just to
> - // update an approximation in the caller. Instead, just recalculate the
> - // caller info from scratch.
> - if (CalleeMetrics.NumBlocks == 0) {
> - resetCachedCostInfo(Caller);
> - return;
> - }
> -
> - // Since CalleeMetrics were already calculated, we know that the CallerMetrics
> - // reference isn't invalidated: both were in the DenseMap.
> - CallerMetrics.usesDynamicAlloca |= CalleeMetrics.usesDynamicAlloca;
> -
> - // FIXME: If any of these three are true for the callee, the callee was
> - // not inlined into the caller, so I think they're redundant here.
> - CallerMetrics.exposesReturnsTwice |= CalleeMetrics.exposesReturnsTwice;
> - CallerMetrics.isRecursive |= CalleeMetrics.isRecursive;
> - CallerMetrics.containsIndirectBr |= CalleeMetrics.containsIndirectBr;
> -
> - CallerMetrics.NumInsts += CalleeMetrics.NumInsts;
> - CallerMetrics.NumBlocks += CalleeMetrics.NumBlocks;
> - CallerMetrics.NumCalls += CalleeMetrics.NumCalls;
> - CallerMetrics.NumVectorInsts += CalleeMetrics.NumVectorInsts;
> - CallerMetrics.NumRets += CalleeMetrics.NumRets;
> -
> - // analyzeBasicBlock counts each function argument as an inst.
> - if (CallerMetrics.NumInsts >= Callee->arg_size())
> - CallerMetrics.NumInsts -= Callee->arg_size();
> - else
> - CallerMetrics.NumInsts = 0;
> -
> - // We are not updating the argument weights. We have already determined that
> - // Caller is a fairly large function, so we accept the loss of precision.
> }
>
> /// clear - empty the cache of inline costs
> void InlineCostAnalyzer::clear() {
> - CachedFunctionInfo.clear();
> }
>
> Modified: llvm/trunk/lib/Transforms/IPO/InlineAlways.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/InlineAlways.cpp?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/IPO/InlineAlways.cpp (original)
> +++ llvm/trunk/lib/Transforms/IPO/InlineAlways.cpp Sat Mar 31 07:42:41 2012
> @@ -59,10 +59,7 @@
> // We still have to check the inline cost in case there are reasons to
> // not inline which trump the always-inline attribute such as setjmp and
> // indirectbr.
> - return CA.getInlineCost(CS);
> - }
> - float getInlineFudgeFactor(CallSite CS) {
> - return CA.getInlineFudgeFactor(CS);
> + return CA.getInlineCost(CS, getInlineThreshold(CS));
> }
> void resetCachedCostInfo(Function *Caller) {
> CA.resetCachedCostInfo(Caller);
>
> Modified: llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp (original)
> +++ llvm/trunk/lib/Transforms/IPO/InlineSimple.cpp Sat Mar 31 07:42:41 2012
> @@ -40,10 +40,7 @@
> }
> static char ID; // Pass identification, replacement for typeid
> InlineCost getInlineCost(CallSite CS) {
> - return CA.getInlineCost(CS);
> - }
> - float getInlineFudgeFactor(CallSite CS) {
> - return CA.getInlineFudgeFactor(CS);
> + return CA.getInlineCost(CS, getInlineThreshold(CS));
> }
> void resetCachedCostInfo(Function *Caller) {
> CA.resetCachedCostInfo(Caller);
>
> Modified: llvm/trunk/lib/Transforms/IPO/Inliner.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/Inliner.cpp?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/IPO/Inliner.cpp (original)
> +++ llvm/trunk/lib/Transforms/IPO/Inliner.cpp Sat Mar 31 07:42:41 2012
> @@ -231,14 +231,10 @@
> return false;
> }
>
> - int Cost = IC.getValue();
> Function *Caller = CS.getCaller();
> - int CurrentThreshold = getInlineThreshold(CS);
> - float FudgeFactor = getInlineFudgeFactor(CS);
> - int AdjThreshold = (int)(CurrentThreshold * FudgeFactor);
> - if (Cost >= AdjThreshold) {
> - DEBUG(dbgs() << " NOT Inlining: cost=" << Cost
> - << ", thres=" << AdjThreshold
> + if (!IC) {
> + DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
> + << ", thres=" << (IC.getCostDelta() + IC.getCost())
> << ", Call: " << *CS.getInstruction() << "\n");
> return false;
> }
> @@ -255,10 +251,15 @@
> // are used. Thus we will always have the opportunity to make local inlining
> // decisions. Importantly the linkonce-ODR linkage covers inline functions
> // and templates in C++.
> + //
> + // FIXME: All of this logic should be sunk into getInlineCost. It relies on
> + // the internal implementation of the inline cost metrics rather than
> + // treating them as truly abstract units etc.
> if (Caller->hasLocalLinkage() ||
> Caller->getLinkage() == GlobalValue::LinkOnceODRLinkage) {
> int TotalSecondaryCost = 0;
> - bool outerCallsFound = false;
> + // The candidate cost to be imposed upon the current function.
> + int CandidateCost = IC.getCost() - (InlineConstants::CallPenalty + 1);
> // This bool tracks what happens if we do NOT inline C into B.
> bool callerWillBeRemoved = Caller->hasLocalLinkage();
> // This bool tracks what happens if we DO inline C into B.
> @@ -276,26 +277,19 @@
> }
>
> InlineCost IC2 = getInlineCost(CS2);
> - if (IC2.isNever())
> + if (!IC2) {
> callerWillBeRemoved = false;
> - if (IC2.isAlways() || IC2.isNever())
> + continue;
> + }
> + if (IC2.isAlways())
> continue;
>
> - outerCallsFound = true;
> - int Cost2 = IC2.getValue();
> - int CurrentThreshold2 = getInlineThreshold(CS2);
> - float FudgeFactor2 = getInlineFudgeFactor(CS2);
> -
> - if (Cost2 >= (int)(CurrentThreshold2 * FudgeFactor2))
> - callerWillBeRemoved = false;
> -
> - // See if we have this case. We subtract off the penalty
> - // for the call instruction, which we would be deleting.
> - if (Cost2 < (int)(CurrentThreshold2 * FudgeFactor2) &&
> - Cost2 + Cost - (InlineConstants::CallPenalty + 1) >=
> - (int)(CurrentThreshold2 * FudgeFactor2)) {
> + // See if inlining of the original callsite would erase the cost delta of
> + // this callsite. We subtract off the penalty for the call instruction,
> + // which we would be deleting.
> + if (IC2.getCostDelta() <= CandidateCost) {
> inliningPreventsSomeOuterInline = true;
> - TotalSecondaryCost += Cost2;
> + TotalSecondaryCost += IC2.getCost();
> }
> }
> // If all outer calls to Caller would get inlined, the cost for the last
> @@ -305,17 +299,16 @@
> if (callerWillBeRemoved && Caller->use_begin() != Caller->use_end())
> TotalSecondaryCost += InlineConstants::LastCallToStaticBonus;
>
> - if (outerCallsFound && inliningPreventsSomeOuterInline &&
> - TotalSecondaryCost < Cost) {
> - DEBUG(dbgs() << " NOT Inlining: " << *CS.getInstruction() <<
> - " Cost = " << Cost <<
> + if (inliningPreventsSomeOuterInline && TotalSecondaryCost < IC.getCost()) {
> + DEBUG(dbgs() << " NOT Inlining: " << *CS.getInstruction() <<
> + " Cost = " << IC.getCost() <<
> ", outer Cost = " << TotalSecondaryCost << '\n');
> return false;
> }
> }
>
> - DEBUG(dbgs() << " Inlining: cost=" << Cost
> - << ", thres=" << AdjThreshold
> + DEBUG(dbgs() << " Inlining: cost=" << IC.getCost()
> + << ", thres=" << (IC.getCostDelta() + IC.getCost())
> << ", Call: " << *CS.getInstruction() << '\n');
> return true;
> }
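
The outer-callsite bookkeeping above can be summarized as one predicate: skip inlining the candidate when doing so would push enough of the caller's own callsites over their thresholds that the foregone savings exceed the candidate's cost. A toy version of that check (invented names; it omits the caller-will-be-removed / last-static-call adjustment handled in the real code):

    #include <vector>

    // Per-outer-callsite numbers, standing in for a second InlineCost query.
    struct OuterSite { int Cost; int CostDelta; };

    // Returns true if inlining the candidate should be skipped because it would
    // block cheaper inlining of the caller into its own callsites.
    bool blocksOuterInlining(int CandidateCost, int CallPenalty,
                             const std::vector<OuterSite> &OuterSites) {
      // Cost the candidate would impose on the caller, minus the call we delete.
      int Imposed = CandidateCost - (CallPenalty + 1);
      int TotalSecondaryCost = 0;
      bool PreventsSomeOuterInline = false;
      for (const OuterSite &S : OuterSites) {
        if (S.CostDelta <= Imposed) {   // this outer site would no longer fit
          PreventsSomeOuterInline = true;
          TotalSecondaryCost += S.Cost;
        }
      }
      return PreventsSomeOuterInline && TotalSecondaryCost < CandidateCost;
    }
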
>
> Modified: llvm/trunk/test/Transforms/Inline/alloca-bonus.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/alloca-bonus.ll?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/Inline/alloca-bonus.ll (original)
> +++ llvm/trunk/test/Transforms/Inline/alloca-bonus.ll Sat Mar 31 07:42:41 2012
> @@ -1,5 +1,7 @@
> ; RUN: opt -inline < %s -S -o - -inline-threshold=8 | FileCheck %s
>
> +target datalayout = "p:32:32"
> +
> declare void @llvm.lifetime.start(i64 %size, i8* nocapture %ptr)
>
> @glbl = external global i32
> @@ -15,8 +17,8 @@
> define void @inner1(i32 *%ptr) {
> %A = load i32* %ptr
> store i32 0, i32* %ptr
> - %C = getelementptr i32* %ptr, i32 0
> - %D = getelementptr i32* %ptr, i32 1
> + %C = getelementptr inbounds i32* %ptr, i32 0
> + %D = getelementptr inbounds i32* %ptr, i32 1
> %E = bitcast i32* %ptr to i8*
> %F = select i1 false, i32* %ptr, i32* @glbl
> call void @llvm.lifetime.start(i64 0, i8* %E)
> @@ -35,8 +37,8 @@
> define void @inner2(i32 *%ptr) {
> %A = load i32* %ptr
> store i32 0, i32* %ptr
> - %C = getelementptr i32* %ptr, i32 0
> - %D = getelementptr i32* %ptr, i32 %A
> + %C = getelementptr inbounds i32* %ptr, i32 0
> + %D = getelementptr inbounds i32* %ptr, i32 %A
> %E = bitcast i32* %ptr to i8*
> %F = select i1 false, i32* %ptr, i32* @glbl
> call void @llvm.lifetime.start(i64 0, i8* %E)
> @@ -93,7 +95,7 @@
> ; %B poisons this call, scalar-repl can't handle that instruction. However, we
> ; still want to detect that the icmp and branch *can* be handled.
> define void @inner4(i32 *%ptr, i32 %A) {
> - %B = getelementptr i32* %ptr, i32 %A
> + %B = getelementptr inbounds i32* %ptr, i32 %A
> %C = icmp eq i32* %ptr, null
> br i1 %C, label %bb.true, label %bb.false
> bb.true:
> @@ -122,3 +124,32 @@
> bb.false:
> ret void
> }
> +
> +define void @outer5() {
> +; CHECK: @outer5
> +; CHECK-NOT: call void @inner5
> + %ptr = alloca i32
> + call void @inner5(i1 false, i32* %ptr)
> + ret void
> +}
> +
> +; %D poisons this call because scalar-repl can't handle that instruction. However, if
> +; the flag is set appropriately, the poisoning instruction is inside of dead
> +; code, and so shouldn't be counted.
> +define void @inner5(i1 %flag, i32 *%ptr) {
> + %A = load i32* %ptr
> + store i32 0, i32* %ptr
> + %C = getelementptr inbounds i32* %ptr, i32 0
> + br i1 %flag, label %if.then, label %exit
> +
> +if.then:
> + %D = getelementptr inbounds i32* %ptr, i32 %A
> + %E = bitcast i32* %ptr to i8*
> + %F = select i1 false, i32* %ptr, i32* @glbl
> + call void @llvm.lifetime.start(i64 0, i8* %E)
> + ret void
> +
> +exit:
> + ret void
> +}
> +
>
> Modified: llvm/trunk/test/Transforms/Inline/dynamic_alloca_test.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/dynamic_alloca_test.ll?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/Inline/dynamic_alloca_test.ll (original)
> +++ llvm/trunk/test/Transforms/Inline/dynamic_alloca_test.ll Sat Mar 31 07:42:41 2012
> @@ -4,6 +4,11 @@
> ; already have dynamic allocas.
>
> ; RUN: opt < %s -inline -S | FileCheck %s
> +;
> +; FIXME: This test is xfailed because the inline cost rewrite disabled *all*
> +; inlining of functions which contain a dynamic alloca. It should be re-enabled
> +; once that functionality is restored.
> +; XFAIL: *
>
> declare void @ext(i32*)
>
>
> Modified: llvm/trunk/test/Transforms/Inline/inline_constprop.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline_constprop.ll?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/Inline/inline_constprop.ll (original)
> +++ llvm/trunk/test/Transforms/Inline/inline_constprop.ll Sat Mar 31 07:42:41 2012
> @@ -1,4 +1,4 @@
> -; RUN: opt < %s -inline -S | FileCheck %s
> +; RUN: opt < %s -inline -inline-threshold=20 -S | FileCheck %s
>
> define internal i32 @callee1(i32 %A, i32 %B) {
> %C = sdiv i32 %A, %B
> @@ -14,17 +14,18 @@
> }
>
> define i32 @caller2() {
> +; Check that we can constant-prop through instructions after inlining callee21
> +; to get constants in the inlined callsite to callee22.
> +; FIXME: Currently, the threshold is fixed at 20 because we don't perform
> +; *recursive* cost analysis to realize that the nested call site will definitely
> +; inline and be cheap. We should eventually do that and lower the threshold here
> +; to 1.
> +;
> ; CHECK: @caller2
> ; CHECK-NOT: call void @callee2
> ; CHECK: ret
>
> -; We contrive to make this hard for *just* the inline pass to do in order to
> -; simulate what can actually happen with large, complex functions getting
> -; inlined.
> - %a = add i32 42, 0
> - %b = add i32 48, 0
> -
> - %x = call i32 @callee21(i32 %a, i32 %b)
> + %x = call i32 @callee21(i32 42, i32 48)
> ret i32 %x
> }
>
> @@ -41,49 +42,71 @@
> br i1 %icmp, label %bb.true, label %bb.false
> bb.true:
> ; This block mustn't be counted in the inline cost.
> - %ptr = call i8* @getptr()
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> - load volatile i8* %ptr
> + %x1 = add i32 %x, 1
> + %x2 = add i32 %x1, 1
> + %x3 = add i32 %x2, 1
> + %x4 = add i32 %x3, 1
> + %x5 = add i32 %x4, 1
> + %x6 = add i32 %x5, 1
> + %x7 = add i32 %x6, 1
> + %x8 = add i32 %x7, 1
>
> - ret i32 %x
> + ret i32 %x8
> bb.false:
> ret i32 %x
> }
> +
> +define i32 @caller3() {
> +; Check that even if the expensive path is hidden behind several basic blocks,
> +; it doesn't count toward the inline cost when constant-prop proves those paths
> +; dead.
> +;
> +; CHECK: @caller3
> +; CHECK-NOT: call
> +; CHECK: ret i32 6
> +
> +entry:
> + %x = call i32 @callee3(i32 42, i32 48)
> + ret i32 %x
> +}
> +
> +define i32 @callee3(i32 %x, i32 %y) {
> + %sub = sub i32 %y, %x
> + %icmp = icmp ugt i32 %sub, 42
> + br i1 %icmp, label %bb.true, label %bb.false
> +
> +bb.true:
> + %icmp2 = icmp ult i32 %sub, 64
> + br i1 %icmp2, label %bb.true.true, label %bb.true.false
> +
> +bb.true.true:
> + ; This block mustn't be counted in the inline cost.
> + %x1 = add i32 %x, 1
> + %x2 = add i32 %x1, 1
> + %x3 = add i32 %x2, 1
> + %x4 = add i32 %x3, 1
> + %x5 = add i32 %x4, 1
> + %x6 = add i32 %x5, 1
> + %x7 = add i32 %x6, 1
> + %x8 = add i32 %x7, 1
> + br label %bb.merge
> +
> +bb.true.false:
> + ; This block mustn't be counted in the inline cost.
> + %y1 = add i32 %y, 1
> + %y2 = add i32 %y1, 1
> + %y3 = add i32 %y2, 1
> + %y4 = add i32 %y3, 1
> + %y5 = add i32 %y4, 1
> + %y6 = add i32 %y5, 1
> + %y7 = add i32 %y6, 1
> + %y8 = add i32 %y7, 1
> + br label %bb.merge
> +
> +bb.merge:
> + %result = phi i32 [ %x8, %bb.true.true ], [ %y8, %bb.true.false ]
> + ret i32 %result
> +
> +bb.false:
> + ret i32 %sub
> +}
>
> Modified: llvm/trunk/test/Transforms/Inline/noinline-recursive-fn.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/noinline-recursive-fn.ll?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/Inline/noinline-recursive-fn.ll (original)
> +++ llvm/trunk/test/Transforms/Inline/noinline-recursive-fn.ll Sat Mar 31 07:42:41 2012
> @@ -71,3 +71,40 @@
> call void @f2(i32 123, i8* bitcast (void (i32, i8*, i8*)* @f1 to i8*), i8* bitcast (void (i32, i8*, i8*)* @f2 to i8*)) nounwind ssp
> ret void
> }
> +
> +
> +; Check that a recursive function, when called with a constant that makes the
> +; recursive path dead code, can actually be inlined.
> +define i32 @fib(i32 %i) {
> +entry:
> + %is.zero = icmp eq i32 %i, 0
> + br i1 %is.zero, label %zero.then, label %zero.else
> +
> +zero.then:
> + ret i32 0
> +
> +zero.else:
> + %is.one = icmp eq i32 %i, 1
> + br i1 %is.one, label %one.then, label %one.else
> +
> +one.then:
> + ret i32 1
> +
> +one.else:
> + %i1 = sub i32 %i, 1
> + %f1 = call i32 @fib(i32 %i1)
> + %i2 = sub i32 %i, 2
> + %f2 = call i32 @fib(i32 %i2)
> + %f = add i32 %f1, %f2
> + ret i32 %f
> +}
> +
> +define i32 @fib_caller() {
> +; CHECK: @fib_caller
> +; CHECK-NOT: call
> +; CHECK: ret
> + %f1 = call i32 @fib(i32 0)
> + %f2 = call i32 @fib(i32 1)
> + %result = add i32 %f1, %f2
> + ret i32 %result
> +}
>
> Modified: llvm/trunk/test/Transforms/Inline/ptr-diff.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/ptr-diff.ll?rev=153812&r1=153811&r2=153812&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/Inline/ptr-diff.ll (original)
> +++ llvm/trunk/test/Transforms/Inline/ptr-diff.ll Sat Mar 31 07:42:41 2012
> @@ -1,5 +1,7 @@
> ; RUN: opt -inline < %s -S -o - -inline-threshold=10 | FileCheck %s
>
> +target datalayout = "p:32:32"
> +
> define i32 @outer1() {
> ; CHECK: @outer1
> ; CHECK-NOT: call
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-David