[llvm] r200219 - [vectorize] Initial version of respecting PGO in the vectorizer: treat
David Blaikie
dblaikie at gmail.com
Mon Jan 27 09:46:50 PST 2014
On Mon, Jan 27, 2014 at 5:11 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> Author: chandlerc
> Date: Mon Jan 27 07:11:50 2014
> New Revision: 200219
>
> URL: http://llvm.org/viewvc/llvm-project?rev=200219&view=rev
> Log:
> [vectorize] Initial version of respecting PGO in the vectorizer: treat
> cold loops as-if they were being optimized for size.
>
> Nothing fancy here. Simple test case included. The nice thing is that we
> can now incrementally build on top of this to drive other heuristics.
> All of the infrastructure work is done to get the profile information
> into this layer.
>
> The remaining work necessary to make this a fully general purpose loop
> unroller for very hot loops is to make it a fully general purpose loop
> unroller. Things I know of but am not going to have time to benchmark
> and fix in the immediate future:
>
> 1) Don't disable the entire pass when the target is lacking vector
> registers. This really doesn't make any sense any more.
> 2) Teach the unroller (at least) and potentially the vectorizer to handle
> non-if-converted loops. This is trivial for the unroller but hard for
> the vectorizer.
> 3) Compute the relative hotness of the loop and thread that down to the
> various places that make cost tradeoffs (very likely only the
> unroller makes sense here, and then only when dealing with loops that
> are small enough for unrolling to not completely blow out the LSD, i.e. the loop stream detector).
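Item 3 could look roughly like the following. This is a minimal sketch, not LLVM code: `relativeHotness` and `pickUnrollFactor` are hypothetical names, frequencies are assumed to arrive as plain integers in the units BlockFrequencyInfo reports, and `lsdCapacityUops` is an assumed per-target budget rather than a value LLVM exposes. A cold ratio of 0.2 would mirror the 20% cutoff this commit uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: loop entry frequency relative to the function entry
// frequency (1.0 == the loop is entered once per call of the function).
double relativeHotness(uint64_t loopEntryFreq, uint64_t fnEntryFreq) {
  assert(fnEntryFreq > 0 && "function entry frequency must be non-zero");
  return static_cast<double>(loopEntryFreq) / static_cast<double>(fnEntryFreq);
}

// Hypothetical unroll policy: cold loops are never unrolled; otherwise the
// factor is capped so that the unrolled body still fits in the loop stream
// detector. lsdCapacityUops is an assumed per-target budget.
unsigned pickUnrollFactor(double hotness, double coldRatio, unsigned bodyUops,
                          unsigned lsdCapacityUops, unsigned maxFactor) {
  if (hotness < coldRatio || bodyUops == 0)
    return 1;
  unsigned fitsInLSD = lsdCapacityUops / bodyUops;  // whole copies that fit
  return std::max(1u, std::min(fitsInLSD, maxFactor));
}
```

For instance, `pickUnrollFactor(relativeHotness(2000, 1000), 0.2, 4, 28, 8)` returns 7: the loop is hot, and seven copies of a 4-uop body fit in an assumed 28-uop budget.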
>
> I'm still dubious how useful hotness information will be. So far, my
> experiments show that if we can get the correct logic for determining
> when unrolling actually helps performance, the code size impact is
> completely unimportant and we can unroll in all cases. But at least
> we'll no longer burn code size on cold code.
>
> One somewhat unrelated idea that I've had forever but not had time to
> implement: mark all functions which are only reachable via the global
> constructors rigging in the module as optsize.
Just idle curiosity, but wouldn't that be likely to hurt startup time,
which seems important to some people/programs?
> This would also decrease
> the impact of any more aggressive heuristics here on code size.
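The global-constructor idea amounts to a reachability question on the call graph. A minimal sketch, assuming a hypothetical string-keyed call graph that records only direct calls (a real pass would also have to treat address-taken and externally visible functions conservatively): collect everything reachable from the constructors, subtract everything reachable from any other root such as `main`, and the remainder are the optsize candidates. `CallGraph`, `reachableFrom`, and `ctorOnlyFunctions` are illustrative names, not LLVM API.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller name -> direct callee names.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Breadth-first search over the call graph from a set of root functions.
std::set<std::string> reachableFrom(const CallGraph &cg,
                                    const std::vector<std::string> &roots) {
  std::set<std::string> seen(roots.begin(), roots.end());
  std::queue<std::string> work;
  for (const auto &r : roots)
    work.push(r);
  while (!work.empty()) {
    std::string f = work.front();
    work.pop();
    auto it = cg.find(f);
    if (it == cg.end())
      continue;  // no recorded callees
    for (const auto &callee : it->second)
      if (seen.insert(callee).second)  // first visit only
        work.push(callee);
  }
  return seen;
}

// Functions reachable from the global constructors but from no other root.
std::set<std::string>
ctorOnlyFunctions(const CallGraph &cg, const std::vector<std::string> &ctorRoots,
                  const std::vector<std::string> &otherRoots) {
  std::set<std::string> fromCtors = reachableFrom(cg, ctorRoots);
  std::set<std::string> fromOthers = reachableFrom(cg, otherRoots);
  std::set<std::string> result;
  for (const auto &f : fromCtors)
    if (!fromOthers.count(f))
      result.insert(f);
  return result;
}
```

A function shared between a constructor and `main` (a logging helper, say) stays out of the result, since optimizing it for size would also affect the main program.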
>
> Modified:
> llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
> llvm/trunk/test/Transforms/LoopVectorize/X86/small-size.ll
>
> Modified: llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp?rev=200219&r1=200218&r2=200219&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp Mon Jan 27 07:11:50 2014
> @@ -56,6 +56,7 @@
> #include "llvm/ADT/SmallVector.h"
> #include "llvm/ADT/StringExtras.h"
> #include "llvm/Analysis/AliasAnalysis.h"
> +#include "llvm/Analysis/BlockFrequencyInfo.h"
> #include "llvm/Analysis/LoopInfo.h"
> #include "llvm/Analysis/LoopIterator.h"
> #include "llvm/Analysis/LoopPass.h"
> @@ -78,6 +79,7 @@
> #include "llvm/IR/Value.h"
> #include "llvm/IR/Verifier.h"
> #include "llvm/Pass.h"
> +#include "llvm/Support/BranchProbability.h"
> #include "llvm/Support/CommandLine.h"
> #include "llvm/Support/Debug.h"
> #include "llvm/Support/PatternMatch.h"
> @@ -980,18 +982,27 @@ struct LoopVectorize : public FunctionPa
> LoopInfo *LI;
> TargetTransformInfo *TTI;
> DominatorTree *DT;
> + BlockFrequencyInfo *BFI;
> TargetLibraryInfo *TLI;
> bool DisableUnrolling;
> bool AlwaysVectorize;
>
> + BlockFrequency ColdEntryFreq;
> +
> virtual bool runOnFunction(Function &F) {
> SE = &getAnalysis<ScalarEvolution>();
> DL = getAnalysisIfAvailable<DataLayout>();
> LI = &getAnalysis<LoopInfo>();
> TTI = &getAnalysis<TargetTransformInfo>();
> DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
> + BFI = &getAnalysis<BlockFrequencyInfo>();
> TLI = getAnalysisIfAvailable<TargetLibraryInfo>();
>
> + // Compute some weights outside of the loop over the loops. Compute this
> + // using a BranchProbability to re-use its scaling math.
> + const BranchProbability ColdProb(1, 5); // 20%
> + ColdEntryFreq = BlockFrequency(BFI->getEntryFreq()) * ColdProb;
> +
> // If the target claims to have no vector registers don't attempt
> // vectorization.
> if (!TTI->getNumberOfRegisters(true))
> @@ -1064,6 +1075,13 @@ struct LoopVectorize : public FunctionPa
> bool OptForSize =
> Hints.Force != 1 && F->hasFnAttribute(Attribute::OptimizeForSize);
>
> + // Compute the weighted frequency of this loop being executed and see if it
> + // is less than 20% of the function entry baseline frequency. Note that we
> + // always have a canonical loop here because we think we *can* vectorize.
> + BlockFrequency LoopEntryFreq = BFI->getBlockFreq(L->getLoopPreheader());
> + if (Hints.Force != 1 && LoopEntryFreq < ColdEntryFreq)
> + OptForSize = true;
> +
> // Check the function attributes to see if implicit floats are allowed.
> // FIXME: This check doesn't seem possibly correct -- what if the loop is
> // an integer loop and the vector instructions selected are purely integer
> @@ -1109,6 +1127,7 @@ struct LoopVectorize : public FunctionPa
> virtual void getAnalysisUsage(AnalysisUsage &AU) const {
> AU.addRequiredID(LoopSimplifyID);
> AU.addRequiredID(LCSSAID);
> + AU.addRequired<BlockFrequencyInfo>();
> AU.addRequired<DominatorTreeWrapperPass>();
> AU.addRequired<LoopInfo>();
> AU.addRequired<ScalarEvolution>();
> @@ -5469,6 +5488,7 @@ char LoopVectorize::ID = 0;
> static const char lv_name[] = "Loop Vectorization";
> INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false)
> INITIALIZE_AG_DEPENDENCY(TargetTransformInfo)
> +INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfo)
> INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
> INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
> INITIALIZE_PASS_DEPENDENCY(LCSSA)
>
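The new check in `runOnFunction` reduces to simple arithmetic on block frequencies. A standalone sketch, assuming plain `uint64_t` frequencies in place of `BlockFrequency` and a hypothetical `scaleFreq` helper standing in for the `BranchProbability` scaling the patch reuses (LLVM's own rounding may differ): the cold cutoff is one fifth of the entry frequency, and a loop whose preheader falls strictly below it is treated as optsize unless vectorization is forced. `shouldOptForSize` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for scaling a frequency by the probability n/d.
// Computes floor(freq * n / d) without a 128-bit intermediate; n <= d keeps
// every partial result within 64 bits.
uint64_t scaleFreq(uint64_t freq, uint32_t n, uint32_t d) {
  assert(d != 0 && n <= d && "n/d must be a probability");
  uint64_t q = freq / d, r = freq % d;
  return q * n + (r * n) / d;
}

// Mirrors the shape of the patch: start from the function's optsize
// attribute, then also treat the loop as optsize when its preheader runs
// less often than 1/5 (20%) of the function entry. `forced` plays the role
// of Hints.Force == 1 and disables both checks.
bool shouldOptForSize(bool fnHasOptSize, bool forced, uint64_t entryFreq,
                      uint64_t preheaderFreq) {
  bool optForSize = !forced && fnHasOptSize;
  const uint64_t coldEntryFreq = scaleFreq(entryFreq, 1, 5);  // 20%
  if (!forced && preheaderFreq < coldEntryFreq)
    optForSize = true;
  return optForSize;
}
```

With an entry frequency of 1000 the cutoff is 200, so a preheader frequency of 150 yields optsize while exactly 200 does not; the comparison is strict, as in the patch.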
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/small-size.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/small-size.ll?rev=200219&r1=200218&r2=200219&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/small-size.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/small-size.ll Mon Jan 27 07:11:50 2014
> @@ -115,6 +115,31 @@ define void @example3(i32 %n, i32* noali
> ret void
> }
>
> +; N is unknown, we need a tail. Can't vectorize because the loop is cold.
> +;CHECK-LABEL: @example4(
> +;CHECK-NOT: <4 x i32>
> +;CHECK: ret void
> +define void @example4(i32 %n, i32* noalias nocapture %p, i32* noalias nocapture %q) {
> + %1 = icmp eq i32 %n, 0
> + br i1 %1, label %._crit_edge, label %.lr.ph, !prof !0
> +
> +.lr.ph: ; preds = %0, %.lr.ph
> + %.05 = phi i32 [ %2, %.lr.ph ], [ %n, %0 ]
> + %.014 = phi i32* [ %5, %.lr.ph ], [ %p, %0 ]
> + %.023 = phi i32* [ %3, %.lr.ph ], [ %q, %0 ]
> + %2 = add nsw i32 %.05, -1
> + %3 = getelementptr inbounds i32* %.023, i64 1
> + %4 = load i32* %.023, align 16
> + %5 = getelementptr inbounds i32* %.014, i64 1
> + store i32 %4, i32* %.014, align 16
> + %6 = icmp eq i32 %2, 0
> + br i1 %6, label %._crit_edge, label %.lr.ph
> +
> +._crit_edge: ; preds = %.lr.ph, %0
> + ret void
> +}
> +
> +!0 = metadata !{metadata !"branch_weights", i32 64, i32 4}
>
> ; We can't vectorize this one because we need a runtime ptr check.
> ;CHECK-LABEL: @example23(
>
>
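In `@example4`, `!0` (weights 64 and 4) is attached to `br i1 %1, label %._crit_edge, label %.lr.ph`, so the weight of 4 belongs to the edge into the loop: the loop is entered on roughly 4/68 = 1/17, about 5.9%, of calls. That is below the 20% (1/5) cutoff, so the loop is treated as optsize, and since N is unknown and a scalar tail would be required, it is not vectorized, which the `CHECK-NOT: <4 x i32>` line checks. A hedged sketch of that arithmetic, assuming the entry block's frequency splits across its two successors in proportion to the weights (BlockFrequencyInfo's exact scaling may differ); `loopLooksCold` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Given the branch_weights on the entry block's conditional branch, decide
// whether the loop's preheader would fall below coldNum/coldDen of the
// function entry frequency (1/5 by default, as in the patch).
bool loopLooksCold(uint32_t loopWeight, uint32_t exitWeight,
                   uint32_t coldNum = 1, uint32_t coldDen = 5) {
  uint64_t total = uint64_t(loopWeight) + exitWeight;
  assert(total > 0 && "branch weights must not both be zero");
  // loopWeight / total < coldNum / coldDen, cross-multiplied to stay exact.
  return uint64_t(loopWeight) * coldDen < total * coldNum;
}
```

Cross-multiplying keeps the comparison exact in integers; at exactly 20% (for example weights 20 and 80) the loop is not considered cold, matching the strict `<` in the patch.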
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>