[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Mon Jun 24 14:16:41 PDT 2013

----- Original Message -----
> 
> On Mon, Jun 24, 2013 at 11:01 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> 
> In <http://llvm.org/viewvc/llvm-project?view=revision&revision=184698>,
> Chandler introduced a flag so that we can run the vectorizer after
> all CG passes. This would prevent the inliner from seeing the
> vectorized code.
> 
> Just for the record, I have no real expectation that this is a good
> idea yet... But it's hard to collect numbers without a flag of some
> kind, and it's also really annoying to craft this flag given the
> current pass manager, so I figured I would get a skeleton in place
> that folks could experiment with, and we could keep or delete based
> on this discussion and any numbers.
> 
> I see some potential issues:
> 
> * We run a loop pass again later, with the associated compile-time
> cost (?)
> 
> This actually worries me the least -- the tradeoff is between
> locality and repeatedly analyzing the same loops during the highly
> iterative CGSCC pass manager (which looks at each function up to 4
> times, and may look at the body of a function which gets inlined
> many more times).
> 
> * The vectorizer relies on cleanup passes to run afterwards: dce,
> instsimplify, simplifycfg, maybe some form of redundancy elimination.
> If we run it later, we either have to run those passes again,
> increasing compile time, OR we have to duplicate them in the loop
> vectorizer, increasing its complexity.
> 
> Hopefully, these won't be too expensive. instcombine and simplifycfg
> shouldn't be costly this late in the pipeline, but only numbers will
> really tell. If we need redundancy elimination in the form of GVN, then
> we're in a lot of trouble, but that doesn't seem likely to be
> necessary (I hope).

Obviously there may be relevant differences, but running EarlyCSE after the BB vectorizer turned out to work almost as well as running GVN, and the improved compile time seemed worth the trade off.
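As a rough illustration of the compile-time trade-off above, here is a toy Python model (not LLVM code) of what deferring vectorization does to the pass order. The pass lists and the build_pipeline function are hypothetical simplifications; the cleanup list only mirrors the passes named in this thread, with EarlyCSE standing in for GVN as suggested above.

```python
# Toy model of the pass-ordering trade-off; NOT LLVM's real pipeline.
# Pass names are opt-style names, but the lists are simplified/hypothetical.
from collections import Counter

# Simplified per-function passes run inside the iterative CGSCC walk.
CGSCC_FUNCTION_PASSES = ["instcombine", "simplifycfg", "indvars", "loop-unroll", "gvn"]

# Cleanup the vectorizer relies on (dce, instsimplify, simplifycfg, plus
# some redundancy elimination; EarlyCSE used here instead of GVN).
VECTORIZER_CLEANUP = ["dce", "instsimplify", "simplifycfg", "early-cse"]


def build_pipeline(late_vectorize: bool) -> list[str]:
    """Return a flat pass order for one function (hypothetical model)."""
    cgscc = ["inline"] + CGSCC_FUNCTION_PASSES
    if not late_vectorize:
        # Vectorize inside the CGSCC walk, before GVN, so the passes that
        # already follow it double as its cleanup.
        pos = cgscc.index("gvn")
        return cgscc[:pos] + ["loop-vectorize"] + cgscc[pos:]
    # Defer vectorization until inlining is done, then re-run cleanup
    # passes on the vectorized code.
    return cgscc + ["loop-vectorize"] + VECTORIZER_CLEANUP


early = build_pipeline(late_vectorize=False)
late = build_pipeline(late_vectorize=True)
extra = Counter(late) - Counter(early)  # passes the late order adds
print(sorted(extra.elements()))
```

In this model the extra work is exactly the cleanup list, simplifycfg included as a second run. The model says nothing about how expensive those passes are; that is the part only numbers can answer.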

> 
> * The vectorizer would like SCEV analysis to be as precise as
> possible: one reason is the dependency checks, which want to know that
> expressions cannot wrap (AddRec expressions, to be more specific).
> At the moment, indvars will remove those flags in some cases, which
> currently is not a problem because SCEV analysis still has the old
> values cached (except in the case Chandler mentioned to me on IRC,
> where we inline a function, in which case that info is lost). My
> understanding of this is that this is not really something we can
> fix easily because of the way that SCEV works
> (unique-ifying/commoning expressions and thereby dropping the
> flags).
> A potential solution would be to move indvars later. The question
> is: do other loop passes which simplify IR depend on indvars? Andy,
> what is your take on this?
> 
> I think this is far and away the most important question to answer.
> =] I think there are lots of things that would be improved by
> preserving SCEV's precision throughout the CGSCC pass manager, but I
> have no idea how feasible that is. I would also appreciate Dan's
> insights here.
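To make the no-wrap concern concrete, here is a toy Python model of an affine recurrence {start,+,step} evaluated in a narrow integer type. The helper names are invented for illustration and are not SCEV's API. If the recurrence provably does not wrap and the step is nonzero, every iteration touches a distinct index, so a dependence check can rule out self-overlap without enumerating iterations; once the no-wrap fact is lost, the check has to stay conservative.

```python
# Toy illustration, NOT SCEV's API: why a dependence check wants to know
# that {start,+,step} cannot wrap. Values live in an unsigned `bits`-wide
# integer type, i.e. arithmetic is modulo 2**bits.


def recurrence_values(start: int, step: int, trip_count: int, bits: int) -> list[int]:
    """Value of {start,+,step} in each iteration, with modular wrap-around."""
    mask = (1 << bits) - 1
    return [(start + step * i) & mask for i in range(trip_count)]


def wraps(start: int, step: int, trip_count: int, bits: int) -> bool:
    """True if the exact (infinite-precision) values leave the type's range.
    An affine sequence is monotonic, so checking both endpoints suffices."""
    if trip_count <= 0:
        return False
    last = start + step * (trip_count - 1)
    limit = 1 << bits
    return not (0 <= start < limit and 0 <= last < limit)


def iterations_alias(start: int, step: int, trip_count: int, bits: int) -> bool:
    """True if two different iterations compute the same index, so a store
    through it would create a loop-carried dependence."""
    vals = recurrence_values(start, step, trip_count, bits)
    return len(set(vals)) != len(vals)


print(wraps(0, 1, 200, 8), iterations_alias(0, 1, 200, 8))
print(wraps(0, 1, 300, 8), iterations_alias(0, 1, 300, 8))
print(wraps(0, 3, 100, 8), iterations_alias(0, 3, 100, 8))
```

With an 8-bit index, 200 iterations of step 1 neither wrap nor collide, while 300 iterations wrap and revisit indices 0 to 43. Step 3 over 100 iterations wraps but happens not to collide, which is why a dropped flag forces conservatism rather than proving a dependence.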
> 
> The benefit of vectorizing later is that we would have more context
> at the inlined call site. And it would solve the problem of the
> inliner seeing vectorized code.
> 
> There's more, though, in theory (although maybe not in practice today,
> and I may just be wrong on some of these):
> - It would improve the ability of the vectorizer to reason about
> icache impact, because it wouldn't think the loop was the only loop
> in the function only to have it be inlined in 10 places.

Good point (although it might only apply in practice to cases where we know that the trip count is small, and that requires profiling data in general).

> - It may form a vectorized loop before inlining which would be
> handled better by loop idiom recognition after inlining.

I imagine that we could improve idiom recognition to mitigate this particular issue.

> - It would prevent turning loops that SCEV-based passes can simply
> compute and/or delete *after* inlining adds some context, into a
> vectorized loop that may be significantly harder to analyze at this
> level.

In my experience (from working with the BB vectorizer), this can be a significant problem. Even worse, if you vectorize any of the calculations feeding addressing, then BasicAA also becomes less precise.
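The loop-deletion point can be illustrated with a toy Python model (not LLVM IR; the function names are made up) of how a loop vectorizer typically reshapes a simple sum reduction. The scalar loop has a closed form that a SCEV-style analysis could use to delete it; the vectorized shape computes the same value through VF per-lane partial sums, a horizontal reduction, and a scalar remainder loop. Whether SCEV can see through that shape is the open question in this thread; the model only shows what it would have to see through.

```python
# Toy model, NOT LLVM IR: a sum reduction before and after a
# loop-vectorizer-style rewrite with vectorization factor `vf`.


def scalar_sum(n: int) -> int:
    """The original scalar loop: sum of i for 0 <= i < n."""
    total = 0
    for i in range(n):
        total += i
    return total


def closed_form_sum(n: int) -> int:
    """What a SCEV-style analysis can derive from the scalar loop,
    allowing the loop to be deleted: n*(n-1)/2."""
    return n * (n - 1) // 2


def vectorized_sum(n: int, vf: int = 4) -> int:
    """Same result, computed the way a vectorized loop would."""
    lanes = [0] * vf                # one partial sum per vector lane
    main_iters = n - n % vf         # iterations covered by the vector loop
    for base in range(0, main_iters, vf):
        for lane in range(vf):      # stands in for a single vector add
            lanes[lane] += base + lane
    total = sum(lanes)              # horizontal reduction after the loop
    for i in range(main_iters, n):  # scalar epilogue for the remainder
        total += i
    return total


print(scalar_sum(10), closed_form_sum(10), vectorized_sum(10))
```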

> 
> (The last one is the most speculative to me -- it could be that I'm
> wrong and SCEV and other loop analyses will work just as well on the
> vectorized loops...)
> 
> All of these share a common thread: the vectorizer somewhat
> inherently loses information, and thus doing it during the iterative
> pass manager is very risky as later iterations may be hampered by
> it.

This is an infrastructure problem, but I suspect it will remain true without a significant effort to teach SCEV, BasicAA, etc. to look through vectorized computations.

 -Hal

> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory