[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Hal Finkel hfinkel at anl.gov
Mon Jun 24 16:24:32 PDT 2013


----- Original Message -----
> 
> On Jun 24, 2013, at 11:01 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> 
> Hi,
> 
> I wanted to start a discussion about the following issue since I am
> not sure about what to do about it:
> 
> The loop vectorizer has the potential to make code quite a bit bigger
> (especially in cases where we don't know the loop count or whether
> pointers alias).
> Chandler has observed this in snappy, where we have a simple
> memory-copying loop (the source and destination may overlap).
> 
> We vectorize this loop, it then gets inlined into a function, and the
> resulting code growth prevents that function from getting inlined
> again, causing a significant(?) degradation.
> 
> https://code.google.com/p/snappy/source/browse/trunk/snappy.cc#99
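> 
> Paraphrasing the loop at the linked line (IncrementalCopy; the real
> signature and asserts in snappy differ slightly), it is essentially:
> 
>     // Byte-by-byte copy where src and op may overlap, so it cannot be
>     // lowered to memcpy; vectorizing it requires runtime overlap and
>     // trip-count checks that grow the code considerably.
>     static inline void IncrementalCopy(const char* src, char* op, int len) {
>       do {
>         *op++ = *src++;
>       } while (--len > 0);
>     }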
> 
> We have seen good performance benefits from vectorizing such loops, so
> not vectorizing them is not really an option, I think.
> 
> In <
> http://llvm.org/viewvc/llvm-project?view=revision&revision=184698 >
> Chandler introduced a flag so that we can run the vectorizer after all
> CG passes. This would prevent the inliner from seeing the vectorized
> code.
> 
> I see some potential issues:
> 
> * We run a loop pass again later, with the associated compile-time
> cost (?)
> 
> I want to move the loop opts that depend on target info later, outside
> of the CGSCC walk: definitely indvars and vectorize/partial unroll.
> That way we only inline canonical code and have a clean break between
> canonicalization and lowering passes. Hopefully the inlining
> heuristics will be adequate without first running these passes. For
> the most part, I think it's as simple as inlining first with
> high-level heuristics, then lowering later.
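> 
> Concretely, with the legacy pass manager that ordering would look
> roughly like the sketch below. The create* functions are the existing
> pass-creation entry points; the pipeline itself is only illustrative,
> not what PassManagerBuilder actually constructs today:
> 
>     #include "llvm/PassManager.h"
>     #include "llvm/Transforms/IPO.h"
>     #include "llvm/Transforms/Scalar.h"
>     #include "llvm/Transforms/Vectorize.h"
>     using namespace llvm;
> 
>     static void buildPipeline(PassManager &MPM) {
>       // The inliner and the rest of the CGSCC walk only ever see
>       // canonical, unvectorized loops.
>       MPM.add(createFunctionInliningPass());
>       // ... remaining canonicalization passes ...
> 
>       // Target-dependent loop lowering runs only after inlining is done.
>       MPM.add(createIndVarSimplifyPass());
>       MPM.add(createLoopVectorizePass());
>     }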
> 
> * The vectorizer relies on cleanup passes running afterwards: dce,
> instsimplify, simplifycfg, maybe some form of redundancy elimination.
> If we run later, we either have to run those passes again, increasing
> compile time, or we have to duplicate them in the loop vectorizer,
> increasing its complexity.
> 
> We'll have to handle this case by case as we gradually move passes
> around. But the general idea is that lowering passes like the
> vectorizer should clean up after themselves as much as feasible
> (whereas canonicalization passes should not need to). We should be
> developing utilities to clean up redundancies incrementally; a
> value-numbering utility would make sense. Of course, if a very
> light-weight pass can simply be rescheduled to run again to do the
> cleanup, then we don't need a cleanup utility.
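> 
> In the "reschedule light-weight passes" variant, that would just mean
> appending the existing scalar cleanup passes after the late
> vectorizer, continuing the illustrative sketch above:
> 
>     static void addLateVectorizeAndCleanup(PassManager &MPM) {
>       MPM.add(createLoopVectorizePass());
>       MPM.add(createInstructionCombiningPass()); // instsimplify-style cleanup
>       MPM.add(createEarlyCSEPass());             // cheap redundancy elimination
>       MPM.add(createCFGSimplificationPass());    // simplifycfg
>       MPM.add(createDeadCodeEliminationPass());  // dce
>     }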
> 
> * The vectorizer would like SCEV analysis to be as precise as
> possible: one reason is the dependency checks, which want to know that
> expressions cannot wrap (AddRec expressions, to be specific).
> At the moment, indvars will remove those flags in some cases. That is
> currently not a problem because SCEV analysis still has the old values
> cached (except in the case Chandler mentioned to me on IRC, where we
> inline a function, in which case that info is lost). My understanding
> is that this is not something we can easily fix because of the way
> SCEV works (unique-ifying/commoning expressions and thereby dropping
> the flags).
> A potential solution would be to move indvars later. The question is:
> do other loop passes that simplify the IR depend on indvars? Andy,
> what is your take on this?
> 
> Indvars should ideally preserve NSW flags whenever possible. However,
> we don't want to rely on SCEV to preserve them. SCEV expressions are
> implicitly reassociated and uniqued in a flow-insensitive universe,
> independent of the def-use chains of values, so SCEV simply can't
> represent the flags in most cases. I think the only flag that makes
> sense in SCEV is the no-wrap flag on a recurrence (the one that is
> independent of signed/unsigned overflow).
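> 
> For reference, that recurrence flag lives on the SCEVAddRecExpr itself
> and is queryable directly; a minimal sketch (the helper name is made
> up) of the kind of check the vectorizer's dependence analysis wants to
> make:
> 
>     #include "llvm/Analysis/ScalarEvolution.h"
>     #include "llvm/Analysis/ScalarEvolutionExpressions.h"
>     using namespace llvm;
> 
>     // True only if SE can prove V is an add recurrence carrying the
>     // requested no-wrap flag (SCEV::FlagNW here; FlagNSW/FlagNUW are
>     // the signed/unsigned variants the dependence checks would like).
>     static bool addRecHasNoWrapFlag(ScalarEvolution &SE, Value *V) {
>       const SCEV *S = SE.getSCEV(V);
>       if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(S))
>         return AR->getNoWrapFlags(SCEV::FlagNW) != SCEV::FlagAnyWrap;
>       return false;
>     }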

Why can't SCEV keep a flow-sensitive (effectively per-BB) map of expressions and their original flags (and perhaps cached deduced flags)? It seems like this problem is solvable within SCEV.
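
Something along these lines, purely hypothetical (none of these names exist in SCEV today), keyed on the defining block so that uniquing across blocks does not have to merge flags:

    #include <utility>
    #include "llvm/ADT/DenseMap.h"
    #include "llvm/Analysis/ScalarEvolution.h"

    // Hypothetical side table: remembers the wrap flags the original IR
    // carried for an expression within a particular basic block, so they
    // survive SCEV's flow-insensitive uniquing of expressions.
    typedef std::pair<const llvm::BasicBlock *, const llvm::SCEV *> FlaggedExprKey;
    typedef llvm::DenseMap<FlaggedExprKey, llvm::SCEV::NoWrapFlags> PerBlockFlagMap;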

 -Hal

> 
> 
> As long as indvars does not rely on SCEVExpander, it should be able to
> preserve the flags. Unfortunately, it still uses SCEVExpander in a few
> places. LinearFunctionTestReplace is one that should simply be moved
> into LSR instead. For the couple of other cases, we'll just have to
> work on alternative implementations that don't drop the flags, but I
> think it's worth doing.
> 
> 
> That said, we should try not to rely on NSW at all unless it is
> clearly necessary; it introduces nasty complexity that needs to be
> well justified. For example, in the vectorized loop preheader we
> should explicitly check for wrapping, and only try to optimize those
> checks away using NSW if we have data indicating it's really
> necessary.
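> 
> A scalar sketch of what such an explicit preheader check amounts to
> (illustrative only; the vectorizer would emit equivalent IR, and the
> overflow builtins below are GCC/Clang extensions):
> 
>     #include <cstdint>
> 
>     // True only if Start + Step * TripCount provably fits in an
>     // int64_t, established with overflow-checked arithmetic so the
>     // guard itself cannot wrap.
>     static bool inductionCannotWrap(int64_t Start, int64_t Step,
>                                     int64_t TripCount) {
>       int64_t Span, End;
>       return !__builtin_mul_overflow(Step, TripCount, &Span) &&
>              !__builtin_add_overflow(Start, Span, &End);
>     }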
> 
> 
> -Andy
> 
> The benefit of vectorizing later is that we would have more context at
> the inlined call site, and it would solve the problem of the inliner
> seeing vectorized code.
> 
> What do you all think?
> 
> 
> On Jun 24, 2013, at 2:21 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> 
> Adding this based on a discussion with Arnold; it seems at least worth
> having this flag so that we can both run some experiments to see
> whether this strategy is workable. It may solve some of the
> regressions seen with the loop vectorizer.
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



