[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Hal Finkel hfinkel at anl.gov
Mon Jun 24 12:32:55 PDT 2013


----- Original Message -----
> Hi,
> 
> I wanted to start a discussion about the following issue since I am
> not sure about what to do about it:
> 
> The loop vectorizer has the potential to make code quite a bit
> bigger (especially in cases where we don't know the loop count or
> whether pointers alias).
> Chandler has observed this in snappy, where we have a simple
> memory-copying loop (one that may overlap).
> 
> We vectorize this loop, and then the loop gets inlined into a
> function and prevents that function from getting inlined again,
> causing a significant(?) degradation.
> 
> https://code.google.com/p/snappy/source/browse/trunk/snappy.cc#99
> 
> We have seen some good performance benefits from vectorizing such
> loops, so not vectorizing them is not really a good option, I think.
> 
> In
> <http://llvm.org/viewvc/llvm-project?view=revision&revision=184698>,
> Chandler introduced a flag so that we can run the vectorizer after
> all CG passes. This would prevent the inliner from seeing the
> vectorized code.

There are obviously several issues here, and they seem only loosely related. Regarding this first one, why is the right answer not to adjust (or improve) the inlining heuristic? I understand that this is not easy, but the fact remains that, in the end, having the loop inlined, even with the extra vectorization checks, is what should be happening (or is the performance still worse than the non-vectorized code?). If we really feel that we can't adjust the current heuristic without breaking other things, then we could add some metadata to make the cost estimator ignore the vector loop preheader, but I'd prefer adjusting the inlining thresholds, etc. The commit message for r184698 said that the flag was for experimentation purposes, and I think that's fine, but this should not be the solution unless it really produces better non-inlined code as well.

> 
> I see some potential issues:
> 
> * We run a loop pass again later, with the associated compile-time
>   cost (?).
> 
> * The vectorizer relies on cleanup passes running afterwards: dce,
>   instsimplify, simplifycfg, maybe some form of redundancy
>   elimination. If we run later, we either have to run those passes
>   again, increasing compile time, or we have to duplicate them in
>   the loop vectorizer, increasing its complexity.
> 
> * The vectorizer would like SCEV analysis to be as precise as
>   possible: one reason is the dependency checks, which want to know
>   that expressions cannot wrap (AddRec expressions, to be more
>   specific). At the moment, indvars will remove those flags in some
>   cases, which currently is not a problem because SCEV analysis
>   still has the old values cached (except in the case that Chandler
>   mentioned to me on IRC, where we inline a function - in which case
>   that info is lost). My understanding is that this is not really
>   something we can fix easily because of the way that SCEV works
>   (unique-ifying/commoning expressions and thereby dropping the
>   flags).

I assume that we're talking about nsw, etc. The fact that SCEV assumes nsw in some cases has led to problems (PR16130 has some history on this), and I don't understand why SCEV's unique-ifying/commoning of expressions implies that it needs to drop the flags. Maybe this is just a matter of someone needing to do the work? Is it clear whether (i + 5 == i +(nsw) 5) should always be true, always false, or whether it depends on how the caller wants to use the answer?

>   A potential solution would be to move indvars later. The question
>   is: do other loop passes that simplify IR depend on indvars?
>   Andy, what is your take on this?
> 
> The benefit of vectorizing later is that we would have more context
> at the inlined call site. 

Is it clear that this additional context is a win? Some simplifications make loops easier to understand, and some make them harder to understand. The best thing (not for compile time) may be to run the vectorizer in both places. Nevertheless, now that we can experiment with this, it will be interesting to see some statistics.

Thanks again,
Hal

> And it would solve the problem of the
> inliner seeing vectorized code.
> 
> What do you all think?
> 
> 
> On Jun 24, 2013, at 2:21 AM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
> 
> > Adding this based on a discussion with Arnold and it seems at least
> > worth having this flag for us to both run some experiments to see
> > if
> > this strategy is workable. It may solve some of the regressions
> > seen
> > with the loop vectorizer.
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



