[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Mon Jun 24 11:01:25 PDT 2013

Hi,

I wanted to start a discussion about the following issue, since I am not sure what to do about it:

The loop vectorizer can make code quite a bit bigger (especially in cases where we don’t know the loop count or whether pointers alias).
Chandler has observed this in snappy, where we have a simple memory-copying loop whose source and destination may overlap.

We vectorize this loop. The loop then gets inlined into a function, and the extra code prevents that function from being inlined in turn, causing a significant(?) degradation.

https://code.google.com/p/snappy/source/browse/trunk/snappy.cc#99
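To make the size concern concrete, here is a hand-written C sketch. It is not snappy's actual code and not real vectorizer output. `copy_scalar` is a byte-copy loop in the spirit of the one linked above; `copy_vectorized_shape` approximates the structure the vectorizer has to emit when it cannot prove that the pointers don't alias: a runtime overlap check, a vector body, a scalar remainder loop, and the scalar loop as the fallback. The function names, the vector width `VF = 16`, and the use of `memcpy` to stand in for a vector load/store are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector width in bytes (an assumption, e.g. one 128-bit register). */
enum { VF = 16 };

/* Original scalar loop: copies one byte per iteration. If dst lies inside
   [src, src + len), later loads read bytes written by earlier stores, which
   replicates a pattern; the byte-by-byte order is what makes this correct. */
void copy_scalar(char *dst, const char *src, size_t len) {
  for (size_t i = 0; i < len; ++i)
    dst[i] = src[i];
}

/* Approximate shape of the vectorized loop. Everything besides the final
   loop is extra code that the inliner later has to account for. */
void copy_vectorized_shape(char *dst, const char *src, size_t len) {
  uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
  size_t i = 0;

  /* Runtime alias check: take the vector path only if the ranges are disjoint. */
  int disjoint = (d + len <= s) || (s + len <= d);
  if (disjoint) {
    /* Vector body: VF bytes per iteration. */
    for (; i + VF <= len; i += VF) {
      char lanes[VF];
      memcpy(lanes, src + i, VF); /* models one vector load */
      memcpy(dst + i, lanes, VF); /* models one vector store */
    }
  }

  /* Scalar remainder after the vector body, or the whole copy when the
     overlap check failed. */
  for (; i < len; ++i)
    dst[i] = src[i];
}
```

For overlapping arguments such as `dst = buf + 2, src = buf`, the check fails and both functions produce the same replicated pattern; for disjoint buffers, the vector body does most of the work. Either way, the check, the vector body, and the remainder loop all stay in the function body the inliner sees.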

We have seen good performance gains from vectorizing such loops, so I don't think simply not vectorizing them is a good option.

In <http://llvm.org/viewvc/llvm-project?view=revision&revision=184698> Chandler introduced a flag that lets us run the vectorizer after all CG passes. This would prevent the inliner from seeing the vectorized code.

I see some potential issues:

* We would run a loop pass again later, with the associated compile-time cost (?)

* The vectorizer relies on cleanup passes running afterwards: dce, instsimplify, simplifycfg, and maybe some form of redundancy elimination.
  If we vectorize later, we either have to run those passes again, increasing compile time, OR
  duplicate them in the loop vectorizer, increasing its complexity.

* The vectorizer wants SCEV analysis to be as precise as possible. One reason is the dependence checks, which need to know that expressions (AddRec expressions, to be more specific) cannot wrap:
  At the moment, indvars removes those no-wrap flags in some cases. This is currently not a problem because SCEV analysis still has the old values cached (except in the case Chandler mentioned to me on IRC, where we inline a function and that info is lost). My understanding is that this is not something we can fix easily, because of the way SCEV works (uniquing/commoning expressions and thereby dropping the flags).
  A potential solution would be to move indvars later in the pipeline. The question is whether other loop passes that simplify IR depend on indvars. Andy, what is your take on this?
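A small C illustration of why the no-wrap property matters for the dependence check point above. This is a toy model under my own assumptions, not code from the vectorizer. Take a loop that loads `a[i]` and stores `a[(uint8_t)(i + 4)]` with an 8-bit index, i.e. the AddRecs `{s,+,1}` and `{s+4,+,1}`. Symbolically the dependence distance is 4, but that only holds in every iteration if the recurrences don't wrap. The hypothetical `distance_is_four_u8` walks the iterations in 8-bit arithmetic and reports whether the distance really stays at 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models a loop with an 8-bit index i that starts at `start` and runs
   `trip_count` iterations, loading a[i] and storing a[(uint8_t)(i + 4)].
   Returns true if the store index is exactly 4 elements past the load
   index in every iteration, i.e. the constant distance a dependence check
   would derive from {start,+,1} and {start+4,+,1} is actually valid. */
bool distance_is_four_u8(uint8_t start, unsigned trip_count) {
  for (unsigned t = 0; t < trip_count; ++t) {
    uint8_t load_idx = (uint8_t)(start + t);     /* {start,+,1} in 8 bits */
    uint8_t store_idx = (uint8_t)(load_idx + 4); /* {start+4,+,1} in 8 bits */
    if ((int)store_idx - (int)load_idx != 4)
      return false; /* the store index wrapped past 255; the load index did not */
  }
  return true;
}
```

With `start = 0` the distance holds for up to 252 iterations; in the 253rd iteration `i = 252` and the store goes to `a[0]`, so a check that trusted the symbolic distance without a no-wrap guarantee would draw the wrong conclusion.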

The benefit of vectorizing later is that we would have more context at the inlined call site, and it would solve the problem of the inliner seeing vectorized code.
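A hypothetical illustration of that extra context (the function names are made up). In isolation, `copy_bytes` has an unknown trip count and pointers that may alias. Once it is inlined into `sum_of_copy`, the trip count is the constant 64 and the two buffers are distinct local arrays. A vectorizer running after the inliner could in principle drop the runtime overlap check and, for a vector width that divides 64, the remainder loop. Whether LLVM actually does so at this call site is not something this sketch shows.

```c
#include <assert.h>
#include <stddef.h>

/* Generic callee: on its own, the compiler knows neither len nor whether
   dst and src overlap, so a vectorized version needs a runtime overlap
   check and a scalar remainder loop. */
static void copy_bytes(char *dst, const char *src, size_t len) {
  for (size_t i = 0; i < len; ++i)
    dst[i] = src[i];
}

/* Hypothetical caller: after inlining copy_bytes here, len is the constant
   64 and dst/src are two distinct local arrays, which is exactly the kind
   of context a later vectorization phase could exploit. */
int sum_of_copy(void) {
  char in[64], out[64];
  for (int k = 0; k < 64; ++k)
    in[k] = (char)(k % 8);
  copy_bytes(out, in, sizeof out);
  int sum = 0;
  for (int k = 0; k < 64; ++k)
    sum += out[k];
  return sum;
}
```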

What do you all think?

On Jun 24, 2013, at 2:21 AM, Chandler Carruth <chandlerc at gmail.com> wrote:

> Adding this based on a discussion with Arnold and it seems at least
> worth having this flag for us to both run some experiments to see if
> this strategy is workable. It may solve some of the regressions seen
> with the loop vectorizer.