[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
Andrew Trick
atrick at apple.com
Mon Jun 24 15:55:19 PDT 2013
On Jun 24, 2013, at 11:01 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> Hi,
>
> I wanted to start a discussion about the following issue since I am not sure about what to do about it:
>
> The loop-vectorizer has the potential to make code quite a bit bigger (esp. in cases where we don’t know the loop count or whether pointers alias).
> Chandler has observed this in snappy where we have a simple memory copying loop (that may overlap).
>
> We vectorize this loop, and then it gets inlined into a function and prevents that function from getting inlined again, causing a significant(?) degradation.
>
> https://code.google.com/p/snappy/source/browse/trunk/snappy.cc#99
>
> We have seen some good performance benefits from vectorizing such loops. So not vectorizing them is not really a good option I think.
>
> In <http://llvm.org/viewvc/llvm-project?view=revision&revision=184698> Chandler introduced a flag so that we can run the vectorizer after all CG passes. This would prevent the inliner from seeing the vectorized code.
>
> I see some potential issues:
>
> * We run a loop pass later again with the associated compile time cost (?)
I want to move loop opts that depend on target info later, outside of CGSCC: definitely indvars, vectorize/partial unroll. That way we only inline canonical code and have a clean break between canonical and lowering passes. Hopefully inlining heuristics will be adequate without first running these passes. For the most part, I think it's as simple as inlining first with high-level heuristics, then lowering later.
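To make the size concern concrete, here is a hand-written sketch of the shape a vectorized byte-copy loop roughly takes when neither the trip count nor aliasing is known. It is not compiler output: the names `copy_scalar` and `copy_vectorized_shape`, the width of 16, and the use of `memcpy` as a stand-in for one vector load/store are all assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Canonical form: the small loop we would like the inliner to cost.
void copy_scalar(uint8_t *dst, const uint8_t *src, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Hypothetical hand-expansion of the same loop after vectorization.
// One loop has become a runtime check, a vector body, and a scalar loop.
void copy_vectorized_shape(uint8_t *dst, const uint8_t *src, size_t n) {
  constexpr size_t VF = 16; // assumed vector width in bytes
  size_t i = 0;

  // Runtime overlap check: the vector path is only safe if the two
  // ranges are disjoint. (Address overflow near the top of the address
  // space is ignored in this sketch.)
  const uintptr_t d = reinterpret_cast<uintptr_t>(dst);
  const uintptr_t s = reinterpret_cast<uintptr_t>(src);
  const bool disjoint = (d + n <= s) || (s + n <= d);

  if (disjoint) {
    // Vector body: each memcpy stands in for one vector load and store.
    for (; i + VF <= n; i += VF)
      std::memcpy(dst + i, src + i, VF);
  }

  // Scalar loop: handles the remainder, and the whole range when the
  // buffers may overlap.
  for (; i < n; ++i)
    dst[i] = src[i];
}
```

Both functions compute the same result, including for overlapping buffers, but the second is several times larger. That extra size is what the inliner ends up costing if vectorization runs first.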
> * The vectorizer relies on cleanup passes to run afterwards: dce, instsimplify, simplifycfg, maybe some form of redundancy elimination
> If we run later we have to run those passes again increasing compile time OR
> We have to duplicate them in the loop vectorizer increasing its complexity
We'll have to handle this case-by-case as we gradually move passes around. But the general idea is that lowering passes like the vectorizer should clean up after themselves as much as feasible (whereas canonicalization passes should not need to). We should be developing utilities to clean up redundancies incrementally. A value numbering utility would make sense. Of course, if a very light-weight pass can simply be rescheduled to run again to do the cleanup, then we don't need a cleanup utility.
> * The vectorizer would like SCEV analysis to be as precise as possible: one reason is the dependency checks, which want to know that expressions cannot wrap (AddRec expressions, to be more specific):
> At the moment, indvars will remove those flags in some cases, which currently is not a problem because SCEV analysis still has the old values cached (except in the case that Chandler mentioned to me on IRC where we inline a function, in which case that info is lost). My understanding is that this is not really something we can fix easily because of the way that SCEV works (unique-ifying/commoning expressions and thereby dropping the flags).
> A potential solution would be to move indvars to later. The question is: do other loop passes which simplify IR depend on indvars? Andy, what is your take on this?
Indvars should ideally preserve NSW flags whenever possible. However, we don't want to rely on SCEV to preserve them. SCEV expressions are implicitly reassociated and uniqued in a flow-insensitive universe independent of the def-use chain of values. SCEV simply can't represent the flags in most cases. I think the only flag that makes sense in SCEV is the no-wrap flag on a recurrence (that's independent of signed/unsigned overflow).
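A toy model may help show why uniquing and per-instruction flags don't mix. This is not LLVM's ScalarEvolution API or implementation; `ExprTable`, `getAdd`, and the integer operand ids are hypothetical names made up for this sketch. The point is only that once two instructions map to one uniqued node, that node can soundly keep just the flags that hold at every use.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Toy wrap flags; illustrative only, not LLVM's definitions.
enum : uint8_t { FlagNone = 0, FlagNUW = 1, FlagNSW = 2 };

struct AddExpr {
  int lhs;
  int rhs;
  uint8_t flags;
};

// Toy uniquing table: one node per operand set, independent of which
// instruction (and therefore which point in the CFG) asked for it.
class ExprTable {
  std::map<std::pair<int, int>, AddExpr> table;

public:
  const AddExpr &getAdd(int a, int b, uint8_t flags) {
    if (a > b)
      std::swap(a, b); // canonical operand order, models reassociation
    auto [it, inserted] = table.try_emplace({a, b}, AddExpr{a, b, flags});
    if (!inserted)
      it->second.flags &= flags; // keep only flags valid for every requester
    return it->second;
  }
};
```

For example, requesting `getAdd(1, 2, FlagNSW)` and then `getAdd(2, 1, FlagNone)` returns the same node both times, and after the second request its flags are `FlagNone`: the `nsw` fact from the first instruction cannot be kept on the shared node.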
As long as indvars does not rely on SCEVExpander, it should be able to preserve the flags. Unfortunately, it still uses SCEVExpander in a few places. LinearFunctionTestReplace is one that should simply be moved into LSR instead. For the couple of other cases, we'll just have to work on alternative implementations that don't drop flags, but I think it's worth doing.
That said, we should try not to rely on NSW at all unless clearly necessary. It introduces nasty complexity that needs to be well justified. For example, in the vectorized loop preheader we should explicitly check for wrapping, and only try to optimize those checks using NSW if we have data that indicates it's really necessary.
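As a rough sketch of what an explicit preheader check could look like at the source level, assume a 32-bit signed induction variable of the form start + step * i. The function name `inductionMayWrap` and its parameters are hypothetical. Generated code would do this on IR values, but the arithmetic is the same idea.

```cpp
#include <cstdint>
#include <limits>

// Returns true if start + step * i leaves the int32_t range for some
// i in [0, tripCount). The sequence is monotonic, so it is enough to
// test the last value. All math is done in 64 bits, which cannot
// overflow for these operand widths.
bool inductionMayWrap(int32_t start, int32_t step, uint32_t tripCount) {
  if (tripCount == 0)
    return false; // loop body never runs
  const int64_t last = static_cast<int64_t>(start) +
                       static_cast<int64_t>(step) *
                           static_cast<int64_t>(tripCount - 1);
  return last < std::numeric_limits<int32_t>::min() ||
         last > std::numeric_limits<int32_t>::max();
}
```

A preheader would branch to the scalar loop when this returns true and to the vector loop otherwise, so nothing depends on an `nsw` flag having survived earlier passes.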
-Andy
>
> The benefit of vectorizing later is that we would have more context at the inlined call site. And it would solve the problem of the inliner seeing vectorized code.
>
> What do you all think?
>
>
> On Jun 24, 2013, at 2:21 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
>> Adding this based on a discussion with Arnold and it seems at least
>> worth having this flag for us to both run some experiments to see if
>> this strategy is workable. It may solve some of the regressions seen
>> with the loop vectorizer.