SLP/Loop vectorizer pass ordering
aschwaighofer at apple.com
Fri Jul 25 08:56:30 PDT 2014
The loop vectorizer runs *once* in our pass pipeline (actually twice, a second time in LTO). It is outside the inliner SCC pass manager.
The SLP vectorizer runs as part of the inliner SCC pass manager. This is deliberate: we want the noalias attributes to be available, and they disappear during inlining (or did until recently; Hal will fix or has fixed this).
There are known regressions from this placement, most notably the phoronix Cray benchmark (if memory serves right), because of a bad interaction with SROA.
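To make the noalias point concrete, here is a minimal sketch in C. The function names are hypothetical and only for illustration. Clang lowers `restrict` on pointer parameters to `noalias` argument attributes in the IR. Once `scale4` is inlined into `scale_caller`, those attributes have no argument left to sit on, so a vectorizer that runs after inlining may no longer be able to prove that the loads and stores are independent.

```c
#include <stddef.h>

/* Hypothetical helper: 'restrict' promises that dst and src do not overlap,
 * which appears as 'noalias' on the arguments of this function in the IR. */
static inline void scale4(float *restrict dst, const float *restrict src, float k) {
    dst[0] = src[0] * k;
    dst[1] = src[1] * k;
    dst[2] = src[2] * k;
    dst[3] = src[3] * k;
}

/* Hypothetical caller: p and q carry no aliasing promise here. After scale4
 * is inlined, only these unannotated pointers remain, unless the inliner
 * preserves the aliasing information in some other form. */
void scale_caller(float *p, const float *q, float k) {
    scale4(p, q, k);
}
```

Running SLP before the callee is inlined lets it see the four isomorphic multiplies while the aliasing promise is still attached.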
I wanted to move the SLP vectorizer out of the SCC pass (or at least evaluate doing so) once the noalias deficiency is fixed.
My feeling is that we should move the SLPVectorizer after the loop vectorizer when doing this because loop vectorization is more powerful (it vectorizes by a factor greater than the parallelism available in a basic block).
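As a rough illustration of the "factor greater than the parallelism available in a basic block" point, consider the sketch below (a hypothetical kernel, not taken from any benchmark). The loop body contains only two independent, isomorphic adds, so within the basic block SLP has two lanes of work to pack. The loop vectorizer can instead combine several iterations and pick a wider vectorization factor. Whether either pass actually fires depends on the target and the cost model.

```c
#include <stddef.h>

/* Hypothetical kernel: each iteration does two independent adds on
 * adjacent elements. Inside the loop body (one basic block) there are
 * only two lanes of parallelism; across iterations there are 2 * n_pairs. */
void add_pairs(float *restrict out, const float *restrict a,
               const float *restrict b, size_t n_pairs) {
    for (size_t i = 0; i < n_pairs; ++i) {
        out[2 * i]     = a[2 * i]     + b[2 * i];
        out[2 * i + 1] = a[2 * i + 1] + b[2 * i + 1];
    }
}
```

If SLP runs first and turns the body into 2-wide vector operations, the loop vectorizer then has to deal with an already-vectorized body, which is one reason the ordering matters.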
> On Jul 25, 2014, at 8:41 AM, James Molloy <james.molloy at arm.com> wrote:
> Hi Nadav, Arnold,
> I’ve come across an interesting optimization problem in one of the SPEC benchmarks. There is a loop that can be optimized by both the SLP vectorizer and the loop vectorizer (when I patch the loop vectorizer to deal with fsub reductions).
> The SLP vectorizer actually makes the performance worse – I think this is due to a lack of loop unrolling afterwards. The Loop vectorizer can improve the performance.
> However, the loop vectorizer runs after the SLP vectorizer, so it never gets a chance. I’d have thought the ideal order would be Loop Vectorizer -> SLP vectorizer -> BB vectorizer, given that the loop vectorizer, if it can run, will probably give a greater speedup than SLP.
> The current sequence is SLP vectorizer -> BB vectorizer -> Loop vectorizer.
> What are your thoughts on this?
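For reference, the kind of fsub reduction James describes has roughly the following shape. This is a minimal sketch with a hypothetical function name; the actual SPEC loop is not shown in this thread. Vectorizing it means keeping several partial accumulators and combining them after the loop, which reorders the floating-point operations, so it is normally only legal when reassociation is permitted (for example under fast-math).

```c
#include <stddef.h>

/* Hypothetical fsub reduction: the accumulator is updated by subtraction
 * on every iteration, giving a loop-carried dependence on 'acc'. */
float sub_reduce(const float *x, size_t n, float init) {
    float acc = init;
    for (size_t i = 0; i < n; ++i) {
        acc -= x[i];
    }
    return acc;
}
```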