SLP/Loop vectorizer pass ordering
chandlerc at google.com
Fri Jul 25 19:05:34 PDT 2014
On Fri, Jul 25, 2014 at 8:56 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> The loop vectorizer runs *once* in our pass pipeline (actually twice, a
> second time in LTO). It is outside the inliner SCC pass manager.
> The SLP vectorizer runs as part of the inliner SCC pass manager. This is
> done consciously. We want to have noalias attributes available that
> would otherwise disappear during inlining (until recently - Hal will/has
> fixed this).
> There are known regressions - most notably the Phoronix Cray benchmark
> (if memory serves right) - because of a bad interaction with SROA.
> I wanted to move the SLP vectorizer out of the SCC pass (or at least
> evaluate doing so) once the noalias deficiency is fixed.
> My feeling is that we should move the SLPVectorizer after the loop
> vectorizer when doing this because loop vectorization is more powerful
> (it vectorizes by a factor greater than the parallelism available in a
> basic block).
Generally, I support doing all vectorization after the inliner and SCC
passes have run and in a way that lets the most important vectorizers have
the first chance to run.
However, some more history here. When Nadav and I talked about this ages
ago (I was badgering him to move the loop vectorizer out of the SCC pass
manager), there was also some argument that the SLP-vectorized form was
"canonical" in some ways, but while I bought that at the time, I agree with
it less and less.
The crux of the problem here is that very good SLP vectorization can reduce
the number of instructions in a function body by a large factor (let's say
say). When that happens, the function might go from over the inline
threshold to under the inline threshold. It would be nice for the inliner
to have visibility into how the vectorizer will be able to pack the code of
a function into vectors so it can more accurately estimate the size cost it
trades off against.
But these days I think this isn't actually either important or the right
thing to measure. We should care more about the simplifications and core
complexity of the function, and the pre-SLP-vectorization code is probably
a better proxy for that than the post-SLP-vectorization code. I'm also not
seeing the kind of code-shrink from the SLP-vectorizer that would make a
huge difference for the inliner except for tiny test cases that would be
easily inlined either way.
Anyways, carry on, I look forward to the AA problem being resolved so that
we can do this late and keep the code earlier looking nice and easily