[PATCH] D102002: [PassManager] unify vector passes between regular and LTO pipelines

Fri May 7 04:12:37 PDT 2021

spatel added a comment.

In D102002#2742902 <https://reviews.llvm.org/D102002#2742902>, @dmgreen wrote:

> Without a lot of evidence that this is better across a range of architectures, I would recommend trying to be conservative, taking things a step at a time.
> We have one test where this patch would make it 88% worse. That might be a pathological case, but no one is going to put up with one tenth of the speed :-)

Agreed - I should try to move the LTO variations toward the regular pipeline in small steps.
Can we already see the big slowdown when compiling with "-flto", then?
Similarly for the rawspeed benchmark shown by @lebedev.ri - does the reduction in SLP vectorization show up when compiling with -flto independently of this patch?

================
Comment at: llvm/lib/Passes/PassBuilder.cpp:1199
+
+  // The vectorizer may have significantly shortened a loop body; unroll again.
+  // Unroll small loops to hide loop backedge latency and saturate any parallel
----------------
dmgreen wrote:
> I'm not sure what this comment is trying to say exactly. I think it's coming from somewhere that is very old and out of date now.
> 
> My very high-level understanding of the pass pipeline, at least for non-LTO and in terms of loops, is that we do:
>  - Cleanup clang code, code simplification, inlining, DSE, GVN, all that goodness.
>  - Run loop optimizations like licm, idiom recognition, loop deletion. This includes _full_ unrolling.
>  - Other optimizations.
>  - Run Vectorizer
>  - Run SLP
>  - _Runtime_ Unroll loops.
> 
> There are some extra simplifications that need to happen in between too. The last unrolling, especially on smaller in-order cores, has nothing really to do with vectorization. It's done near the end of the pipeline because runtime unrolling isn't expected to help anything else, and up to that point we have not done any runtime unrolling.
That sounds right. I see the call to full unroll at line 606 - inside of buildO1FunctionSimplificationPipeline().
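
For readers less familiar with the distinction between _full_ and _runtime_ unrolling discussed above, here is a rough hand-written C++ picture of it. This is only a sketch of the idea, not output from LLVM: the function names and the unroll factor of 4 are assumptions chosen for illustration, and the real pass may place the remainder loop before or after the main loop.

```cpp
#include <cstddef>

// Reference loop: the trip count n is only known at run time.
int sum_simple(const int *a, std::size_t n) {
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Runtime unrolling (illustrative factor of 4): the main loop handles four
// elements per backedge, and a remainder loop handles the leftover 0-3.
// Fewer backedges per element is what helps hide backedge latency.
int sum_runtime_unrolled(const int *a, std::size_t n) {
  int s = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s += a[i];
    s += a[i + 1];
    s += a[i + 2];
    s += a[i + 3];
  }
  for (; i < n; ++i) // remainder iterations
    s += a[i];
  return s;
}

// Full unrolling: only possible when the trip count is a compile-time
// constant (here 4), so the loop disappears entirely.
int sum4_full_unrolled(const int *a) {
  return a[0] + a[1] + a[2] + a[3];
}
```

Both unrolled forms compute the same result as the simple loop; the difference is that full unrolling removes the loop early, which can expose further simplifications, while the runtime form keeps a loop and so mainly matters late in the pipeline.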

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102002/new/

https://reviews.llvm.org/D102002