[PATCH] D13443: Rework the LTO Pipeline, aligning closer to the O2/O3 pipeline.

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 6 07:05:20 PDT 2015


Hi Mehdi,

Thanks for sharing the results. As you note there are swings in both
directions, but the improvements outweigh the regressions.

On Mon, Oct 5, 2015 at 5:50 PM, Mehdi AMINI <mehdi.amini at apple.com> wrote:
> joker.eph added a comment.
>
> Right now my view of it is that if I get a performance improvement by running the inliner and the "peephole" passes twice, then it is a bug. If it is not a bug, it means the O3 pipeline is affected as well and we might want to run them twice there too. Does that make sense?

I wonder if there are aspects of the inliner that behave differently
when run twice rather than once. E.g. only one level of recursive
inlining is allowed currently, but running the inliner twice would
allow two levels of recursive inlining. That may not be a big factor,
but it is one example of where the two configurations will differ.

Another factor might be that the intermediate peephole optimizations
(which are currently run after the compile-step inlining) clean up the
code and reduce the inlining costs seen by the LTO round of inlining.

For LTO specifically, I wonder how peak memory usage is affected
(e.g., as we were discussing with the bitcode size, the LTO step will
see some larger functions due to the earlier inlining, but also
potentially fewer or smaller functions where code has been inlined and
cleaned up beforehand).

>
> I ran the LLVM benchmark suite plus some internal benchmarks, with an early return placed before and after the inliner+peephole phase. Stopping before the inliner during the compile phase ends up with 13 regressions and 20 improvements, compared to running the inliner during the compile phase. I sent you some more details by email.

Just to clarify those results: for "Previous(1)", which stops after
the inlining, are you just removing that early return from
populateModulePassManager? If so, did you put the call to
createEliminateAvailableExternallyPass back under the
if (!PrepareForLTO) guard? There are probably some other passes, like
unrolling and vectorization, that as you note would be
counterproductive to run prior to LTO.
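For reference, the guard in question looks roughly like this in the legacy PassManagerBuilder; this is a sketch of the shape being discussed, not the exact code in the patch under review, and the surrounding pass ordering is assumed.

```cpp
// Sketch of the relevant portion of populateModulePassManager. In the
// PrepareForLTO case, available-externally bodies must be kept so the
// LTO link can still inline them; dropping them is only safe when this
// compile is the final one.
if (!PrepareForLTO) {
  MPM.add(createEliminateAvailableExternallyPass());
}
```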

Rather than exhaustively searching for the right combination, a couple
of data points seem particularly interesting: 1) the performance
effect of adding just the inlining (and none of the other later opts
after your early return) and exiting right after it in the
PrepareForLTO case; 2) the performance effect of running the peephole
passes before exiting early in the PrepareForLTO case (so you get the
code cleanup before the LTO inlining that might be affecting its cost
analysis).

>
>
> http://reviews.llvm.org/D13443



-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413

