[PATCH] D69732: [WIP][LTO] Apply SamplePGO pipeline tunes for ThinLTO pre-link to full LTO

Mon Nov 4 18:29:29 PST 2019

wristow added a comment.

> This probably needs to be taken over by someone who cares about full LTO performance

We at PlayStation are definitely interested in full LTO performance, so we're looking into this.  We certainly agree with the rationale that if suppressing some optimizations is useful to allow better SamplePGO matching, then we'd expect that would apply equally to both ThinLTO and full LTO.

I guess much of this comes down to a balancing act between:

1. The amount of the runtime benefit with Sample PGO if these loop optimizations are deferred to the full LTO back-end (like they are for ThinLTO).
2. The cost in compile-time resources in the full LTO back-end to do these loop optimizations at that later stage.

From the discussion here, the Sample PGO runtime win (point 1) seems more or less to be a given.  If we find the compile-time cost in the full LTO back-end (point 2) is not significant, then the decision should be easy.  So after seeing this patch, we're doing some experiments to at least try to get a handle on this.  (I'm a bit concerned we won't be able to draw any hard conclusions from the results of our experiments, but at least we'll be able to make a better informed assessment.)

FTR, for PlayStation, we're using the old PM.  But we'll do some experiments for both the old and new PM, to get a sense of the answers to the (old PM) `LoopUnrollAndJam` point, and the (new PM) FIXME comment.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D69732