[PATCH] D123803: [WIP][llvm] A Unified LTO Bitcode Frontend

Fri May 19 00:57:50 PDT 2023

mehdi_amini added inline comments.

================
Comment at: llvm/lib/Passes/PassBuilder.cpp:1144
+      if (!PTO.UnifiedLTO)
+        MPM.addPass(buildThinLTODefaultPipeline(L, nullptr));
+      else
----------------
nikic wrote:
> tejohnson wrote:
> > tejohnson wrote:
> > > mehdi_amini wrote:
> > > > tejohnson wrote:
> > > > > mehdi_amini wrote:
> > > > > > It is concerning to me that we add one mode different code path / behavior to maintain instead of unifying everything.
> > > > > > 
> > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > 
> > > > > Perhaps it can eventually, but I would not want to make a major change to the ThinLTO pipelines without a lot of experimentation. I don't personally have the bandwidth to do that right now, but if this was in as an alternative mode under an option, it could be done more easily at some point on a wider range of applications. I'd be concerned for example of side effects on importing behavior which is based on instruction count thresholds.
> > > > Right, but your objection is exactly the root of my concerned with this new mode in the first place right now.
> > > Yeah, it isn't ideal to have added complexity, but I do understand the different constraints. The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> > > 
> > > I wonder if we can get partly to a more common approach but just have a flag to switch between the different pass managers in the pre and post LTO optimization pipelines. I haven't had a chance to look closely at the patches yet, but my sense is that the other major change is enabling "split" LTO bitcode files always, for which I don't yet have a good understanding of the implications. I'll try to spend some time looking at the patches in more detail in the next few days.
> > Per discussion on the RFC, the unified LTO mode added here requires split thin/regular LTO units. This is not something we have been able to use internally because of the scalability of the regular LTO portions. So we will need to keep the usual "pure" ThinLTO mode operational.
> I feel like I'm missing something here. Why do we need to force the use of the (known-broken, lower quality) full LTO pre-link pipeline here, rather than sticking to the thin LTO pre-link pipeline?
Can you elaborate on what is known-broken with the full LTO pre-link pipeline? And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).

>  The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.

I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline", it is akin to have a dedicated pipeline in-tree and a clang option `-flto=google-pipeline` (or `-Ogoogle` instead of `-O2`). You're getting into "this belongs to your downstream fork" territory IMO.
The point of having a limited set of configuration in-tree is that every user contribute also to the testing of these pipelines. Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in term of product.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123803/new/

https://reviews.llvm.org/D123803