[PATCH] D123803: [WIP][llvm] A Unified LTO Bitcode Frontend

Fri May 19 13:39:24 PDT 2023

nikic added inline comments.

================
Comment at: llvm/lib/Passes/PassBuilder.cpp:1144
+      if (!PTO.UnifiedLTO)
+        MPM.addPass(buildThinLTODefaultPipeline(L, nullptr));
+      else
----------------
mehdi_amini wrote:
> tejohnson wrote:
> > mehdi_amini wrote:
> > > nikic wrote:
> > > > tejohnson wrote:
> > > > > tejohnson wrote:
> > > > > > mehdi_amini wrote:
> > > > > > > tejohnson wrote:
> > > > > > > > mehdi_amini wrote:
> > > > > > > > > It is concerning to me that we add one mode different code path / behavior to maintain instead of unifying everything.
> > > > > > > > > 
> > > > > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > > > > 
> > > > > > > > Perhaps it can eventually, but I would not want to make a major change to the ThinLTO pipelines without a lot of experimentation. I don't personally have the bandwidth to do that right now, but if this was in as an alternative mode under an option, it could be done more easily at some point on a wider range of applications. I'd be concerned for example of side effects on importing behavior which is based on instruction count thresholds.
> > > > > > > Right, but your objection is exactly the root of my concerned with this new mode in the first place right now.
> > > > > > Yeah, it isn't ideal to have added complexity, but I do understand the different constraints. The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> > > > > > 
> > > > > > I wonder if we can get partly to a more common approach but just have a flag to switch between the different pass managers in the pre and post LTO optimization pipelines. I haven't had a chance to look closely at the patches yet, but my sense is that the other major change is enabling "split" LTO bitcode files always, for which I don't yet have a good understanding of the implications. I'll try to spend some time looking at the patches in more detail in the next few days.
> > > > > Per discussion on the RFC, the unified LTO mode added here requires split thin/regular LTO units. This is not something we have been able to use internally because of the scalability of the regular LTO portions. So we will need to keep the usual "pure" ThinLTO mode operational.
> > > > I feel like I'm missing something here. Why do we need to force the use of the (known-broken, lower quality) full LTO pre-link pipeline here, rather than sticking to the thin LTO pre-link pipeline?
> > > Can you elaborate on what is known-broken with the full LTO pre-link pipeline? And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).
> > > 
> > > >  The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> > > 
> > > I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline", it is akin to have a dedicated pipeline in-tree and a clang option `-flto=google-pipeline` (or `-Ogoogle` instead of `-O2`). You're getting into "this belongs to your downstream fork" territory IMO.
> > > The point of having a limited set of configuration in-tree is that every user contribute also to the testing of these pipelines. Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in term of product.
> > > 
> > > 
> > > 
> > > 
> > > And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).
> > 
> > The reverse is also a question - if we are to adopt the full LTO pipeline here, what does it mean for ThinLTO performance (and compile time, given what appears to be a requirement that split modules be used which means that ThinLTO now would be required to include some amount of full LTO)? The current ThinLTO pipeline attempts to maximize performance since we don't have to worry about the full LTO scalability issues.
> > 
> > > I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline", it is akin to have a dedicated pipeline in-tree and a clang option -flto=google-pipeline (or -Ogoogle instead of -O2). You're getting into "this belongs to your downstream fork" territory IMO.
> > 
> > Google is not the one asking for a major change to the ThinLTO pipelines, which have been set up roughly this way since inception. While we certainly rely on ThinLTO for performance with scalability, we're also certainly not the only users of ThinLTO. IMO a major change such as this should go in under an experimental option, so that existing users are easily able to try it out, without being expected to patch in multiple patches and do that manually. It will be a lot easier to try it out if this is under an option in the upstream sources.
> > 
> > > Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in term of product.
> > 
> > Since Unified LTO is an intermediate between Thin and Full LTO, which have their own pipelines already to balance their different needs, having a different pipeline for a different LTO mode with different needs doesn't seem like a terrible thing to me.
> > 
> > What happens if Unified LTO does degrade performance and/or compile time for existing ThinLTO users?
> > Google is not the one asking for a major change to the ThinLTO pipelines, 
> 
> Right, but it came across to me that you were blocking it by lack of time for testing: it is fine to ask about a testing plan and some plan ahead of time on resources to commit, but it didn't seem like the dynamic at play here.
> 
> > which have been set up roughly this way since inception. While we certainly rely on ThinLTO for performance with scalability, we're also certainly not the only users of ThinLTO. IMO a major change such as this should go in under an experimental option, so that existing users are easily able to try it out, without being expected to patch in multiple patches and do that manually. It will be a lot easier to try it out if this is under an option in the upstream sources.
> 
> So basically, IIUC, we should:
> 
> 1) add an option to use ThinLTO with an new pipeline
> 2) have plan and a timeline for users to test this pipeline, and criteria of acceptation.
> 3) either graduate this pipeline to replace the existing one, or kill this option if unsuccessful. 
> 
> This seems very reasonable to me, but the stakeholder in keeping the feature working should be ready to participate in 2).
> 
> > What happens if Unified LTO does degrade performance and/or compile time for existing ThinLTO users?
> 
> Isn't the premise of the proposal that the author believe they can get the same performance as ThinLTO? Re-reading the original RFC, it does not say much about the performance claim, hence my impression that UnifiedLTO was proposed as an "orthogonal feature" to the compilation pipelines. Some clarifications may be needed on this?
> 
> Can you elaborate on what is known-broken with the full LTO pre-link pipeline? And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).

Basically, the only difference between the thin LTO and the full LTO pre-link pipelines is that full LTO runs module optimization pre-link, while thin LTO does not. Running module optimization pre-link is detrimental to both performance and compile time. The full LTO pre-link pipeline will be made the same as the thin LTO pre-link pipeline in D148010, but it might take a while until we're ready to land that change.

Once that change lands this question won't matter anymore as the pipelines will be the same, but until that time it would make a lot more sense to me to use the thin LTO pre-link pipeline here, as that's the one we're ultimately going to adopt.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123803/new/

https://reviews.llvm.org/D123803