[PATCH] D123803: [WIP][llvm] A Unified LTO Bitcode Frontend

Matthew Voss via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 22 15:14:52 PDT 2023


ormris added inline comments.


================
Comment at: llvm/lib/Passes/PassBuilder.cpp:1144
+      if (!PTO.UnifiedLTO)
+        MPM.addPass(buildThinLTODefaultPipeline(L, nullptr));
+      else
----------------
tejohnson wrote:
> nikic wrote:
> > mehdi_amini wrote:
> > > tejohnson wrote:
> > > > nikic wrote:
> > > > > mehdi_amini wrote:
> > > > > > tejohnson wrote:
> > > > > > > mehdi_amini wrote:
> > > > > > > > nikic wrote:
> > > > > > > > > tejohnson wrote:
> > > > > > > > > > tejohnson wrote:
> > > > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > > > tejohnson wrote:
> > > > > > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > > > > > It is concerning to me that we are adding one more mode with a different code path / behavior to maintain instead of unifying everything.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Perhaps it can eventually, but I would not want to make a major change to the ThinLTO pipelines without a lot of experimentation. I don't personally have the bandwidth to do that right now, but if this was in as an alternative mode under an option, it could be done more easily at some point on a wider range of applications. I'd be concerned for example of side effects on importing behavior which is based on instruction count thresholds.
> > > > > > > > > > > > Right, but your objection is exactly the root of my concern with this new mode in the first place.
> > > > > > > > > > > Yeah, it isn't ideal to have added complexity, but I do understand the different constraints. The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> > > > > > > > > > > 
> > > > > > > > > > > I wonder if we can get partway to a more common approach by just having a flag to switch between the different pass managers in the pre- and post-LTO optimization pipelines. I haven't had a chance to look closely at the patches yet, but my sense is that the other major change is always enabling "split" LTO bitcode files, for which I don't yet have a good understanding of the implications. I'll try to spend some time looking at the patches in more detail in the next few days.
> > > > > > > > > > Per discussion on the RFC, the unified LTO mode added here requires split thin/regular LTO units. This is not something we have been able to use internally because of the scalability of the regular LTO portions. So we will need to keep the usual "pure" ThinLTO mode operational.
> > > > > > > > > I feel like I'm missing something here. Why do we need to force the use of the (known-broken, lower quality) full LTO pre-link pipeline here, rather than sticking to the thin LTO pre-link pipeline?
> > > > > > > > Can you elaborate on what is known-broken with the full LTO pre-link pipeline? And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two go hand in hand, and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO, since the link phase is sequential).
> > > > > > > > 
> > > > > > > > >  The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> > > > > > > > 
> > > > > > > > I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline"; it is akin to having a dedicated pipeline in-tree and a clang option `-flto=google-pipeline` (or `-Ogoogle` instead of `-O2`). You're getting into "this belongs to your downstream fork" territory IMO.
> > > > > > > > The point of having a limited set of configurations in-tree is that every user also contributes to the testing of those pipelines. Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in terms of product.
> > > > > > > > 
> > > > > > > > And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two go hand in hand, and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO, since the link phase is sequential).
> > > > > > > 
> > > > > > > The reverse is also a question: if we are to adopt the full LTO pipeline here, what does it mean for ThinLTO performance (and compile time, given what appears to be a requirement that split modules be used, which means ThinLTO would now be required to include some amount of full LTO)? The current ThinLTO pipeline attempts to maximize performance, since we don't have to worry about the full LTO scalability issues.
> > > > > > > 
> > > > > > > > I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline"; it is akin to having a dedicated pipeline in-tree and a clang option -flto=google-pipeline (or -Ogoogle instead of -O2). You're getting into "this belongs to your downstream fork" territory IMO.
> > > > > > > 
> > > > > > > Google is not the one asking for a major change to the ThinLTO pipelines, which have been set up roughly this way since inception. While we certainly rely on ThinLTO for performance with scalability, we're also certainly not the only users of ThinLTO. IMO a major change such as this should go in under an experimental option, so that existing users are easily able to try it out, without being expected to manually apply multiple patches. It will be a lot easier to try it out if this is under an option in the upstream sources.
> > > > > > > 
> > > > > > > > Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in term of product.
> > > > > > > 
> > > > > > > Since Unified LTO is an intermediate between Thin and Full LTO, which have their own pipelines already to balance their different needs, having a different pipeline for a different LTO mode with different needs doesn't seem like a terrible thing to me.
> > > > > > > 
> > > > > > > What happens if Unified LTO does degrade performance and/or compile time for existing ThinLTO users?
> > > > > > > Google is not the one asking for a major change to the ThinLTO pipelines, 
> > > > > > 
> > > > > > Right, but it came across to me that you were blocking it for lack of time for testing: it is fine to ask for a testing plan and for a commitment of resources ahead of time, but that didn't seem like the dynamic at play here.
> > > > > > 
> > > > > > > which have been set up roughly this way since inception. While we certainly rely on ThinLTO for performance with scalability, we're also certainly not the only users of ThinLTO. IMO a major change such as this should go in under an experimental option, so that existing users are easily able to try it out, without being expected to manually apply multiple patches. It will be a lot easier to try it out if this is under an option in the upstream sources.
> > > > > > 
> > > > > > So basically, IIUC, we should:
> > > > > > 
> > > > > > 1) add an option to use ThinLTO with a new pipeline
> > > > > > 2) have a plan and a timeline for users to test this pipeline, and acceptance criteria.
> > > > > > 3) either graduate this pipeline to replace the existing one, or kill this option if unsuccessful. 
> > > > > > 
> > > > > > This seems very reasonable to me, but the stakeholders interested in keeping the feature working should be ready to participate in 2).
> > > > > > 
> > > > > > > What happens if Unified LTO does degrade performance and/or compile time for existing ThinLTO users?
> > > > > > 
> > > > > > Isn't the premise of the proposal that the authors believe they can get the same performance as ThinLTO? Re-reading the original RFC, it does not say much about the performance claim, hence my impression that UnifiedLTO was proposed as an "orthogonal feature" to the compilation pipelines. Some clarification may be needed on this?
> > > > > > 
> > > > > > Can you elaborate on what is known-broken with the full LTO pre-link pipeline? And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).
> > > > > 
> > > > > Basically, the only difference between the thin LTO and the full LTO pre-link pipelines is that full LTO runs module optimization pre-link, while thin LTO does not. Running module optimization pre-link is detrimental to both performance and compile time. The full LTO pre-link pipeline will be made the same as the thin LTO pre-link pipeline in D148010, but it might take a while until we're ready to land that change.
> > > > > 
> > > > > Once that change lands this question won't matter anymore as the pipelines will be the same, but until that time it would make a lot more sense to me to use the thin LTO pre-link pipeline here, as that's the one we're ultimately going to adopt.
> > > > > Basically, the only difference between the thin LTO and the full LTO pre-link pipelines is that full LTO runs module optimization pre-link, while thin LTO does not. Running module optimization pre-link is detrimental to both performance and compile time. The full LTO pre-link pipeline will be made the same as the thin LTO pre-link pipeline in D148010, but it might take a while until we're ready to land that change.
> > > > 
> > > > Also, the odd thing here (see my comment a couple lines below) is that this is the case where the post-link pipeline has been requested, where we normally run the "ThinLTODefault" pipeline (not the pre-link). With UnifiedLTO, however, the code instead runs the full LTO pre-link pipeline. But I believe this code is just used for pipeline testing via opt. The pipeline setup code in the companion clang patch seems to be doing the intended thing (using full LTO pre-link instead of thin LTO pre-link under the unified LTO option). However, as you note, it isn't clear whether that is what we want.
> > > > 
> > > > @ormris is this a bug here?
> > > > Running module optimization pre-link is detrimental to both performance and compile time
> > > 
> > > I have a different experience: I tried to align FullLTO with the ThinLTO pipeline while we were building ThinLTO (circa 2016), and `ninja clang` (with FullLTO enabled) would take roughly twice as long. This is because you're basically shifting compile time from the parallel compile phase to the sequential link-time phase.
> > > 
> > > I ended up proposing a patch here: https://reviews.llvm.org/D29376, which was performance-tested on games and embedded systems (see the comment thread), without a conclusive result.
> > > The compile-time impact was deemed too high for it to be worthwhile to pursue at the time.
> > > 
> > From a quick look, what you were trying to do is align both the pre-link and the post-link full LTO pipelines. I'm talking only about the pre-link pipeline here. Making the post-link full LTO pipeline the same as the thin LTO pipeline would indeed likely run into compile-time issues.
> > Right, but it came across to me that you were blocking it for lack of time for testing: it is fine to ask for a testing plan and for a commitment of resources ahead of time, but that didn't seem like the dynamic at play here.
> 
> Not trying to block, I was just trying to agree with the approach here in putting it upstream under an option. If it is in under an option, it is a lot easier for a wider range of people to try it out in parallel. For example, it will be a lot easier to send it through our various pre-release compile-time and performance testing suites with potentially multiple people looking at it.
> 
> > So basically, IIUC, we should:
> > 1. add an option to use ThinLTO with a new pipeline
> > 2. have a plan and a timeline for users to test this pipeline, and acceptance criteria.
> > 3. either graduate this pipeline to replace the existing one, or kill this option if unsuccessful.
> 
> Agree, although regarding 3 my understanding is that they are trying to solve a specific problem: allowing the decision about thin vs full LTO to be delayed until LTO link time, and simplifying the deployment of bitcode libraries. So the criteria for success for Sony and anyone else who wants these benefits are likely going to be different from those of ThinLTO users who don't care about this and just want the best performance/build-time tradeoff.
> They are trying to solve a specific problem: allowing the decision about thin vs full LTO to be delayed until LTO link time, and simplifying the deployment of bitcode libraries.

Correct. There are specific aspects of the LTO UX that we wanted to change, as noted in the RFC.

>  it is a lot easier for a wider range of people to try it out in parallel

A few others have expressed interest on Discourse in using this pipeline for other projects. I think it's likely we would see other projects testing this pipeline if it were committed.
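For anyone who wants to experiment, the pre-link pipelines under discussion can be exercised directly through opt's pipeline aliases. This is just a sketch: `input.ll` stands for any IR module of your choosing, and exact alias behavior depends on the LLVM version in use:

```shell
# Run the ThinLTO pre-link pipeline at -O2 (no module optimization pre-link).
opt -passes='thinlto-pre-link<O2>' -S input.ll -o thin-prelink.ll

# Run the full LTO pre-link pipeline at -O2 (currently also runs module
# optimization pre-link; this is the difference discussed above).
opt -passes='lto-pre-link<O2>' -S input.ll -o full-prelink.ll

# Compare what each pipeline would run, using an empty module as input.
opt -passes='thinlto-pre-link<O2>' -print-pipeline-passes -disable-output < /dev/null
opt -passes='lto-pre-link<O2>' -print-pipeline-passes -disable-output < /dev/null
```

Diffing the two `-print-pipeline-passes` outputs makes the pre-link divergence concrete without building a whole project under LTO.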



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123803/new/

https://reviews.llvm.org/D123803


