[PATCH] D123803: [WIP][llvm] A Unified LTO Bitcode Frontend

Fri May 19 07:27:14 PDT 2023

tejohnson added inline comments.

================
Comment at: llvm/lib/Passes/PassBuilder.cpp:1144
+      if (!PTO.UnifiedLTO)
+        MPM.addPass(buildThinLTODefaultPipeline(L, nullptr));
+      else
----------------
mehdi_amini wrote:
> nikic wrote:
> > tejohnson wrote:
> > > tejohnson wrote:
> > > > mehdi_amini wrote:
> > > > > tejohnson wrote:
> > > > > > mehdi_amini wrote:
> > > > > > > It is concerning to me that we add one mode different code path / behavior to maintain instead of unifying everything.
> > > > > > > 
> > > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > > > If UnifiedLTO is able to use the LTO pipeline effectively, what would be the reason for ThinLTO to not align?
> > > > > > 
> > > > > > Perhaps it can eventually, but I would not want to make a major change to the ThinLTO pipelines without a lot of experimentation. I don't personally have the bandwidth to do that right now, but if this was in as an alternative mode under an option, it could be done more easily at some point on a wider range of applications. I'd be concerned for example of side effects on importing behavior which is based on instruction count thresholds.
> > > > > Right, but your objection is exactly the root of my concerned with this new mode in the first place right now.
> > > > Yeah, it isn't ideal to have added complexity, but I do understand the different constraints. The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> > > > 
> > > > I wonder if we can get partly to a more common approach but just have a flag to switch between the different pass managers in the pre and post LTO optimization pipelines. I haven't had a chance to look closely at the patches yet, but my sense is that the other major change is enabling "split" LTO bitcode files always, for which I don't yet have a good understanding of the implications. I'll try to spend some time looking at the patches in more detail in the next few days.
> > > Per discussion on the RFC, the unified LTO mode added here requires split thin/regular LTO units. This is not something we have been able to use internally because of the scalability of the regular LTO portions. So we will need to keep the usual "pure" ThinLTO mode operational.
> > I feel like I'm missing something here. Why do we need to force the use of the (known-broken, lower quality) full LTO pre-link pipeline here, rather than sticking to the thin LTO pre-link pipeline?
> Can you elaborate on what is known-broken with the full LTO pre-link pipeline? And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).
> 
> >  The new mode seems to work well enough for Sony's needs, but for users such as mine at Google that want to maximize performance from ThinLTO, it may not be the best approach (or may be ok, but needs to be carefully evaluated). Unfortunately, I don't have a good immediate solution to balancing those two sets of needs at the moment, other than supporting different modes.
> 
> I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline", it is akin to have a dedicated pipeline in-tree and a clang option `-flto=google-pipeline` (or `-Ogoogle` instead of `-O2`). You're getting into "this belongs to your downstream fork" territory IMO.
> The point of having a limited set of configuration in-tree is that every user contribute also to the testing of these pipelines. Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in term of product.
> 
> 
> 
> 
> And if we were to adopt the ThinLTO pipeline here for FullLTO, what does it mean for the FullLTO pipeline at link time? The two goes hand-in-hand somehow and the current situation (as far as I remember) balances compile time between the two phases (which is much more sensitive for FullLTO since the link phase is sequential).

The reverse is also a question - if we are to adopt the full LTO pipeline here, what does it mean for ThinLTO performance (and compile time, given what appears to be a requirement that split modules be used which means that ThinLTO now would be required to include some amount of full LTO)? The current ThinLTO pipeline attempts to maximize performance since we don't have to worry about the full LTO scalability issues.

> I am still concerned with divergence that wouldn't be just temporary: what would be the timeline to reconcile the paths? I understand you may not have time just now, but I don't think it is reasonable to just keep code in-tree forever "because Google can't evaluate changes to the pipeline", it is akin to have a dedicated pipeline in-tree and a clang option -flto=google-pipeline (or -Ogoogle instead of -O2). You're getting into "this belongs to your downstream fork" territory IMO.

Google is not the one asking for a major change to the ThinLTO pipelines, which have been set up roughly this way since inception. While we certainly rely on ThinLTO for performance with scalability, we're also certainly not the only users of ThinLTO. IMO a major change such as this should go in under an experimental option, so that existing users are easily able to try it out, without being expected to patch in multiple patches and do that manually. It will be a lot easier to try it out if this is under an option in the upstream sources.

> Having a feature for "unified LTO" that isn't orthogonal to the optimization pipelines doesn't seem right to me in term of product.

Since Unified LTO is an intermediate between Thin and Full LTO, which have their own pipelines already to balance their different needs, having a different pipeline for a different LTO mode with different needs doesn't seem like a terrible thing to me.

What happens if Unified LTO does degrade performance and/or compile time for existing ThinLTO users?

================
Comment at: llvm/lib/Passes/PassBuilder.cpp:1146
+      else
+        MPM.addPass(buildLTOPreLinkDefaultPipeline(L));
     } else if (Matches[1] == "lto-pre-link") {
----------------
It is a bit odd to see that under unified LTO the regular LTO "pre-link" pipeline is used during the post link phase. I don't remember the reasons for this, maybe it is in the RFC, but it at least needs a clear comment.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123803/new/

https://reviews.llvm.org/D123803