[PATCH] D146776: [llvm] Preliminary fat-lto-objects support

Thu May 25 14:18:22 PDT 2023

aeubanks added a comment.

In D146776#4302238 <https://reviews.llvm.org/D146776#4302238>, @tejohnson wrote:

> I agree ensuring full performance with FatLTO's non-LTO native object isn't terribly important. But my concern is whether this results in some unexpected behavior with certain passes not being run at all - e.g. ICP won't run at all if you are doing SamplePGO in this configuration.
>
> ...
>
> As mentioned earlier, I think the question is whether it will be a surprise when using SamplePGO that certain things simply don't get run when using the non-LTO native objects. I don't know enough about when these will get used to know if this is a configuration anyone will care about. If it is, I suppose one approach would be to invoke the longer pipelines only under SamplePGO. Either way, this probably needs some clear documentation.

These are all performance concerns for the non-LTO native objects, not correctness right? I think if we clearly document that performance might be very meh for fat LTO non-LTO native objects this would be fine.

================
Comment at: llvm/include/llvm/Bitcode/EmbedBitcodePass.h:32
+  EmbedBitcodePass(bool IsThinLTO, bool EmitLTOSummary,
+                   ModulePassManager &&MPM = ModulePassManager())
+      : IsThinLTO(IsThinLTO), EmitLTOSummary(EmitLTOSummary),
----------------
is this default param necessary?

================
Comment at: llvm/lib/Passes/PassBuilderPipelines.cpp:1266
+ModulePassManager PassBuilder::buildModuleOptimizationPipeline(
+    OptimizationLevel Level, ThinOrFullLTOPhase LTOPhase, bool IsFatLTO) {
   const bool LTOPreLink = isLTOPreLink(LTOPhase);
----------------
do we still need `IsFatLTO` with the latest approach?

================
Comment at: llvm/lib/Passes/PassBuilderPipelines.cpp:1489
+      buildModuleSimplificationPipeline(Level, ThinOrFullLTOPhase::None));
+  MPM.addPass(buildModuleOptimizationPipeline(Level, ThinOrFullLTOPhase::None,
+                                              /*IsFatLto*/ true));
----------------
paulkirth wrote:
> tejohnson wrote:
> > paulkirth wrote:
> > > tejohnson wrote:
> > > > Why pass in phase None for these calls to module simplification and optimization passes, instead of the corresponding post-LTO link phase? I would think we would want the complement of the phase provided earlier. I'm pretty sure there are some passes that are configured to run in exactly one of these phases, and we might end up duplicating them if running a pre-LTO link phase + a default (non-LTO) phase optimization pipeline.
> > > The intent there was to closely mimic the non-lto pipeline. For now I've attempted to just split everything, but I'm not sure I love the approach. It's certainly simple, but I didn't like giving a pass its own passmanager... plus its does seem to be a bit wasteful to run two complete pipelines on the module.
> > I don't think the prior code mimic'ed the non-LTO pipeline, however. My understanding is that the prior version did the following for the non embedded code on which it would generate a native object:
> > 
> > (Thin)LTOPreLink + ModuleSimplification(ThinOrFullLTOPhase::None) + ModuleOptimization(ThinOrFullLTOPhase::None)
> > 
> > So you would have been essentially duplicating ModuleSimplification (since it is invoked in both the *LTO pre-link and again with LTO Phase = None). That can cause some issues because there are passes that are only meant to be invoked once that you could duplicate. A good example is PGO instrumentation/annotation (see the conditions under which addPGOInstrPasses is called from buildModuleSimplificationPipeline).
> > 
> > With your new version, my understanding is that you are calling:
> > 1. buildPerModuleDefaultPipeline on original Module -> generate native object
> > 2. (Thin)LTOPreLink on Module clone -> embed IR
> > 
> > And buildPerModuleDefaultPipeline is largely just ModuleSimplification + ModuleOptimization. So I'm not sure about why it would be more wasteful (other than cloning the Module) - you are not running more pipelines total than in the prior code. Unless I am misunderstanding the former or current code, which is possible!
> > I don't think the prior code mimic'ed the non-LTO pipeline, however. My understanding is that the prior version did the following for the non embedded code on which it would generate a native object:
> > 
> > (Thin)LTOPreLink + ModuleSimplification(ThinOrFullLTOPhase::None) + ModuleOptimization(ThinOrFullLTOPhase::None)
> > 
> > So you would have been essentially duplicating ModuleSimplification (since it is invoked in both the *LTO pre-link and again with LTO Phase = None). That can cause some issues because there are passes that are only meant to be invoked once that you could duplicate. A good example is PGO instrumentation/annotation (see the conditions under which addPGOInstrPasses is called from buildModuleSimplificationPipeline).
> > 
> > With your new version, my understanding is that you are calling:
> > 1. buildPerModuleDefaultPipeline on original Module -> generate native object
> > 2. (Thin)LTOPreLink on Module clone -> embed IR
> > 
> > And buildPerModuleDefaultPipeline is largely just ModuleSimplification + ModuleOptimization. So I'm not sure about why it would be more wasteful (other than cloning the Module) - you are not running more pipelines total than in the prior code. Unless I am misunderstanding the former or current code, which is possible!
> 
> I think that's an accurate summary. While my intent originally was to closely mimic the default pipeline, I don't think it was very close in reality. Also, as you pointed out the new approach isn't nearly as wasteful as I felt initially.
IIUC the previous version was `ThinLTOPreLink + ModuleOptimization(ThinOrFullLTOPhase::None)`, where we'd write out the module after pre-link and before module optimization to some global. So we'd avoid running the simplification pipeline twice with that. (the problem is that the simplification pipeline is slightly different between a normal build and a ThinLTO build)

the current version will run the simplification pipeline twice now, once for ThinLTO and once for the default pipeline

================
Comment at: llvm/lib/Passes/PassRegistry.def:61
 MODULE_PASS("extract-blocks", BlockExtractorPass({}, false))
+MODULE_PASS("embed-bitcode", EmbedBitcodePass(true, true))
 MODULE_PASS("forceattrs", ForceFunctionAttrsPass())
----------------
these should be parameterizable, see `MODULE_PASS_WITH_PARAMS`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146776/new/

https://reviews.llvm.org/D146776