[llvm] [LTO][Pipelines] Add 0 hot-caller threshold for SamplePGO + FullLTO (PR #135152)

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 15 06:55:45 PDT 2025


teresajohnson wrote:

> Hi Mingming, I really appreciate your detailed answer! Since you mentioned bottom-up call-graph-based inlining, I have a question (not closely related to this patch) about inlining in SampleProfileLoaderPass. In the bottom-up call-graph-based inliner, each function runs through an optimization pipeline before it is inlined. For example, in buildModuleSimplificationPipeline, a callee function first runs the pre-inline passes, which contain a series of optimizations, and is then inlined. And even after inlining, the caller function goes through buildFunctionSimplificationPipeline, which also contains a series of optimizations, before it is inlined into other functions.

This is in part necessary because the bottom-up inliner uses a cost-benefit analysis to decide whether to inline, and that analysis is aided by optimizing each function before it is considered for inlining into its callers.
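Purely as an illustration (not the actual code in PassBuilderPipelines.cpp), here is a small self-contained sketch of that ordering: functions are visited in post-order over the call graph, simplified first, and only then fed to the cost model when their callers are considered. The `simplify` and `shouldInline` helpers and the size numbers are invented stand-ins for the real function simplification pipeline and inline cost analysis.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct Function {
  std::string Name;
  std::vector<Function *> Callees;
  int EstimatedSize = 100;  // stand-in for the inline cost model's input
};

// Hypothetical stand-ins for the pre-/post-inline function simplification
// pipeline and for the inliner's cost/benefit decision.
void simplify(Function &F) { F.EstimatedSize /= 2; }
bool shouldInline(const Function &Callee) { return Callee.EstimatedSize < 60; }

// Bottom-up walk: each callee is simplified (and has had its own callees
// inlined) before the cost model is asked about inlining it into the caller.
void visitBottomUp(Function &F, std::unordered_map<Function *, bool> &Visited) {
  if (Visited[&F])
    return;
  Visited[&F] = true;
  for (Function *Callee : F.Callees) {
    visitBottomUp(*Callee, Visited);
    if (shouldInline(*Callee))
      F.EstimatedSize += Callee->EstimatedSize;  // "inline" it into F
  }
  simplify(F);  // F is cleaned up before *its* callers look at it
}

int main() {
  Function Bar{"bar", {}}, Foo{"foo", {&Bar}}, Main{"main", {&Foo}};
  std::unordered_map<Function *, bool> Visited;
  visitBottomUp(Main, Visited);
}
```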

> But in SampleProfileLoaderPass, inlining (SampleProfileLoader::inlineHotFunctions) is done in top-down call graph order without running any optimization on the callee functions.

The inlining done in the sample loader pass is focused on getting the best match between the profile and the inline contexts from the profiled binary. This matching is easier on unoptimized functions.
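To make that concrete, here is a rough sketch, loosely modeled on sample-profile data structures but with invented names, of why unoptimized IR matches more easily: samples are keyed by source locations relative to the function start, and an instruction whose debug location has been folded or moved by earlier optimization no longer has a key to look up.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <tuple>

// Samples are attributed to (line offset from function start, discriminator).
struct LineLocation {
  uint32_t LineOffset;
  uint32_t Discriminator;
  bool operator<(const LineLocation &O) const {
    return std::tie(LineOffset, Discriminator) <
           std::tie(O.LineOffset, O.Discriminator);
  }
};

// Simplified per-function profile: a sample count per source location.
struct FunctionSamples {
  std::map<LineLocation, uint64_t> BodySamples;

  // Mapping samples onto IR amounts to looking up each instruction's debug
  // location. If earlier optimization merged, moved, or deleted the
  // instruction, its location no longer matches and the samples are lost.
  std::optional<uint64_t> samplesAt(uint32_t Offset, uint32_t Disc) const {
    auto It = BodySamples.find({Offset, Disc});
    if (It == BodySamples.end())
      return std::nullopt;
    return It->second;
  }
};
```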

> Considering that many optimizations scan top-down within a basic block or function to find opportunities, I am concerned that a hot, short function might lose optimization opportunities after it is inlined into a bigger context by SampleProfileLoaderPass.

Why wouldn't the inlined version of the hot function be optimized when we go through the normal optimization pipeline?

> If my point above is right, I have a rough idea: we could just mark/record the inline contexts in SampleProfileLoaderPass but defer the actual inlining to the general inline pipelines/passes, so that all functions follow the bottom-up order and are fully optimized before they are inlined. What do you think of this idea?

In addition to what I mentioned above, a number of optimization passes are profile-guided. I'm not sure how you would get the correct (context-sensitive) profile data if that inlining were deferred.
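As a toy illustration of the context-sensitivity point (all names and counts below are invented), the same callee can carry very different counts depending on the inline context recorded in the profile; if the sample-loader inlining were only recorded rather than performed, passes that run before the deferred inlining would only see a single merged, context-insensitive count:

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

int main() {
  // Hypothetical per-context counts, as a context-sensitive profile would
  // distinguish them.
  std::map<std::string, uint64_t> ContextCounts = {
      {"hot_caller -> foo", 90000},  // foo is hot when inlined here
      {"cold_caller -> foo", 12},    // and nearly dead when inlined here
  };

  // Without materializing the inlining, the best a later profile-guided pass
  // could do is attribute the merged total to the one out-of-line copy of foo.
  uint64_t Merged = 0;
  for (const auto &[Ctx, Count] : ContextCounts)
    Merged += Count;

  std::cout << "merged count for foo: " << Merged << "\n";
  return 0;
}
```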



https://github.com/llvm/llvm-project/pull/135152

