[PATCH] D121862: [ProfSampleLoader] When disable-sample-loader-inlining is true, merge profiles of inlined instances to outlining versions.

Mon Mar 21 16:14:39 PDT 2022

luna added a comment.

In D121862#3394290 <https://reviews.llvm.org/D121862#3394290>, @wenlei wrote:

>> For my understanding, how does over-optimization or measurable code size change happen when profiles are merged and pre-link inliner skips IR transformation? Does it happen for the following scenario?
>>
>> - "merging without inlining" cause some functions (on the edge) to become hot functions; these hot functions are imported in post-link sample loader, and cause code size increase
>
> If we skip inlining at a particular call site during pre-link, but still inline that site during post-link, merging causes pre-link passes to be unnecessarily aggressive (including pre-link SCC inliner for the callee) based on merged (larger) counts because the merged counts don't reflect post-link inlining.

Since `--disable-sample-loader-inlining` applies on both pre-link and post-link sample loader inlining in one clang invocation, I'm assuming the (//undesired aggressive//) inlining happens in feedback-driven inliner  (i.e. `InlinerPass::run` pointed by [1])

Is my understanding correct that the current implementation [2] of `--disable-sample-loader-inlining` would (//undesirably//) inline a function in post-linking stage due to feedback-driven inliner, if this function satisfy following conditions

1. The function is hot within the module,
  - So inline is desired in pre-linking stage.
2. The function is cold across module
  - So inline is not desired in post-linking stage, but got inlined (and could be counter-productive) by feedback-driven inliner.

If the scenario above is where problem happens, I'm planning to

1. Call out this undesired side-effect for the usage of `--disable-sample-loader-inlining`
2. Call out the option is used to evaluate AutoFDO inliner, and recommend using it together with hacks [3] in feedback-driven inliner for evaluation purpose.
3. Proceed with this change, as the current implementation could be confusing (in-accurate profiles that affect more than sample-loader inlining)

May I get some feedback on the plans above? thanks!

[1] https://github.com/llvm/llvm-project/blob/f4b794427e8037a4e952cacdfe7201e961f31a6f/llvm/lib/Transforms/IPO/Inliner.cpp#L745
[2] the current impl is

1. skipping sample-loader inlining transformations artificially
2. merging profiles of inline instance to its outlining version

[3] The hack is to still use static information in calculating "threshold" but ignores profiles (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/InlineCost.cpp#L1869-L1902 is where profiles are used to calculate threshold)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121862/new/

https://reviews.llvm.org/D121862