[PATCH] D99146: [CSSPGO][llvm-profgen] Context-sensitive global pre-inliner

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 24 23:10:19 PDT 2021


wenlei added a comment.

In D99146#2649707 <https://reviews.llvm.org/D99146#2649707>, @wmi wrote:

> In D99146#2649599 <https://reviews.llvm.org/D99146#2649599>, @wenlei wrote:
>
>> In D99146#2649557 <https://reviews.llvm.org/D99146#2649557>, @davidxl wrote:
>>
>>> ThinLTO is known to have issues related to profile update (cross module), so we were thinking of doing something similar in the ThinLink phase.
>>
>> This is the exact problem we are trying to mitigate. We also considered doing this in ThinLink, but adjusting profiles for thin-backends and communicating inline decisions to thin-backends would add quite a bit of complexity, and could also slow down ThinLink. With CSSPGO, doing it during profile generation and using the adjusted profile to convey inline estimates/suggestions is much simpler and cheaper.
>>
>>> One of the issues is that the pre-inlining needs to make similar decisions as the compiler. How well is the preinliner doing in this regard?
>>
>> Yes, this is a challenge. We don't have data yet, but I hope that with some tuning we can get them to be close. One problem with doing pre-inlining is that we don't have much of the information the compiler can see from IR, though if needed some of that can be embedded into the binary (e.g. metadata in the probe descriptor) for the preinliner. I hope a more accurate view of machine-code byte size for inline cost can offset some of the disadvantage from the lack of IR.
>
> It is a good idea to have a non-intrusive way to predict cross-module inlining decisions and update the profile beforehand.
>
> To mitigate the ThinLTO profile update issue, either apparent inline or no-inline decisions can be made. From the patch description, it seems the patch currently only considers the case where a no-inline decision is made and the profile can be merged back. Have you considered the case where inlining is apparently beneficial and a profile without context can be split?

If we don't have a context profile from the raw input profile, the split is going to be a simple scaling based on call-site counts, right? In that case, doing it in profile generation won't improve profile quality, because the scaling won't be very different from the scaling done by the CGSCC inliner. Splitting the profiles to synthesize context profiles would let the sample loader inline more; if we want that, we could allow sample loader inlining to do the scaling.
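To make the point concrete, here is a minimal sketch (hypothetical, not llvm-profgen code) of what such a split amounts to: a callee's context-less sample count is distributed across its call sites proportionally to each call site's count, which is essentially the same scaling an inliner would apply on its own:

```python
# Hypothetical sketch: splitting a context-less callee profile among
# call sites by scaling with call-site counts. Names and data shapes
# are illustrative, not from llvm-profgen.

def split_by_callsite_counts(callee_samples, callsite_counts):
    """Distribute a callee's flat sample count across call sites
    proportionally to each call site's count."""
    total = sum(callsite_counts.values())
    if total == 0:
        return {site: 0 for site in callsite_counts}
    return {site: callee_samples * count // total
            for site, count in callsite_counts.items()}

# 'foo' has 1000 flat samples; two call sites invoke it 300 and 100 times.
split = split_by_callsite_counts(1000, {"main:3": 300, "bar:7": 100})
print(split)  # {'main:3': 750, 'bar:7': 250}
```

Because the split carries no information beyond the call-site counts, doing it ahead of time in profile generation adds nothing over letting the inliner do the same proportional scaling itself.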

>> We'll be working on tuning the preinliner to get it close to the compiler's inliner. This is similar to the effort of transferring more inlining from the CGSCC inliner to the sample loader inliner: we may not see immediate results, but as the new component matures, we hope to reap benefits over time.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99146/new/

https://reviews.llvm.org/D99146
