[PATCH] D95988: [CSSPGO] Process functions in a top-down order on a dynamic call graph.

Mon Feb 8 14:47:26 PST 2021

hoy added a comment.

In D95988#2549605 <https://reviews.llvm.org/D95988#2549605>, @wmi wrote:

> In D95988#2546946 <https://reviews.llvm.org/D95988#2546946>, @hoy wrote:
>
>> In D95988#2546697 <https://reviews.llvm.org/D95988#2546697>, @wmi wrote:
>>
>>> What is the main difference between dynamic call graph based inlining vs static call graph + priority based inlining (https://reviews.llvm.org/D94001)?
>>
>> The work in this change is an enhancement to the priority-based inliner in that:
>>
>> 1. Honor profile SCC traversal order for more inlining. E.g, where there is a circle in the static call graph say `A->B->C->A`, the static SCC traversal order could be any order but deterministic, let's. say `B->C->A`. If at runtime we see a context `A->B->C->A->B->C`, llvm-profgen may compress it into `A->B->C`. Therefore by walking the SCC in the `B->C->A` top-down order in the sample profile loader, we will not get `B` inlined into `A`. This change adjusts the SCC processing order to reflect what is in profile so that `A->B->C` will be walked in order.
>>
>> 2. Honor indirect call edge order. Similar given an indirect call `A->B` at runtime which is missing on the static call graph, `B` may end up being processed before `A`. We'd like `A` to be processed before `B` so that `B` gets a chance to be inlined into `A` after indirect call promotion.
>
> Thanks for the explanation. I have two questions.
>
> 1. Since static call graph edges have been removed, many cold callsites of small functions won't be inlined anymore. Those will all be left to CGSCC inlining. I understand that inlining those small functions may not help reduce caller size and boost more inlining because there is no cleanup after inlining in place in sample loader pass. But that is a big difference from current early inlining model. Do you see any impact from it?
>
> 2. The two benefits above of dynamic call graph also applies to non-CSSPGO profile. Is it possible to make it optional for non-CSSPGO profile so we can try it?

For #1, this change only adds edges to the static call graph to enforce additional order. It doesn't remove existing call edges. So it shouldn't block previous profile-based inlining. Inlining for cold callsites that exist in profile will be honored with this change. Inlining for callistes that are not recorded in the profile will mostly be done by CGSCC.

For #2, it's a good point. I believe the two benefits should also help non-CS profile based inlining. Though I haven't tried that, IIUC, the counts returned for non-inlined callees should have a better quality. That said, I'm not sure the non-CS profiled inlining can benefit as much as CSSPGO does since I haven't seen the issues that motivated this diff exists with non-CSSPGO. The main reason is that the CSSPGO inliner works on a context tri that requires explicit tri edge to present when processing a call edge. This isn't required with a non-CS profile where the concept of context tri edges are nested in the current function's profile.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95988/new/

https://reviews.llvm.org/D95988