[PATCH] D101229: [InlineCost] Bump threshold for inlining cold callsites (PR50099)

Sun Apr 25 03:38:04 PDT 2021

On Sun, Apr 25, 2021 at 7:58 AM Xinliang David Li <davidxl at google.com> wrote:
>
>
>
> On Sat, Apr 24, 2021 at 11:56 AM Roman Lebedev via Phabricator <reviews at reviews.llvm.org> wrote:
>>
>> lebedev.ri added a comment.
>>
>> In D101229#2714853 <https://reviews.llvm.org/D101229#2714853>, @davidxl wrote:
>>
>> > Changing the default threshold needs lots of benchmarking.
>>
>>
>>
>> > For this particular case, IMO the better way is to enhance inline cost
>> > analysis to give callsite more bonus if it enables SROA in call context.
>>
>> I have thought about that too, yes.
>>
>> > The analysis needs to be careful such that if there is another callsite
>> > that blocks SROA, and that callsite can never be inlined, then the bonus
>> > cannot be applied.
>>
>> So when inlining call to `curr_callee(arg)` from `entry()`,
>> and we've deduced that `arg` is an alloca within `entry()`,
>> we need to run an analysis on `entry()`, and verify that the alloca
>> is not used by anything that would prevent SROA, that's obvious to me.
>>
>> The caveat that is still a little murky to me is *how* specifically
>> we should deal with the cases when the alloca is passed as an argument
>> to some other callee. I don't suppose we want to actually recurse into it?

Thank you for sticking with me!

> I suppose a preparation step analysis is needed:
> For each SROA candidate (which does not have its address used
> in an intractable way) in the caller function, compute the list
> of call sites with its address passed in.
>
> After all callsites are considered for inlining, the rejected callsites
> can be re-examined. Say we have callsites A and B rejected,
> but they are associated with SROA candidate X.
So far this is reasonably straightforward.
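
A toy sketch of that preparation step, written in Python rather than against
the real LLVM data structures. `Use`, `collect_sroa_callsites` and the use
kinds are hypothetical names made up for illustration; nothing here is an
existing InlineCost API, and "escape" is only a stand-in for any intractable
use of the address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    kind: str           # "load", "store", "call_arg", or "escape" (assumed kinds)
    callsite: str = ""  # call site name, only meaningful for "call_arg"


def collect_sroa_callsites(allocas):
    """Map each SROA candidate to the call sites that receive its address.

    `allocas` maps an alloca name to the list of its uses in the caller.
    An alloca with any "escape" use is not a candidate and is dropped.
    """
    candidates = {}
    for name, uses in allocas.items():
        if any(u.kind == "escape" for u in uses):
            continue
        candidates[name] = [u.callsite for u in uses if u.kind == "call_arg"]
    return candidates


# Allocas in a hypothetical entry(): X is passed to call sites A and B,
# Y is passed to C but also has its address used in an intractable way.
entry_allocas = {
    "X": [Use("store"), Use("call_arg", "A"), Use("call_arg", "B"), Use("load")],
    "Y": [Use("call_arg", "C"), Use("escape")],
}
result = collect_sroa_callsites(entry_allocas)
```

Here only X survives as a candidate, with A and B as its associated call sites.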

> If A and B both can be inlined by applying the bonus,
> then A and B will be inlined.
And that's where things get confusing.
We don't actually compute the full cost of inlining a certain callsite;
we stop as soon as it's obvious that we can't inline it.

So I guess what you are saying is that we need to apply the bonus,
rerun the analysis, check that we can inline,
check that the SROA doesn't get disabled in the callee,
repeat that for all the rejected callees, and then inline them all at once?
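
In toy form, the all-or-nothing decision described above might look like the
following Python sketch. Everything in it is an assumption for illustration:
`CallSite`, `select_bonus_inlines`, the `cost - bonus < threshold` comparison
and the numbers are invented, and `full_cost` is assumed to come from a rerun
of the cost analysis that does not bail out early.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    name: str
    full_cost: int              # assumed: cost from a rerun without early exit
    keeps_sroa: bool            # assumed: callee does not disable SROA on the arg
    never_inline: bool = False  # e.g. a call that can never be inlined


def select_bonus_inlines(rejected_by_candidate, threshold, sroa_bonus):
    """Return names of rejected call sites to inline thanks to the SROA bonus.

    For each SROA candidate, either every rejected call site that takes its
    address becomes inlinable with the bonus, or none of them is selected.
    """
    selected = []
    for sites in rejected_by_candidate.values():
        all_ok = all(
            not s.never_inline
            and s.keeps_sroa
            and s.full_cost - sroa_bonus < threshold
            for s in sites
        )
        if not all_ok:
            continue  # one blocker means the bonus buys nothing for this alloca
        for s in sites:
            if s.name not in selected:  # a call site may serve several allocas
                selected.append(s.name)
    return selected


rejected = {
    "X": [CallSite("A", 50, True), CallSite("B", 60, True)],
    "Y": [CallSite("C", 40, True), CallSite("D", 10, True, never_inline=True)],
}
chosen = select_bonus_inlines(rejected, threshold=45, sroa_bonus=20)
```

With these made-up numbers A and B are selected together, while C is left
alone because D can never be inlined and would still block SROA of Y.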

> Ideally, the analysis should recursively consider callsites in the callee,
> but the compile time cost can be too large.
> The downside of not considering them is that the additional inlinings
> (due to bonus) may not necessarily result in SROA happening in the caller.
Yeah, I guess we don't want this to be recursive.

> David
Roman

>>
>>
>> > David
>>
>>
>>
>>
>> Repository:
>>   rG LLVM Github Monorepo
>>
>> https://reviews.llvm.org/D101229
>>