[PATCH] D101229: [InlineCost] Bump threshold for inlining cold callsites (PR50099)

Sun Apr 25 16:24:24 PDT 2021

On Sun, Apr 25, 2021 at 3:38 AM Roman Lebedev <lebedev.ri at gmail.com> wrote:

> On Sun, Apr 25, 2021 at 7:58 AM Xinliang David Li <davidxl at google.com> wrote:
> >
> >
> >
> > On Sat, Apr 24, 2021 at 11:56 AM Roman Lebedev via Phabricator <reviews at reviews.llvm.org> wrote:
> >>
> >> lebedev.ri added a comment.
> >>
> >> In D101229#2714853 <https://reviews.llvm.org/D101229#2714853>, @davidxl wrote:
> >>
> >> > Changing the default threshold needs lots of benchmarking.
> >>
> >>
> >>
> >> > For this particular case, IMO the better way is to enhance inline cost
> >> > analysis to give a callsite more bonus if inlining it enables SROA in
> >> > the calling context.
> >>
> >> I have thought about that too, yes.
> >>
> >> > The analysis needs to be careful: if there is another callsite that
> >> > blocks SROA, and that callsite can never be inlined, then the bonus
> >> > cannot be applied.
> >>
> >> So when inlining call to `curr_callee(arg)` from `entry()`,
> >> and we've deduced that `arg` is an alloca within `entry()`,
> >> we need to run an analysis on `entry()`, and verify that the alloca
> >> is not used by anything that would prevent SROA, that's obvious to me.
> >>
> >> The caveat that is still a little murky to me: *how* specifically
> >> should we deal with the cases when the alloca is passed as an argument
> >> to some other callee? I don't suppose we want to actually recurse into it?
>
> Thank you for sticking with me!
>
> > I suppose a preparation step analysis is needed:
> > For each SROA candidate (which does not have its address used
> > in an intractable way) in the caller function, compute the list
> > of call sites with its address passed in.
> >
> > After all callsites are considered for inlining, the rejected callsites
> > can be re-examined. Say we have callsites A and B rejected,
> > but they are associated with SROA candidate X.
> So far this is reasonably straight-forward.
>
> > If A and B both can be inlined by applying the bonus,
> > then A and B will be inlined.
> And that's where things get confusing.
> We don't actually compute the full cost of inlining a certain callsite,
> we stop as soon as it's obvious that we can't inline it.
>
>
This can be changed so that the cost computation continues until the cost
just exceeds threshold + bonus. A failed callsite with cost > threshold +
bonus won't be considered again. Those with cost between threshold and
threshold + bonus are candidates.

Note that, to avoid wasting compile time, we can enable the above only when
the callsite takes an alloca address as an argument.
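
A minimal sketch of the extended cutoff described above, not the actual InlineCost code. The names `Verdict`, `classifyCallsite`, `InstCosts`, and `SROABonus` are hypothetical, and the sketch assumes non-negative per-instruction costs and that a cost equal to the threshold still inlines:

```cpp
#include <cassert>
#include <vector>

// Hypothetical outcome of analyzing one callsite; not an LLVM type.
enum class Verdict { Inline, SROACandidate, Reject };

// Hypothetical helper: accumulate per-instruction costs and stop as soon as
// the running total exceeds the bail-out limit.
Verdict classifyCallsite(const std::vector<int> &InstCosts, int Threshold,
                         int SROABonus, bool PassesAllocaAddress) {
  // Only raise the limit when the callsite takes an alloca address, so all
  // other callsites keep the cheap early exit at the plain threshold.
  const int Limit = PassesAllocaAddress ? Threshold + SROABonus : Threshold;
  int Cost = 0;
  for (int C : InstCosts) {
    Cost += C;
    if (Cost > Limit)
      return Verdict::Reject; // above the limit: never reconsidered
  }
  // Within the plain threshold: inline as usual. Otherwise the cost lies
  // between threshold and threshold + bonus: keep it as a candidate.
  return Cost <= Threshold ? Verdict::Inline : Verdict::SROACandidate;
}
```

With a threshold of 50 and a bonus of 25, a callsite of total cost 60 is rejected outright unless it passes an alloca address, in which case it is kept as a candidate.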

> So I guess what you are saying is that we need to apply the bonus,
> rerun the analysis, check that we can inline,
> check that the SROA doesn't get disabled in the callee,
> repeat that for all the rejected callees, and then inline them all at once?
>
>
See above -- there is no need to rerun the analysis.
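
A rough sketch of the re-examination step using only the costs recorded in the first pass, so nothing is re-analyzed. All names here (`CallsiteResult`, `pickBonusInlines`, `SROABonus`) are hypothetical, and the sketch assumes each rejected callsite is associated with exactly one SROA candidate:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record kept for each callsite rejected in the first pass.
struct CallsiteResult {
  std::string Name;   // callsite label, e.g. "A"
  std::string Alloca; // SROA candidate whose address is passed, e.g. "X"
  int Cost;           // cost recorded when the first pass stopped
};

// Returns the rejected callsites to inline after applying the bonus. The
// bonus applies to an SROA candidate only if every rejected callsite that
// uses it fits within Threshold + SROABonus.
std::vector<std::string>
pickBonusInlines(const std::vector<CallsiteResult> &Rejected, int Threshold,
                 int SROABonus) {
  std::map<std::string, bool> AllFit;
  for (const CallsiteResult &R : Rejected) {
    const bool Fits = R.Cost <= Threshold + SROABonus;
    auto It = AllFit.emplace(R.Alloca, true).first;
    It->second = It->second && Fits; // one blocker disables the candidate
  }
  std::vector<std::string> ToInline;
  for (const CallsiteResult &R : Rejected)
    if (AllFit.at(R.Alloca))
      ToInline.push_back(R.Name);
  return ToInline;
}
```

For example, if A and B (costs 60 and 70) share candidate X while C and D (costs 60 and 90) share candidate Y, then with a threshold of 50 and a bonus of 25 only A and B are selected, because D blocks Y.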

> > Ideally, the analysis should recursively consider callsites in the callee,
> > but the compile time cost can be too large.
> > The downside of not considering them is that the additional inlinings
> > (due to the bonus) may not necessarily allow SROA to happen in the caller.
> Yeah, I guess we don't want this to be recursive.
>
>
David

> > David
> Roman
>
> >>
> >>
> >> > David
> >>
> >>
> >>
> >>
> >> Repository:
> >>   rG LLVM Github Monorepo
> >>
> >> CHANGES SINCE LAST ACTION
> >>   https://reviews.llvm.org/D101229/new/
> >>
> >> https://reviews.llvm.org/D101229
> >>
>