[PATCH] D102007: [CSSPGO] Fix return value of getProbeWeight

Fri May 7 22:32:24 PDT 2021

On Fri, May 7, 2021 at 9:33 PM Wenlei He via Phabricator <
reviews at reviews.llvm.org> wrote:

> wenlei added inline comments.
>
>
> ================
> Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:566
>    if (!FS)
> -    return std::error_code();
> +    return 0;
>
> ----------------
> wmi wrote:
> > hoy wrote:
> > > wenlei wrote:
> > > > Given it more thought, perhaps whether this is 0 or error_code is
> subject to something like `profile-sample-accurate`? I.e. with that flag,
> it means cold, otherwise it means unknown (error_code)?
> > > >
> > > > The default behavior for `getInstWeight` the AutoFDO counterpart is
> returning error_code here. We don't have to do it now, but may be both
> AutoFDO and CSSPGO can check `profile-sample-accurate` here to decide
> between 0 or error_code? @wmi what do you think about AutoFDO?
> > > Good point. Was thinking about this. When it comes here, `Inst` should
> be mostly from a function that has a profile, like
> > >
> > > - If it is from a top-level inliner, the inliner should have a
> profile, otherwise it wouldn't be compiled.
> > > - If it is from an inlinee, the inlinee should have a profile,
> otherwise it wouldn't be inlined (see
> `SampleProfileLoader::getInlineCandidate`).
> > >
> > >
> > >
> > >
> > Currently profile-sample-accurate is only used on function profile,
> i.e., if a function profile is not present, and if profile-sample-accurate
> is true, compiler will set the function entry count to be zero. It is a
> balance between best performance/smaller binary size and performance
> stableness. In google currently some targets caring about size more than
> performance uses it since it will generate smaller binary.
> >
> > If we do the same for instruction level profile, it will make the target
> using the flag to be more subject to source drift issue. I am not sure
> whether it will decrease the usability of the flag -- some targets
> currently using it can get their expected balance between binary size and
> performance stableness, and the balance may be broken if we introduce more
> performance unstableness to it.
> I was hoping that by being more stringent on profile and trust zero counts
> to be zero, we would be able to save size and at the same time improve
> performance when profile is accurate.

   That is right. More cold callsites can lead to less inlining and smaller
size.

>
>
> From your experience, does profile-sample-accurate often lead to worse
> performance along with smaller size? If so, that is mostly because of
> source drift, right? If profile is refreshed very often, or used in instr.
> PGO like two step builds, it should not lead to worse performance..

profile-sample-accurate will lead to better performance and smaller size if
the profile is accurate, but more unstableness (right, from source drift).

>
>
> I think we can leave it as is, or perhaps experiment with an additional
> switch for profile-sample-block-accurate for block level counts.. in order
> to avoid affecting existing usage.

Agree, we can experiment it and see how much performance and size headroom
it provides. If the benefit is substantial, we can provide an additional
switch for the users who want the benefit and can tolerate the performance
unstableness.

>
>
>
> Repository:
>   rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
>   https://reviews.llvm.org/D102007/new/
>
> https://reviews.llvm.org/D102007
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210507/73b20eb1/attachment.html>