[llvm] r372232 - [SampleFDO] Minimize performance impact when profile-sample-accurate
Wei Mi via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 24 07:37:18 PDT 2019
I committed another change trying to remove the most size increase from
this change at rL372665. Please see if it can fix the size regression.
Thanks,
Wei.
On Tue, Sep 24, 2019 at 1:16 AM Hans Wennborg <hans at chromium.org> wrote:
> On Mon, Sep 23, 2019 at 5:58 PM Wei Mi <wmi at google.com> wrote:
> >
> >
> >
> > On Mon, Sep 23, 2019 at 5:49 AM Hans Wennborg <hans at chromium.org> wrote:
> >>
> >> For us, this appears to be a size regression here with no perf gain,
> >> and the Chrome/Android binary size team tracks these things carefully.
> >> How do we motivate this change to them? A 70 KB regression might be
> >> okay if it buys us something else, but if there is no benefit, why
> >> should it be accepted?
> >>
> >>
> >> It sounds like the change is targeting internal Google benchmarks.
> >> Perhaps this should be behind a flag which they can use? Or if you
> >> want to change the default, shouldn't the case be made upstream with
> >> data?
> >>
> >
> > I think it is reasonable to put a guard and not to enable it for Os and
> Oz, but sounds like the size regression is in pieces built with O2/O3. From
> my understanding for O2 and O3, usually code size is less of a concern
> compared with performance. If code size is a big concern, maybe it should
> use Os/Oz instead of O2/O3 because I think it is common for people to trade
> off optimal performance against size in O2/O3, inlining is a typical case.
>
> We build different parts of Chrome with different flags. For this
> code, we use -O2/-O3 because we do value performance over size, but in
> this case it's not clear that there is a performance benefit.
>
Ok.
>
> > About performance data, it fixed a 1% performance regression in an
> important internal benchmark, and we thought the problem was a general one
> and could potentially exist in other places as well.
>
> Is there any data showing that this is beneficial beyond the 1% on
> this particular benchmark?
>
No, we havn't done wide evaluation for it.
>
>
>
> >> Reading the change description, it sounds like it's trying to work
> >> around bad profile data:
> >>
> >> "when ProfileSampleAccurate is true, function without sample will be
> >> regarded as cold and this could potentially cause performance
> >> regression."
> >>
> >> What if the function has no sample because it really is cold though?
> >> Inlining cold functions sounds like something we don't want. Would it
> >> be possible to improve the accuracy of your sample data instead?
> >
> >
> > That is not about the accuracy of sample data. That is caused by some
> limitation in AutoFDO itself -- use profile collected from optimized binary
> and annotate it on IR not fully optimized using debug information. There is
> a gap in between and debug information in optimized binary cannot be
> perfect. Because of it, if performance is a major concern, it is a little
> aggressive to treat everything without profile as cold. The patch corrects
> it a little bit but still keeps the most memory/code size saving from
> profile-sample-accurate.
> >
> >>
> >>
> >> On Fri, Sep 20, 2019 at 6:13 PM Wei Mi <wmi at google.com> wrote:
> >> >
> >> > The size increase is expected. We are planning to use
> profile-sample-accurate more widely in Google, but simply enabling it will
> cause performance regression so we use this patch to minimize it. The size
> increase is mostly from more inlining, but the increase is small compared
> with code size reduction from enabling profile-sample-accurate from our
> experiment.
> >> >
> >> > If code size is more of a concern for chrome on Android than
> performance, I can guard the change not to be enabled for -Os and Oz (I
> guess chrome on Android has already been built with Os or Oz).
> >> >
> >> > Please let me know what you think.
> >> >
> >> > Thanks,
> >> > Wei.
> >> >
> >> > On Fri, Sep 20, 2019 at 4:14 AM Hans Wennborg <hans at chromium.org>
> wrote:
> >> >>
> >> >> This caused a ~70 KB size regression for Chrome on Android. Is this
> expected?
> >> >>
> >> >> Also +gbiv for Chrome Android pgo.
> >> >>
> >> >> We're tracking this in
> >> >> https://bugs.chromium.org/p/chromium/issues/detail?id=1006159
> >> >>
> >> >> On Wed, Sep 18, 2019 at 6:05 PM Wei Mi via llvm-commits
> >> >> <llvm-commits at lists.llvm.org> wrote:
> >> >> >
> >> >> > Author: wmi
> >> >> > Date: Wed Sep 18 09:06:28 2019
> >> >> > New Revision: 372232
> >> >> >
> >> >> > URL: http://llvm.org/viewvc/llvm-project?rev=372232&view=rev
> >> >> > Log:
> >> >> > [SampleFDO] Minimize performance impact when
> profile-sample-accurate
> >> >> > is enabled.
> >> >> >
> >> >> > We can save memory and reduce binary size significantly by enabling
> >> >> > ProfileSampleAccurate. However when ProfileSampleAccurate is true,
> >> >> > function without sample will be regarded as cold and this could
> >> >> > potentially cause performance regression.
> >> >> >
> >> >> > To minimize the potential negative performance impact, we want to
> be
> >> >> > a little conservative here saying if a function shows up in the
> profile,
> >> >> > no matter as outline instance, inline instance or call targets,
> treat
> >> >> > the function as not being cold. This will handle the cases such as
> most
> >> >> > callsites of a function are inlined in sampled binary (thus
> outline copy
> >> >> > don't get any sample) but not inlined in current build (because of
> source
> >> >> > code drift, imprecise debug information, or the callsites are all
> cold
> >> >> > individually but not cold accumulatively...), so that the outline
> function
> >> >> > showing up as cold in sampled binary will actually not be cold
> after current
> >> >> > build. After the change, such function will be treated as not cold
> even
> >> >> > profile-sample-accurate is enabled.
> >> >> >
> >> >> > At the same time we lower the hot criteria of callsiteIsHot check
> when
> >> >> > profile-sample-accurate is enabled. callsiteIsHot is used to
> determined
> >> >> > whether a callsite is hot and qualified for early inlining. When
> >> >> > profile-sample-accurate is enabled, functions without profile will
> be
> >> >> > regarded as cold and much less inlining will happen in CGSCC
> inlining pass,
> >> >> > so we can worry less about size increase and be aggressive to
> allow more
> >> >> > early inlining to happen for warm callsites and it is helpful for
> performance
> >> >> > overall.
> >> >> >
> >> >> > Differential Revision: https://reviews.llvm.org/D67561
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190924/9e637512/attachment.html>
More information about the llvm-commits
mailing list