[llvm] r372232 - [SampleFDO] Minimize performance impact when profile-sample-accurate

George Burgess via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 20 11:10:54 PDT 2019


Android's Chrome is built with -Oz by default, but some pieces are built
with -O2/-O3.

IIRC, most of the size win for Android was through the use of
-fprofile-sample-accurate on -O2/-O3'd TUs. Do you expect that, purely from
a code-size perspective, prod will want to further reduce the effectiveness
of -fprofile-sample-accurate, or is this the only regression on the roadmap
for the moment?

I can personally live with a one-off 70KB regression, but if we plan to
have more, Chrome for Android might want a -fprofile-sample-very-accurate
flag :)

On Fri, Sep 20, 2019 at 9:13 AM Wei Mi <wmi at google.com> wrote:

> The size increase is expected. We are planning to use
> profile-sample-accurate more widely in Google, but simply enabling it will
> cause performance regression so we use this patch to minimize it. The size
> increase is mostly from more inlining, but the increase is small compared
> with code size reduction from enabling profile-sample-accurate from our
> experiment.
>
> If code size is more of a concern for chrome on Android than performance,
> I can guard the change not to be enabled for -Os and Oz (I guess chrome on
> Android has already been built with Os or Oz).
>
> Please let me know what you think.
>
> Thanks,
> Wei.
>
> On Fri, Sep 20, 2019 at 4:14 AM Hans Wennborg <hans at chromium.org> wrote:
>
>> This caused a ~70 KB size regression for Chrome on Android. Is this
>> expected?
>>
>> Also +gbiv for Chrome Android pgo.
>>
>> We're tracking this in
>> https://bugs.chromium.org/p/chromium/issues/detail?id=1006159
>>
>> On Wed, Sep 18, 2019 at 6:05 PM Wei Mi via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>> >
>> > Author: wmi
>> > Date: Wed Sep 18 09:06:28 2019
>> > New Revision: 372232
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=372232&view=rev
>> > Log:
>> > [SampleFDO] Minimize performance impact when profile-sample-accurate
>> > is enabled.
>> >
>> > We can save memory and reduce binary size significantly by enabling
>> > ProfileSampleAccurate. However when ProfileSampleAccurate is true,
>> > function without sample will be regarded as cold and this could
>> > potentially cause performance regression.
>> >
>> > To minimize the potential negative performance impact, we want to be
>> > a little conservative here saying if a function shows up in the profile,
>> > no matter as outline instance, inline instance or call targets, treat
>> > the function as not being cold. This will handle the cases such as most
>> > callsites of a function are inlined in sampled binary (thus outline copy
>> > don't get any sample) but not inlined in current build (because of
>> source
>> > code drift, imprecise debug information, or the callsites are all cold
>> > individually but not cold accumulatively...), so that the outline
>> function
>> > showing up as cold in sampled binary will actually not be cold after
>> current
>> > build. After the change, such function will be treated as not cold even
>> > profile-sample-accurate is enabled.
>> >
>> > At the same time we lower the hot criteria of callsiteIsHot check when
>> > profile-sample-accurate is enabled. callsiteIsHot is used to determined
>> > whether a callsite is hot and qualified for early inlining. When
>> > profile-sample-accurate is enabled, functions without profile will be
>> > regarded as cold and much less inlining will happen in CGSCC inlining
>> pass,
>> > so we can worry less about size increase and be aggressive to allow more
>> > early inlining to happen for warm callsites and it is helpful for
>> performance
>> > overall.
>> >
>> > Differential Revision: https://reviews.llvm.org/D67561
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190920/fb154c76/attachment.html>


More information about the llvm-commits mailing list