[llvm] r372232 - [SampleFDO] Minimize performance impact when profile-sample-accurate

Hans Wennborg via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 23 05:49:09 PDT 2019


For us, this appears to be a size regression here with no perf gain,
and the Chrome/Android binary size team tracks these things carefully.
How do we motivate this change to them? A 70 KB regression might be
okay if it buys us something else, but if there is no benefit, why
should it be accepted?

It sounds like the change is targeting internal Google benchmarks.
Perhaps this should be behind a flag which they can use? Or if you
want to change the default, shouldn't the case be made upstream with
data?

Reading the change description, it sounds like it's trying to work
around bad profile data:

"when ProfileSampleAccurate is true, function without sample will be
regarded as cold and this could potentially cause performance
regression."

What if the function has no sample because it really is cold though?
Inlining cold functions sounds like something we don't want. Would it
be possible to improve the accuracy of your sample data instead?

On Fri, Sep 20, 2019 at 6:13 PM Wei Mi <wmi at google.com> wrote:
>
> The size increase is expected. We are planning to use profile-sample-accurate more widely in Google, but simply enabling it will cause performance regression so we use this patch to minimize it. The size increase is mostly from more inlining, but the increase is small compared with code size reduction from enabling profile-sample-accurate from our experiment.
>
> If code size is more of a concern for chrome on Android than performance, I can guard the change not to be enabled for -Os and Oz (I guess chrome on Android has already been built with Os or Oz).
>
> Please let me know what you think.
>
> Thanks,
> Wei.
>
> On Fri, Sep 20, 2019 at 4:14 AM Hans Wennborg <hans at chromium.org> wrote:
>>
>> This caused a ~70 KB size regression for Chrome on Android. Is this expected?
>>
>> Also +gbiv for Chrome Android pgo.
>>
>> We're tracking this in
>> https://bugs.chromium.org/p/chromium/issues/detail?id=1006159
>>
>> On Wed, Sep 18, 2019 at 6:05 PM Wei Mi via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>> >
>> > Author: wmi
>> > Date: Wed Sep 18 09:06:28 2019
>> > New Revision: 372232
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=372232&view=rev
>> > Log:
>> > [SampleFDO] Minimize performance impact when profile-sample-accurate
>> > is enabled.
>> >
>> > We can save memory and reduce binary size significantly by enabling
>> > ProfileSampleAccurate. However when ProfileSampleAccurate is true,
>> > function without sample will be regarded as cold and this could
>> > potentially cause performance regression.
>> >
>> > To minimize the potential negative performance impact, we want to be
>> > a little conservative here saying if a function shows up in the profile,
>> > no matter as outline instance, inline instance or call targets, treat
>> > the function as not being cold. This will handle the cases such as most
>> > callsites of a function are inlined in sampled binary (thus outline copy
>> > don't get any sample) but not inlined in current build (because of source
>> > code drift, imprecise debug information, or the callsites are all cold
>> > individually but not cold accumulatively...), so that the outline function
>> > showing up as cold in sampled binary will actually not be cold after current
>> > build. After the change, such function will be treated as not cold even
>> > profile-sample-accurate is enabled.
>> >
>> > At the same time we lower the hot criteria of callsiteIsHot check when
>> > profile-sample-accurate is enabled. callsiteIsHot is used to determined
>> > whether a callsite is hot and qualified for early inlining. When
>> > profile-sample-accurate is enabled, functions without profile will be
>> > regarded as cold and much less inlining will happen in CGSCC inlining pass,
>> > so we can worry less about size increase and be aggressive to allow more
>> > early inlining to happen for warm callsites and it is helpful for performance
>> > overall.
>> >
>> > Differential Revision: https://reviews.llvm.org/D67561


More information about the llvm-commits mailing list