[llvm-dev] [RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data

Fri Aug 7 10:31:48 PDT 2020

On Fri, Aug 7, 2020 at 10:22 AM Snehasish Kumar <snehasishk at google.com>
wrote:

> Hi Wenlei,
>
> Thanks for your interest :)
>
> On Fri, Aug 7, 2020 at 12:40 AM Wenlei He <wenlei at fb.com> wrote:
>
>> Cool stuff – nice to see a late splitting pass in LLVM.
>>
>>
>>
>> > Full Propeller optimizations include function splitting and layout
>> optimizations; however, they require an additional round of profiling using
>> perf on top of the peak (FDO/CSFDO + ThinLTO) binary. In this work we
>> experiment with applying function splitting using the instrumented profile
>> in the build instead of adding an additional round of profiling.
>>
>>
>>
>> I’d expect Propeller or BOLT to be more effective at this because of their
>> better post-inline profiles. Of course, the usability advantage of not
>> needing a separate profile is very practical, but I am wondering: did you
>> see profile quality getting in the way here?
>>
> Yes, currently the pass is quite sensitive to profile quality. For example,
> the current default is to split only blocks with a zero profile count. Using
> a binary choice is more effective than a count-based threshold.
>
>>

Right -- it is a known issue with ThinLTO that the entry count update
(which affects block counts) can be less precise because of cross-module
inlining -- the updated count can be either larger or smaller than the
actual count after inlining. Using a binary choice is a way to avoid the
problem.

With a CSFDO profile, however, you can experiment with a count-based
threshold, as the profile data is post-inlining and precise.
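
To make the two policies concrete, here is a minimal Python sketch. It is
only an illustration, not the pass itself, which runs on machine basic
blocks inside LLVM. The Block type, the helper names, and the threshold
value of 5 are all hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a basic block with a profile count."""
    name: str
    count: int  # execution count taken from the profile


def is_cold_binary(block: Block) -> bool:
    # Binary choice: a block is cold only if it was never executed.
    # An imprecise but nonzero count can never move a block to the cold part.
    return block.count == 0


def make_threshold_policy(threshold: int) -> Callable[[Block], bool]:
    # Count-based choice: a block is cold if its count is at or below the
    # threshold. This relies on the counts themselves being accurate.
    def is_cold(block: Block) -> bool:
        return block.count <= threshold
    return is_cold


def split(blocks: List[Block],
          is_cold: Callable[[Block], bool]) -> Tuple[List[str], List[str]]:
    """Partition block names into (hot, cold) according to the policy."""
    hot = [b.name for b in blocks if not is_cold(b)]
    cold = [b.name for b in blocks if is_cold(b)]
    return hot, cold


# Made-up counts, purely for illustration.
blocks = [
    Block("entry", 1000),
    Block("loop", 5000),
    Block("rare", 3),
    Block("error", 0),
]

binary_hot, binary_cold = split(blocks, is_cold_binary)
thresh_hot, thresh_cold = split(blocks, make_threshold_policy(5))
```

With the binary policy only "error" is split out. With the threshold
policy "rare" is split out as well, which is only safe if its count of 3
can be trusted.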

David

>
>>
>> > uses existing instrumentation based FDO or CSFDO profile information.
>>
>>
>>
>> Similarly, with instrumentation FDO alone, the post-inline profile may
>> not be accurate, so is this splitting more effective when used with
>> CSFDO? Was the evaluation result from FDO or CSFDO?
>>
> Yes, CSFDO profiles are more effective. The SPEC and clang bootstrap
> numbers are FDO-based; however, our internal benchmarks are built with
> CSFDO, and the improvement with CSFDO profiles is larger than with FDO
> profiles.
>
>>
>>
>> I am also wondering whether this works with Sample FDO, and whether you
>> have numbers that you can share for Sample FDO.
>>
> We are still working on refining the pass for Sample FDO. The initial
> version up for review degrades performance when used with sample profiles.
> We have some further refinements planned that improve it (to performance
> neutral); however, more investigation is needed to understand the
> differences between sampled profiles and instrumented profiles late in
> codegen. We are invested in ensuring this works well for sampled profiles.
>
>>
>>
>> Thanks,
>>
>> Wenlei
>>
>
> Regards,
> Snehasish
>
>