[llvm-dev] PGO is ineffective for Rust - but why?

Michael Woerister via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 13 04:04:53 PDT 2019


Thanks a lot to all of you, Teresa, David, and Philip!

This is giving me quite a todo list of things to check and try out. I'll
report back here when I have some findings.
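
For reference, here is a minimal sketch of the kind of small test program
asked about in the quoted message below, written directly in Rust.
Everything in it is an assumption made for illustration and is not taken
from any existing benchmark: the names `XorShift64`, `process` and `run`,
the 1-in-1024 branch skew, the iteration count, and the `/tmp/pgo-data`
path in the comments. Whether it actually profits from PGO would have to
be measured.

```rust
// Hypothetical PGO test program; all names and constants are illustrative.
//
// One possible way to build it (paths are placeholders):
//   rustc -O -C profile-generate=/tmp/pgo-data main.rs && ./main
//   llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
//   rustc -O -C profile-use=/tmp/pgo-data/merged.profdata main.rs

/// Tiny deterministic xorshift PRNG so no external crates are needed.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Taken for roughly 1 in 1024 inputs.
fn rare_path(v: u64) -> u64 {
    v.rotate_left(17) ^ 0x9E37_79B9_7F4A_7C15
}

/// Taken for almost all inputs.
fn common_path(v: u64) -> u64 {
    v.wrapping_mul(3).wrapping_add(1)
}

/// Heavily skewed branch: the profile should show one side as hot
/// and the other as cold.
fn process(v: u64) -> u64 {
    if v % 1024 == 0 {
        rare_path(v)
    } else {
        common_path(v)
    }
}

/// Runs the hot loop and returns a checksum so the work cannot be
/// optimized away.
fn run(iterations: u64, seed: u64) -> u64 {
    let mut rng = XorShift64(seed);
    let mut acc = 0u64;
    for _ in 0..iterations {
        acc = acc.wrapping_add(process(rng.next()));
    }
    acc
}

fn main() {
    let checksum = run(10_000_000, 0x2545_F491_4F6C_DD1D);
    println!("checksum: {}", checksum);
}
```

Timing the three builds (no PGO, PGO, PGO with `-disable-preinline`) on the
same input would give a first comparison point.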

On Thu, Sep 12, 2019 at 6:31 PM Teresa Johnson <tejohnson at google.com> wrote:

>
>
> On Thu, Sep 12, 2019 at 8:18 AM Teresa Johnson <tejohnson at google.com>
> wrote:
>
>> I just have a couple of suggestions off the top of my head:
>> - have you tried using the new pass manager
>> (-fexperimental-new-pass-manager)? That has access to additional analysis
>> info during inlining and is able to make more precise PGO-based inlining
>> decisions.
>>
>
> (although note that the above alone shouldn't make the difference between
> no gain at all and a typical PGO performance boost)
>
> Another thing I just thought of - are you using -ffunction-sections and
> -fdata-sections? These will allow for PGO-based function layout in the
> linker (assuming you are using lld or gold).
>
>> - have you tried collecting profile data with and without PGO to see if
>> you can compare where cycles are being spent? That's my usual way of
>> debugging performance differences related to inlining or profile changes.
>> - just a comment that it is odd you are getting better performance
>> without the pre-inlining - which typically helps because you get better
>> context-sensitive profile info. Maybe sanity check that the pre-inlining is
>> kicking in for both the profile gen and use passes?
>>
>> Teresa
>>
>> On Thu, Sep 12, 2019 at 2:18 AM Michael Woerister via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi everyone,
>>>
>>> As part of my work for Mozilla's Low Level Tools team I've
>>> implemented PGO in the Rust compiler. The feature is
>>> available since Rust 1.37 [1]. However, so far we have not
>>> seen any actual performance gains from enabling PGO for
>>> Rust code. Performance even seems to drop 1-3% with PGO
>>> enabled. I wonder why that is and I'm hoping that someone
>>> here might have experience debugging PGO effectiveness.
>>>
>>>
>>> PGO in the Rust compiler
>>> ------------------------
>>>
>>> The Rust compiler uses IR-level instrumentation (the
>>> equivalent of Clang's `-fprofile-generate`/`-fprofile-use`).
>>> This has worked pretty well and even enables doing PGO for
>>> mixed Rust/C++ codebases when also using Clang.
>>>
>>> The Rust compiler has regression tests that make sure that:
>>>
>>> - instrumentation shows up in LLVM IR for the `generate` phase,
>>>   and that
>>>
>>> - profiling data is actually used during the `use` phase, i.e.
>>>   that cold functions get marked with `cold` and hot functions
>>>   get marked with `inlinehint`.
>>>
>>> I also verified manually that `branch_weights` are being set
>>> in IR. So, from my perspective, the PGO implementation does
>>> what it is supposed to do.
>>>
>>> However, as already mentioned, in all benchmarks I've seen so
>>> far performance stays the same at best and often even
>>> suffers slightly. This is surprising because for C++ code,
>>> Clang's version of IR-level instrumentation & PGO brings
>>> significant gains (up to 5-10% from what I've seen in
>>> benchmarks for Firefox).
>>>
>>> One thing we noticed early on is that disabling the
>>> pre-inlining pass (`-disable-preinline`) seems to consistently
>>> improve the situation for Rust code. Doing that, we sometimes
>>> see performance wins of almost 1% over not using PGO. This
>>> again is very different from C++, where disabling this pass
>>> causes dramatic performance losses for the Firefox benchmarks.
>>> And a 1% performance improvement is still well below
>>> expectations, I think.
>>>
>>> So my questions to you are:
>>>
>>> - Has anybody here observed something similar while
>>>   working on or with PGO?
>>>
>>> - Are there certain known characteristics of LLVM IR code
>>>   that inhibit PGO's effectiveness and that IR produced by
>>>   `rustc` might exhibit?
>>>
>>> - Does anybody know of a good source that describes how to
>>>   effectively debug a problem like this?
>>>
>>> - Does anybody know of a small example program in C/C++
>>>   that is known to profit from PGO and that could be
>>>   re-implemented in Rust for comparison?
>>>
>>> Thanks a lot for reading! Any help is appreciated.
>>>
>>> -Michael
>>>
>>> [1]
>>> https://blog.rust-lang.org/2019/08/15/Rust-1.37.0.html#profile-guided-optimization
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>