[llvm-dev] PGO is ineffective for Rust - but why?

Mon Sep 16 10:07:20 PDT 2019

Interesting. By ld do you mean GNU ld? I know GNU ld does "work" with
LLVM's gold plugin, but it's an untested combination and not recommended. I
wouldn't be surprised if there were some issues around it not passing
necessary info to the gold plugin.

Teresa

On Mon, Sep 16, 2019 at 8:41 AM Michael Woerister <mwoerister at mozilla.com>
wrote:

> So one interesting observation has already come out of this: I
> confirmed that `rustc` indeed uses `-ffunction-sections` and
> `-fdata-sections` on all platforms except for macOS. When trying out
> different linkers for a small test case [1], however, I found that
> there were rather large differences in execution time:
>
> ld (no PGO) = 172 ms
> ld (PGO) = 196 ms
>
> gold (no PGO) = 182 ms
> gold (PGO) = 141 ms
>
> lld (no PGO) = 193 ms
> lld (PGO) = 171 ms
>
> So `gold` and `lld` both profit from PGO quite a bit, while `ld`-linked
> programs are slower with PGO. I then noticed that, with `ld`, branch
> weights were missing from most branches, while the counts with the
> other linkers were correct. All of this suggests to me that
> something goes wrong when `ld` tries to link in the profiling runtime.
>
> I'll be investigating further.
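
For reference, the relative effect of PGO per linker can be worked out from the timings quoted above. This is only a minimal sketch in Rust: the function name `pgo_change_percent` is made up for this note, and the numbers are simply the ones from the message, not new measurements.

```rust
/// Percentage change in run time when going from the non-PGO build to the
/// PGO build. A negative value means the PGO build is faster.
/// (Hypothetical helper, not part of any tool mentioned in this thread.)
fn pgo_change_percent(no_pgo_ms: f64, pgo_ms: f64) -> f64 {
    (pgo_ms - no_pgo_ms) / no_pgo_ms * 100.0
}

fn main() {
    // (linker, no-PGO ms, PGO ms), copied from the timings quoted above.
    let timings = [
        ("ld", 172.0, 196.0),
        ("gold", 182.0, 141.0),
        ("lld", 193.0, 171.0),
    ];
    for &(linker, no_pgo, pgo) in timings.iter() {
        println!("{:<4} {:+.1}%", linker, pgo_change_percent(no_pgo, pgo));
    }
}
```

That works out to roughly +14.0% for ld, -22.5% for gold, and -11.4% for lld, where a negative number means the PGO build ran faster.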
>
> [1]
> https://github.com/michaelwoerister/rust-pgo-test-programs/tree/master/branch_weights
>
>
> On Thu, Sep 12, 2019 at 6:31 PM Teresa Johnson <tejohnson at google.com>
> wrote:
> >
> >
> >
> > On Thu, Sep 12, 2019 at 8:18 AM Teresa Johnson <tejohnson at google.com> wrote:
> >>
> >> I just have a couple of suggestions off the top of my head:
> >> - have you tried using the new pass manager
> >>   (-fexperimental-new-pass-manager)? That has access to additional
> >>   analysis info during inlining and is able to make more precise
> >>   PGO-based inline decisions.
> >
> >
> > (although note the above shouldn't make the difference between no
> > performance gain and a typical PGO performance boost)
> >
> > Another thing I just thought of: are you using -ffunction-sections and
> > -fdata-sections? These will allow for PGO-based function layout in the
> > linker (assuming you are using lld or gold).
> >
> >> - have you tried collecting profile data with and without PGO to see
> >>   if you can compare where cycles are being spent? That's my usual way
> >>   of debugging performance differences related to inlining or profile
> >>   changes.
> >> - just a comment that it is odd you are getting better performance
> >>   without the pre-inlining, which typically helps because you get
> >>   better context-sensitive profile info. Maybe sanity-check that the
> >>   pre-inlining is kicking in for both the profile gen and use passes?
> >>
> >> Teresa
> >>
> >> On Thu, Sep 12, 2019 at 2:18 AM Michael Woerister via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> As part of my work for Mozilla's Low Level Tools team I've
> >>> implemented PGO in the Rust compiler. The feature is
> >>> available since Rust 1.37 [1]. However, so far we have not
> >>> seen any actual performance gains from enabling PGO for
> >>> Rust code. Performance even seems to drop 1-3% with PGO
> >>> enabled. I wonder why that is and I'm hoping that someone
> >>> here might have experience debugging PGO effectiveness.
> >>>
> >>>
> >>> PGO in the Rust compiler
> >>> ------------------------
> >>>
> >>> The Rust compiler uses IR-level instrumentation (the
> >>> equivalent of Clang's `-fprofile-generate`/`-fprofile-use`).
> >>> This has worked pretty well and even enables doing PGO for
> >>> mixed Rust/C++ codebases when also using Clang.
> >>>
> >>> The Rust compiler has regression tests that make sure that:
> >>>
> >>> - instrumentation shows up in LLVM IR for the `generate` phase,
> >>>   and that
> >>>
> >>> - profiling data is actually used during the `use` phase, i.e.
> >>>   that cold functions get marked with `cold` and hot functions
> >>>   get marked with `inline`.
> >>>
> >>> I also verified manually that `branch_weights` are being set
> >>> in IR. So, from my perspective, the PGO implementation does
> >>> what it is supposed to do.
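
To make the `branch_weights` check above concrete, here is a hypothetical test case with one heavily skewed branch. It is only an illustrative sketch: the names `classify` and `checksum` and the 1-in-1000 skew are invented for this note, it is not taken from the rustc test suite, and it is not claimed to speed up under PGO.

```rust
/// Hypothetical workload: the `else` arm is taken for only 1 in 1000 inputs,
/// so a profile of this function should record a heavily skewed branch.
fn classify(x: u64) -> u64 {
    if x % 1000 != 0 {
        x.wrapping_mul(3) // hot path
    } else {
        x.wrapping_add(7) // cold path
    }
}

/// Sums `classify` over 0..n with wrapping arithmetic, so the result is
/// deterministic and the calls cannot be discarded as unused.
fn checksum(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(classify(x)))
}

fn main() {
    println!("{}", checksum(10_000_000));
}
```

One way to use it would be to build once with `-C profile-generate`, run the binary, merge the raw profile with `llvm-profdata merge`, rebuild with `-C profile-use` and `--emit=llvm-ir`, and then look for `!prof` metadata carrying `branch_weights` on the branch in `classify`.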
> >>>
> >>> However, as already mentioned, in all benchmarks I've seen so
> >>> far performance seems to stay the same at best and often even
> >>> suffers slightly. This is surprising because, for C++ code,
> >>> Clang's version of IR-level instrumentation and PGO brings
> >>> significant gains (up to 5-10% from what I've seen in
> >>> benchmarks for Firefox).
> >>>
> >>> One thing we noticed early on is that disabling the
> >>> pre-inlining pass (`-disable-preinline`) seems to consistently
> >>> improve the situation for Rust code. Doing that we sometimes
> >>> see performance wins of almost 1% over not using PGO. This
> >>> again is very different from C++, where disabling this pass
> >>> causes dramatic performance losses for the Firefox benchmarks.
> >>> And a 1% performance improvement is still well below
> >>> expectations, I think.
> >>>
> >>> So my questions to you are:
> >>>
> >>> - Has anybody here observed something similar while
> >>>   working on or with PGO?
> >>>
> >>> - Are there certain known characteristics of LLVM IR code
> >>>   that inhibit PGO's effectiveness and that IR produced by
> >>>   `rustc` might exhibit?
> >>>
> >>> - Does anybody know of a good source that describes how to
> >>>   effectively debug a problem like this?
> >>>
> >>> - Does anybody know of a small example program in C/C++
> >>>   that is known to profit from PGO and that could be
> >>>   re-implemented in Rust for comparison?
> >>>
> >>> Thanks a lot for reading! Any help is appreciated.
> >>>
> >>> -Michael
> >>>
> >>> [1] https://blog.rust-lang.org/2019/08/15/Rust-1.37.0.html#profile-guided-optimization
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |