[llvm-dev] PGO is ineffective for Rust - but why?
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 12 14:57:00 PDT 2019
On 9/12/19 2:18 AM, Michael Woerister via llvm-dev wrote:
> Hi everyone,
>
> As part of my work for Mozilla's Low Level Tools team I've
> implemented PGO in the Rust compiler. The feature has been
> available since Rust 1.37 [1]. However, so far we have not
> seen any actual performance gains from enabling PGO for
> Rust code. Performance even seems to drop 1-3% with PGO
> enabled. I wonder why that is and I'm hoping that someone
> here might have experience debugging PGO effectiveness.
>
>
> PGO in the Rust compiler
> ------------------------
>
> The Rust compiler uses IR-level instrumentation (the
> equivalent of Clang's `-fprofile-generate`/`-fprofile-use`).
> This has worked pretty well and even enables doing PGO for
> mixed Rust/C++ codebases when also using Clang.
>
> The Rust compiler has regression tests that make sure that:
>
> - instrumentation shows up in LLVM IR for the `generate` phase,
> and that
>
> - profiling data is actually used during the `use` phase, i.e.
> that cold functions get marked with `cold` and hot functions
> get marked with `inline`.
>
> I also verified manually that `branch_weights` are being set
> in IR. So, from my perspective, the PGO implementation does
> what it is supposed to do.
One thing missing here is profile-guided devirtualization. That's super
significant for Java; it might be highly relevant for Rust as well.
However, I'd still expect to see *some* positive delta with what you've
got, so don't start here. Your immediate problem is likely something else.
>
> However, as already mentioned, in all benchmarks I've seen so
> far performance seems to stay the same at best and often even
> suffers slightly. Which is surprising because for C++ code
> using Clang's version of IR-level instrumentation & PGO brings
> significant gains (up to 5-10% from what I've seen in
> benchmarks for Firefox).
>
> One thing we noticed early on is that disabling the
> pre-inlining pass (`-disable-preinline`) seems to consistently
> improve the situation for Rust code. Doing that we sometimes
> see performance wins of almost 1% over not using PGO. This
> again is very different to C++ where disabling this pass
> causes dramatic performance losses for the Firefox benchmarks.
> And 1% performance improvement is still well below
> expectations, I think.
>
> So my questions to you are:
>
> - Has anybody here observed something similar while
> working on or with PGO?
>
> - Are there certain known characteristics of LLVM IR code
> that inhibit PGO's effectiveness and that IR produced by
> `rustc` might exhibit?
Have you checked to make sure *all* of your branches have weights?
Including the ones which don't directly correspond to Rust
conditionals? If you left off branch weights from range checks or
something (i.e. something with a ton of occurrences) that might be
confusing the heuristics enough to explain your results.
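
A rough way to check this is to dump the textual IR from the `use` phase
and count how many conditional branches carry `!prof` metadata. The
sketch below is only an illustration: the function name and the sample
IR are made up for this example, it only looks at single-line `br i1`
instructions (it ignores `switch`, `select`, and calls), and it assumes
that a `!prof` attachment on such a branch means branch weights are present.

```python
import re

# Matches a conditional branch in textual LLVM IR, for example:
#   br i1 %cond, label %a, label %b, !prof !5
# Unconditional branches ("br label %x") are deliberately not matched.
COND_BR = re.compile(r"^\s*br\s+i1\s")


def count_branch_weights(ir_text):
    """Return (with_prof, without_prof) for conditional `br` instructions.

    Hypothetical helper: a plain text scan, not a real IR parser.
    """
    with_prof = 0
    without_prof = 0
    for line in ir_text.splitlines():
        if not COND_BR.match(line):
            continue
        if "!prof" in line:
            with_prof += 1
        else:
            without_prof += 1
    return with_prof, without_prof


# Made-up sample: the first branch has weights, the second one
# (standing in for a range check) does not.
SAMPLE_IR = """
define i32 @f(i32 %x, i1 %c) {
entry:
  br i1 %c, label %hot, label %cold, !prof !0
hot:
  %in_bounds = icmp ult i32 %x, 16
  br i1 %in_bounds, label %ok, label %panic
ok:
  ret i32 %x
cold:
  ret i32 0
panic:
  unreachable
}
!0 = !{!"branch_weights", i32 1000, i32 1}
"""

if __name__ == "__main__":
    annotated, missing = count_branch_weights(SAMPLE_IR)
    print(f"conditional branches with !prof: {annotated}, without: {missing}")
```

To try it on real output, emit IR with `rustc --emit=llvm-ir` during the
`use` phase and pass the contents of the resulting `.ll` file to the
function. A large "without" count would point at the kind of gap
described above.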
>
> - Does anybody know of a good source that describes how to
> effectively debug a problem like this?
>
> - Does anybody know of a small example program in C/C++
> that is known to profit from PGO and that could be
> re-implemented in Rust for comparison?
>
> Thanks a lot for reading! Any help is appreciated.
>
> -Michael
>
> [1] https://blog.rust-lang.org/2019/08/15/Rust-1.37.0.html#profile-guided-optimization