[llvm-dev] RFC: PGO Late instrumentation for LLVM

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 2 12:26:59 PDT 2015


On Wed, Sep 2, 2015 at 12:10 PM, Ivan Baev <ibaev at codeaurora.org> wrote:

> > On Tue, Sep 1, 2015 at 7:21 PM, Ivan Baev via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> >> > Date: Tue, 1 Sep 2015 14:21:16 -0700
> >> > From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
> >> > Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
> >> > Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM
> >> >>>> *(2) Performance impact of context sensitivity*
> >> >>>> LLVM does not use the profile information fully in the back-end
> >> >>>> optimizations, for instance, inlining does not fully use the profile
> >> >>>> counts -- it only marks hot/cold function attribute based on function
> >> >>>> entry counts. To evaluate the impact of profile context sensitivity,
> >> >>>> GCC is used in the experiment. Note that GCC PGO improves clang
> >> >>>> performance a lot more than clang PGO.
> >> >>>> First we summarize the methodology used in the experiment:
> >> >>>> 0) build clang with GCC O2 without early inlining and measure clang's
> >> >>>> performance. GCC early inlining (einline) is similar to pre-inline
> >> >>>> used by late instrumentation.
> >> >>>> 1) build clang with GCC O2 with early inlining and measure performance.
> >> >>>> The performance difference of 1) and 0) is denoted as E which measures
> >> >>>> the contribution of early inlining.
> >> >>>> 2) build clang with GCC O2 + PGO without early inlining.
> >> >>>> 3) build clang with GCC O2 + PGO with early inlining.
> >> >>>> The performance difference of 3) and 2) is denoted as EC. It constitutes
> >> >>>> roughly two parts a) early inlining contribution b) context sensitive
> >> >>>> profiling enabled with early inlining.
> >> >>>> The contribution of context sensitive profiling can be estimated by
> >> >>>> EC - E above.
> >>
> >> >>>> -------------------------------------------------------------------------------
> >> >>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
> >> >>>> (0) base w/o einline             84.946            1.000           0.934
> >> >>>> (1) base O2                      79.310            1.071           1.000
> >> >>>> (2) profile-arcs w/o einline     63.518            1.337           1.249
> >> >>>> (3) profile-arcs                 48.364            1.756           1.640
> >> >>>> We see the following:
> >> >>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
> >> >>>> (v.s. base O2 w/ early inline).
> >> >>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
> >> >>>> (v.s. base O2 w/o early inline).
> >> >>>> 3) Early inlining performance contribution is about 7.1%.
> >> >>>> 4) Profile context sensitivity contribution is estimated to be 22.2%
> >> >>>> (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.
> >> Rong,
> >> Sorry for the late response. Just wanted to clarify my understanding of
> >> data in (2) Performance impact of context sensitivity.
> >> On clang as an application:
> >> 3) Early inlining contribution is about 7.1%,
> > This is the effect of pre-inlining without profile guidance.
> >> 2) PGO w/o early inlining contribution is about 33.7%,
> >> 4) so the additional combined effect of 2 and 3 is about 22.2%,
> >> correct?
> > Not combined effect -- but remaining effect (by excluding 2 and 3)
> >> In other words, just avoiding inlining small/simple callees and updating
> >> their profile counts in the call graph by the main inliner - all through
> >> the use of early inlining - improves clang performance by 22.2%.
> > Not sure what you mean here. 22% is the estimate of the effect of CS
> > profile due to clones of profile counters during instrumentation (through
> > pre-inlining). Profile update with inlining always exist including in 2).
>
> If we compare times for:
> (2) profile-arcs w/o einline - 63.518 secs, v.s.
> (3) profile-arcs - 48.364 secs,
> we get about 31.3% improvement due to early inline with PGO.
>
> If we compare times for:
> (0) base w/o einline - 84.946 secs, v.s.
> (1) base O2 - 79.310 secs,
> we get about 7.1% improvement due to early inline without PGO.
>
> What can we attribute the difference of 24.2% (31.3 - 7.1) to?
> 31.3% is the total contribution of early inline with PGO.
> Is 24.2% the context-sensitivity part of it, meaning that the profile
> counts in the call graph are more precise during the inlining process,
> inlining decisions are better, etc.?
>

yes -- that is it.
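
For reference, the arithmetic in the question above can be re-derived from
the wall times in the quoted table. The snippet below is only a sketch: the
names WALL_TIME and improvement_pct are made up for illustration, and
"improvement" is assumed to mean (slower time / faster time - 1), which is
the definition that reproduces the 7.1% and 31.3% figures.

```python
# Wall-clock times in seconds, copied from the table quoted in this thread.
WALL_TIME = {
    0: 84.946,  # (0) base w/o einline
    1: 79.310,  # (1) base O2
    2: 63.518,  # (2) profile-arcs w/o einline
    3: 48.364,  # (3) profile-arcs
}


def improvement_pct(slow, fast):
    """Percent improvement of config `fast` over config `slow`.

    Assumed definition: ratio of wall times minus one, as a percentage.
    """
    return (WALL_TIME[slow] / WALL_TIME[fast] - 1.0) * 100.0


# Early inlining without PGO: (1) vs (0).
einline_without_pgo = improvement_pct(0, 1)
# Early inlining with PGO: (3) vs (2).
einline_with_pgo = improvement_pct(2, 3)
# The remainder, attributed in this thread to context-sensitive profiling.
context_sensitivity = einline_with_pgo - einline_without_pgo

print(f"early inline without PGO: {einline_without_pgo:.1f}%")
print(f"early inline with PGO:    {einline_with_pgo:.1f}%")
print(f"difference:               {context_sensitivity:.1f}%")
```

This compares the two einline deltas directly (31.3 - 7.1), which is the
framing used in the question, rather than subtracting from the 64.0% figure.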

David


>
> Ivan