[llvm-dev] RFC: PGO Late instrumentation for LLVM

Tue Sep 1 19:21:03 PDT 2015

> Date: Tue, 1 Sep 2015 14:21:16 -0700
> From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
> Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM

>>>> *(2) Performance impact of context sensitivity*
>>>> LLVM does not fully use the profile information in the back-end
>>>> optimizations. For instance, inlining does not fully use the profile
>>>> counts; it only sets the hot/cold function attributes based on function
>>>> entry counts. To evaluate the impact of profile context sensitivity, GCC
>>>> is used in the experiment. Note that GCC PGO improves clang performance
>>>> a lot more than clang PGO does.
>>>> First we summarize the methodology used in the experiment:
>>>> 0) Build clang with GCC O2 without early inlining and measure clang's
>>>>    performance. GCC early inlining (einline) is similar to the
>>>>    pre-inline used by late instrumentation.
>>>> 1) Build clang with GCC O2 with early inlining and measure performance.
>>>>    The performance difference between 1) and 0) is denoted E, which
>>>>    measures the contribution of early inlining.
>>>> 2) Build clang with GCC O2 + PGO without early inlining.
>>>> 3) Build clang with GCC O2 + PGO with early inlining.
>>>>    The performance difference between 3) and 2) is denoted EC. It
>>>>    consists of roughly two parts: a) the early inlining contribution,
>>>>    and b) the context-sensitive profiling enabled by early inlining.
>>>> The contribution of context-sensitive profiling can be estimated as
>>>> EC - E.
>>>> -------------------------------------------------------------------------------
>>>> Config                         wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
>>>> (0) base w/o einline                      84.946           1.000           0.934
>>>> (1) base O2                               79.310           1.071           1.000
>>>> (2) profile-arcs w/o einline              63.518           1.337           1.249
>>>> (3) profile-arcs                          48.364           1.756           1.640
>>>> We see the following:
>>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
>>>>    (vs. base O2 w/ early inlining).
>>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
>>>>    (vs. base O2 w/o early inlining).
>>>> 3) The early inlining performance contribution is about 7.1%.
>>>> 4) The profile context sensitivity contribution is estimated to be 22.2%
>>>>    (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.
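The percentages above follow from the table if "speedup" is read as baseline wall time divided by configuration wall time, and a percentage improvement as (speedup - 1) * 100. The thread does not spell this out, so treat it as an assumption. The minimal Python sketch below recomputes the two speedup columns from the four wall times and evaluates both the EC - E estimate from the methodology and the subtraction used in the summary. The names `WALL_TIME`, `speedup` and `pct` are hypothetical and chosen for illustration only.

```python
"""Recompute the speedup columns and the context-sensitivity estimates
from the wall times quoted in the table (assumed: speedup = t_base / t_cfg)."""

# Wall time for each configuration, copied from the table.
WALL_TIME = {
    0: 84.946,  # base O2 w/o einline
    1: 79.310,  # base O2 (with einline)
    2: 63.518,  # profile-arcs (PGO) w/o einline
    3: 48.364,  # profile-arcs (PGO) with einline
}


def speedup(baseline: int, config: int) -> float:
    """Speedup of `config` over `baseline`: baseline time / config time."""
    return WALL_TIME[baseline] / WALL_TIME[config]


def pct(ratio: float) -> float:
    """Convert a speedup ratio into a percentage improvement."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Reproduce the speedup_vs_(0) and speedup_vs_(1) columns.
    for cfg in sorted(WALL_TIME):
        print(f"({cfg}) vs (0): {speedup(0, cfg):.3f}  vs (1): {speedup(1, cfg):.3f}")

    e = pct(speedup(0, 1))               # E: early inlining, no PGO
    ec = pct(speedup(2, 3))              # EC: (3) over (2), early inlining under PGO
    pgo_einline = pct(speedup(1, 3))     # PGO gain with early inlining
    pgo_no_einline = pct(speedup(0, 2))  # PGO gain without early inlining

    print(f"E      = {e:.1f}%")
    print(f"EC     = {ec:.1f}%")
    print(f"EC - E = {ec - e:.1f} points")
    print(f"summary estimate = {pgo_einline - pgo_no_einline - e:.1f} points")
```

On the table's wall times, the summary's subtraction comes to about 23.1 percentage points (23.2 when the rounded figures 64.0, 33.7 and 7.1 are subtracted directly), and EC - E comes to about 24.2. Both are additive differences of percentages, so they are only a rough decomposition; the underlying speedups compose multiplicatively.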

Rong,
Sorry for the late response. I just wanted to check my understanding of the
data in "(2) Performance impact of context sensitivity".

On clang as an application:
3) the early inlining contribution is about 7.1%,
2) the PGO w/o early inlining contribution is about 33.7%,
4) so the additional effect of combining 2) and 3) is about 22.2%, correct?
In other words, just avoiding the inlining of small/simple callees by the
main inliner and updating their profile counts in the call graph, all
through the use of early inlining, improves clang performance by 22.2%.

Thanks,
Ivan