[llvm-dev] RFC: PGO Late instrumentation for LLVM

Ivan Baev via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 2 12:10:19 PDT 2015


> On Tue, Sep 1, 2015 at 7:21 PM, Ivan Baev via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> > Date: Tue, 1 Sep 2015 14:21:16 -0700
>> > From: Rong Xu via llvm-dev <llvm-dev at lists.llvm.org>
>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>, David Li <davidxl at google.com>
>> > Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM
>> >>>> *(2) Performance impact of context sensitivity*
>> >>>> LLVM does not use the profile information fully in the back-end
>> >>>> optimizations, for instance, inlining does not fully use the profile
>> >>>> counts -- it only marks hot/cold function attribute based on function
>> >>>> entry counts. To evaluate the impact of profile context sensitivity,
>> >>>> GCC is used in the experiment. Note that GCC PGO improves clang
>> >>>> performance a lot more than clang PGO.
>> >>>> First we summarize the methodology used in the experiment:
>> >>>> 0) build clang with GCC O2 without early inlining and measure clang's
>> >>>> performance. GCC early inlining (einline) is similar to pre-inline
>> >>>> used by late instrumentation.
>> >>>> 1) build clang with GCC O2 with early inlining and measure performance.
>> >>>> The performance difference of 1) and 0) is denoted as E which measures
>> >>>> the contribution of early inlining.
>> >>>> 2) build clang with GCC O2 + PGO without early inlining.
>> >>>> 3) build clang with GCC O2 + PGO with early inlining.
>> >>>> The performance difference of 3) and 2) is denoted as EC. It
>> >>>> constitutes roughly two parts a) early inlining contribution b) context
>> >>>> sensitive profiling enabled with early inlining.
>> >>>> The contribution of context sensitive profiling can be estimated by
>> >>>> EC - E above.
>> >>>> -------------------------------------------------------------------------------
>> >>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)
>> >>>> (0) base w/o einline             84.946            1.000           0.934
>> >>>> (1) base O2                      79.310            1.071           1.000
>> >>>> (2) profile-arcs w/o einline     63.518            1.337           1.249
>> >>>> (3) profile-arcs                 48.364            1.756           1.640
>> >>>> We see the following:
>> >>>> 1) GCC PGO with early inlining improves clang performance by 64.0%
>> >>>> (v.s. base O2 w/ early inline).
>> >>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%
>> >>>> (v.s. base O2 w/o early inline).
>> >>>> 3) Early inlining performance contribution is about 7.1%.
>> >>>> 4) Profile context sensitivity contribution is estimated to be 22.2%
>> >>>> (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.
>> Rong,
>> Sorry for the late response. Just wanted to clarify my understanding of
>> data in (2) Performance impact of context sensitivity.
>> On clang as an application:
>> 3) Early inlining contribution is about 7.1%,
> This is the effect of pre-inlining without profile guidance.
>> 2) PGO w/o early inlining contribution is about 33.7%,
>> 4) so the additional combined effect of 2 and 3 is about 22.2%,
>> correct?
> Not combined effect -- but remaining effect (by excluding 2 and 3)
>> In other words, just avoiding inlining small/simple callees and updating
>> their profile counts in the call graph by the main inliner - all through
>> the use of early inlining - improves clang performance by 22.2%.
> Not sure what you mean here. 22% is the estimate of the effect of CS
> profile due to clones of profile counters during instrumentation (through
> pre-inlining). Profile update with inlining always exist including in 2).

If we compare times for:
(2) profile-arcs w/o einline - 63.518 secs, v.s.
(3) profile-arcs - 48.364 secs,
we get about 31.3% improvement due to early inline with PGO.

If we compare times for:
(0) base w/o einline - 84.946 secs, v.s.
(1) base O2 - 79.310 secs,
we get about 7.1% improvement due to early inline without PGO.

What can we attribute the difference of 24.2% (31.3 - 7.1) to?
31.3% is the total contribution of early inline with PGO.
Is 24.2% the context-sensitivity part of it, meaning that the profile
counts in the call graph are more precise during the inlining process,
inlining decisions are better, etc.?
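
For anyone who wants to re-check the arithmetic, here is a minimal Python sketch of the comparison above. It assumes "improvement" means the speedup ratio minus one (slower time divided by faster time), which matches the speedup columns in the quoted table. The helper name improvement_pct and the dictionary keys are made up for illustration; E and EC follow the definitions in the quoted methodology.

```python
# Wall-clock times in seconds, copied from the quoted table.
times = {
    "base_no_einline": 84.946,  # (0) base w/o einline
    "base_o2": 79.310,          # (1) base O2
    "pgo_no_einline": 63.518,   # (2) profile-arcs w/o einline
    "pgo": 48.364,              # (3) profile-arcs
}


def improvement_pct(slow, fast):
    """Percent improvement of `fast` over `slow`: (slow / fast - 1) * 100."""
    return (slow / fast - 1.0) * 100.0


# E: early inlining contribution without PGO, (0) vs. (1).
e = improvement_pct(times["base_no_einline"], times["base_o2"])
# EC: early inlining contribution with PGO, (2) vs. (3).
ec = improvement_pct(times["pgo_no_einline"], times["pgo"])
# Remainder, in percentage points, after subtracting E from EC.
remainder = ec - e

print(f"E      = {e:.1f}%")
print(f"EC     = {ec:.1f}%")
print(f"EC - E = {remainder:.1f} points")
```

With the times from the table this prints 7.1%, 31.3% and 24.2 points, i.e. the three figures used in this message.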

Ivan