<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 1, 2015 at 7:21 PM, Ivan Baev via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> Date: Tue, 1 Sep 2015 14:21:16 -0700<br>

> From: Rong Xu via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>

> Cc: llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>>, David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

Subject: Re: [llvm-dev] RFC: PGO Late instrumentation for LLVM<br>

<br>

>>>> *(2) Performance impact of context sensitivity*<br>

<div><div class="h5">>>>> LLVM's back-end optimizations do not fully use profile information.<br>
>>>> For instance, the inliner does not fully use the profile counts; it only<br>
>>>> sets the hot/cold function attributes based on function entry counts.<br>
>>>> To evaluate the impact of profile context sensitivity, GCC is used in the<br>
>>>> experiment. Note that GCC PGO improves clang performance much more than<br>
>>>> clang PGO does.<br>

>>>> First we summarize the methodology used in the experiment:<br>
>>>> 0) Build clang with GCC O2 without early inlining and measure clang's<br>
>>>> performance. GCC early inlining (einline) is similar to the pre-inlining<br>
>>>> used by late instrumentation.<br>
>>>> 1) Build clang with GCC O2 with early inlining and measure performance.<br>
>>>> The performance difference between 1) and 0) is denoted E; it measures<br>
>>>> the contribution of early inlining.<br>
>>>> 2) Build clang with GCC O2 + PGO without early inlining.<br>
>>>> 3) Build clang with GCC O2 + PGO with early inlining.<br>
>>>> The performance difference between 3) and 2) is denoted EC. It consists<br>
>>>> of roughly two parts: a) the early inlining contribution and b) the<br>
>>>> context-sensitive profiling enabled by early inlining.<br>
>>>> The contribution of context-sensitive profiling can be estimated as<br>
>>>> EC - E.<br>

>>>> -------------------------------------------------------------------------------<br>
>>>> Config                        wall_time_for_use  speedup_vs_(0)  speedup_vs_(1)<br>
>>>> (0) base w/o einline             84.946            1.000           0.934<br>
>>>> (1) base O2                      79.310            1.071           1.000<br>
>>>> (2) profile-arcs w/o einline     63.518            1.337           1.249<br>
>>>> (3) profile-arcs                 48.364            1.756           1.640<br>

>>>> We see the following:<br>
>>>> 1) GCC PGO with early inlining improves clang performance by 64.0%<br>
>>>> (vs. base O2 w/ early inline).<br>
>>>> 2) GCC PGO w/o early inlining improves clang performance by 33.7%<br>
>>>> (vs. base O2 w/o early inline).<br>
>>>> 3) Early inlining performance contribution is about 7.1%.<br>
>>>> 4) Profile context sensitivity contribution is estimated to be 22.2%<br>
>>>> (i.e. 64.0% - 33.7% - 7.1%), which is pretty significant.<br>
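<br>
The percentages in this list can be recomputed from the wall times in the table. The following is a minimal Python sketch and not part of the original measurements: the dictionary keys and the helper name are illustrative, and the residual is computed exactly as the expression in item 4) is written. Evaluated that way, 64.0 - 33.7 - 7.1 comes to 23.2 points, slightly above the quoted 22.2%, so the quoted figure may reflect a different rounding or a typo.<br>

```python
# Wall times copied from the table (units as reported there).
wall_time = {
    "base_no_einline": 84.946,  # config (0)
    "base_o2": 79.310,          # config (1)
    "pgo_no_einline": 63.518,   # config (2)
    "pgo": 48.364,              # config (3)
}


def gain_pct(baseline, config):
    """Percent speedup of `config` over `baseline`, rounded to one decimal."""
    ratio = wall_time[baseline] / wall_time[config]
    return round((ratio - 1.0) * 100.0, 1)


# Item 1): PGO with early inlining vs. base O2 with early inlining.
pgo_with_einline = gain_pct("base_o2", "pgo")
# Item 2): PGO without early inlining vs. base without early inlining.
pgo_without_einline = gain_pct("base_no_einline", "pgo_no_einline")
# Item 3): early inlining alone, i.e. config (1) vs. config (0).
early_inlining = gain_pct("base_no_einline", "base_o2")
# Item 4): the residual attributed to context sensitivity, as written.
residual = round(pgo_with_einline - pgo_without_einline - early_inlining, 1)

print(pgo_with_einline, pgo_without_einline, early_inlining, residual)
```

The three speedups reproduce the 64.0%, 33.7% and 7.1% figures; only the final subtraction differs from the number quoted in item 4).<br>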

<br>

</div></div>Rong,<br>

Sorry for the late response. I just wanted to clarify my understanding of<br>
the data in (2), Performance impact of context sensitivity.<br>

<br>

On clang as an application:<br>

3) Early inlining contribution is about 7.1%,<br></blockquote><div><br></div><div>This is the effect of pre-inlining without profile guidance. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2) PGO w/o early inlining contribution is about 33.7%,<br>

<br>

4) so the additional combined effect of 2 and 3 is about 22.2%, correct?<br></blockquote><div><br></div><div>Not the combined effect, but the remaining effect (after excluding 2 and 3).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

In other words, just avoiding inlining small/simple callees and updating<br>

their profile counts in the call graph by the main inliner - all through<br>

the use of early inlining - improves clang performance by 22.2%.<br></blockquote><div><br></div><div>Not sure what you mean here. The 22% is an estimate of the effect of the context-sensitive (CS) profile that results from cloning profile counters during instrumentation (through pre-inlining). Profile updates during inlining always exist, including in 2).</div><div><br></div><div>David</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
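<br>
The distinction drawn in the reply above, between counters cloned by pre-inlining and counters shared by all call sites, can be illustrated with a toy simulation. This is a hedged Python sketch and not LLVM or GCC code: `callee_branch`, `caller_a`, `caller_b` and the argument values are hypothetical, chosen so that the two callers drive the callee's branch in opposite directions.<br>

```python
from collections import Counter


def callee_branch(x):
    """Hypothetical callee with a single branch; returns the arm taken."""
    return "then" if x > 0 else "else"


def collect_profile(pre_inline, runs=100):
    """Count branch outcomes for two hypothetical callers.

    With pre_inline=False the callee is instrumented once, so both callers
    update the same counters. With pre_inline=True each caller owns a cloned
    copy of the callee body and therefore its own counters.
    """
    counters = Counter()
    for caller, arg in (("caller_a", 1), ("caller_b", -1)):
        for _ in range(runs):
            arm = callee_branch(arg)
            site = caller if pre_inline else "callee"
            counters[(site, arm)] += 1
    return counters


def then_probability(counters, site):
    """Fraction of executions at `site` that took the 'then' arm."""
    taken = counters[(site, "then")]
    total = taken + counters[(site, "else")]
    return taken / total


shared = collect_profile(pre_inline=False)
cloned = collect_profile(pre_inline=True)

print(then_probability(shared, "callee"))    # merged view of both callers
print(then_probability(cloned, "caller_a"))  # per-context view
print(then_probability(cloned, "caller_b"))
```

In the shared case the branch looks evenly split, and scaling those counts onto each call site after inlining cannot recover the per-caller bias. In the cloned case each copy records its own, fully biased behavior.<br>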

<br>

Thanks,<br>

Ivan<br>

<div class="HOEnZb"><div class="h5"><br>

<br>


</div></div></blockquote></div><br></div></div>