<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=36303">https://bugs.llvm.org/show_bug.cgi?id=36303</a> Please let me know if
I can help somehow.<br>
</p>
<br>
<div class="moz-cite-prefix">On 02/08/2018 01:22 AM, Xinliang David
Li wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CALRgJCPntB9LnCpW0+wv+oB1BptpKdWo9JCL45PWrHBKE3Mjkg@mail.gmail.com">
<div dir="ltr">Victor, please file a bug tracking the issue. We
can put relevant information there including test cases used in
the experiment etc.
<div><br>
</div>
<div>thanks,</div>
<div><br>
</div>
<div>David</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Feb 7, 2018 at 2:15 PM, Victor
Leschuk <span dir="ltr">&lt;<a
href="mailto:vleschuk@accesssoftek.com" target="_blank"
moz-do-not-send="true">vleschuk@accesssoftek.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>David, could you please clarify which code gave you the
10% improvement? I have run numerous tests with and
without this option, and it appears to have no effect on
performance (specifically, on the old 2016 sample).
Maybe we could investigate it together? Just
tell me where to start.<br>
</p>
<div>
<div class="h5"> <br>
<div class="m_-7612381275907462952moz-cite-prefix">On
02/07/2018 02:11 AM, Xinliang David Li wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Victor, thanks for the experiment.
<div><br>
</div>
<div>My suspicion is that it is due to the remaining
issues with block layout -- especially with loop
rotation (with PGO). Another problem is that
tail duplication does not happen after loop rotation,
which can limit the effectiveness of loop
rotation.</div>
<div><br>
</div>
<div>I tried the internal option -mllvm
-force-precise-rotation-cost and saw about a
10% speedup with -fprofile-use. This option
turns on a more precise cost model when computing
the rotation strategy, but it is not enabled by
default.</div>
<div><br>
</div>
<div>+carrot who is working on this area.</div>
<div><br>
</div>
<div>thanks,</div>
<div><br>
</div>
<div>David</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Feb 6, 2018 at
1:37 PM, Victor Leschuk <span dir="ltr">&lt;<a
href="mailto:vleschuk@accesssoftek.com"
target="_blank" moz-do-not-send="true">vleschuk@accesssoftek.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>Hello David, thanks for the detailed
response!</p>
<p>Do you have any tests that you use to
measure the PGO effectiveness? I have
tested clang version 6.0 with the same
sample that Jie Chen used in 2016, and
both frontend-based and IR-based PGO
actually make the code run slower. See the
average times:</p>
<p>clang++ -O3: 3.15 sec </p>
<p>clang++ -O3 and -fprofile-instr-use:
3.160 sec<br>
</p>
<p>clang++ -O3 and -fprofile-use: 3.180 sec<br>
</p>
<p>g++ (7.3.0) -O3: 3.640 sec<br>
</p>
<p>g++ (7.3.0) -O3 and -fprofile-use: 2.92
sec</p>
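<p>To put the averages above in relative terms, here is a minimal Python sketch. It is only an illustration: the dictionary labels and the helper names <code>percent_change</code> and <code>report</code> are assumptions made for this example and do not come from clang, gcc, or any profiling tool. The times are copied from the list above.</p>

```python
# Average run times in seconds, copied from the figures quoted above.
# The labels and helper names are illustrative only, not from any tool.
timings = {
    "clang++ -O3": 3.15,
    "clang++ -O3 -fprofile-instr-use": 3.160,
    "clang++ -O3 -fprofile-use": 3.180,
    "g++ -O3": 3.640,
    "g++ -O3 -fprofile-use": 2.92,
}


def percent_change(baseline, measured):
    """Relative change in run time; positive means slower than baseline."""
    return (measured - baseline) / baseline * 100.0


def report(baseline_label, labels):
    """Return one formatted line per PGO build, relative to its -O3 baseline."""
    base = timings[baseline_label]
    return [
        f"{label}: {percent_change(base, timings[label]):+.1f}% vs {baseline_label}"
        for label in labels
    ]


lines = report(
    "clang++ -O3",
    ["clang++ -O3 -fprofile-instr-use", "clang++ -O3 -fprofile-use"],
)
lines += report("g++ -O3", ["g++ -O3 -fprofile-use"])
print("\n".join(lines))
```

<p>On these numbers the two clang PGO builds come out roughly 0.3% and 1% slower than plain -O3, while the g++ PGO build is roughly 20% faster than its own -O3 baseline.</p>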
<p>Do you have any idea what could be wrong?
Are there any recommendations on
when one should use PGO with clang
and when it is better not to?</p>
<p>Thanks!<br>
</p>
<div>
<div class="m_-7612381275907462952h5"> <br>
<div
class="m_-7612381275907462952m_-5231669173907304757moz-cite-prefix">On
02/05/2018 09:38 AM, Xinliang David Li
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun,
Feb 4, 2018 at 9:59 PM, Victor
Leschuk <span dir="ltr">&lt;<a
href="mailto:vleschuk@accesssoftek.com" target="_blank"
moz-do-not-send="true">vleschuk@accesssoftek.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">Hello
David!<br>
<br>
I have recently started
getting acquainted with PGO in
LLVM/clang and found<br>
your e-mail thread:<br>
<a
href="http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html"
rel="noreferrer"
target="_blank"
moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html</a>
. There you<br>
posted a nice list of
optimizations that use
profile data and of those<br>
that could use it but
don't. However, that thread is
about 2 years<br>
old. Could you please
let me know whether there have been any
significant<br>
changes in this area since
then?<br>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>Yes, there have been quite a few
changes since then. Here are
some of the new features:</div>
<div><br>
</div>
<div>* LLVM IR based PGO -- this
is designed to maximize
program performance. The
options to turn it on are
-fprofile-generate/-fprofile-use</div>
<div>* value profiling support
in PGO -- currently supports
indirect call target profiling
and memcpy/memset size
profiling and optimizations</div>
<div>* Profile data is made
available for the inliner to use
(enabled only for the new pass
manager:
-fexperimental-new-pass-manager)</div>
<div>* Profile aware LICM is
available -- implemented via a
profile driven code sinking
pass </div>
<div>* Partial inlining is made
profile aware; Graham Yu also
added support for multiple
region function outlining
(with PGO)</div>
<div>* BB layout heuristics are
tuned with PGO</div>
<div>* hotness driven function
layout optimization </div>
<div><br>
</div>
<div>There is pending work in
the following areas:</div>
<div>* profile aware loop
vectorization, etc</div>
<div>* control height reduction
optimization (Hiroshi is
working on this)</div>
<div><br>
</div>
<div>ThinLTO also works well
with PGO.</div>
<div><br>
</div>
<div>Hope this helps.</div>
<div><br>
</div>
<div>David</div>
<div><br>
</div>
<div>
<pre style="white-space:pre-wrap;color:rgb(0,0,0);font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">><i> What I can tell you is that there are many missing ones (that can benefit
</i>from profile): such as profile aware LICM (patch pending), speculative PRE,
loop unrolling, loop peeling, auto vectorization, inlining, function
splitting, function layout, function outlining, profile driven size
optimization, induction variable optimization/strength reduction, stringOp
specialization/optimization/inlining, switch peeling/lowering etc. The
biggest profile users today include ralloc, BB layout, ifcvt, shrinkwrapping
etc, but there should be room for improvement there too.</pre>
<br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <br>
Thanks in advance!<br>
<span
class="m_-7612381275907462952m_-5231669173907304757HOEnZb"><font
color="#888888"><br>
--<br>
Best Regards,<br>
<br>
Victor Leschuk | Software
Engineer | Access Softek<br>
<br>
</font></span></blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<pre class="m_-7612381275907462952m_-5231669173907304757moz-signature" cols="72">--
Best Regards,
Victor Leschuk | Software Engineer | Access Softek</pre>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<pre class="m_-7612381275907462952moz-signature" cols="72">--
Best Regards,
Victor Leschuk | Software Engineer | Access Softek</pre>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Best Regards,
Victor Leschuk | Software Engineer | Access Softek</pre>
</body>
</html>