<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 03/20/2017 11:51 PM, Gerolf
Hoflehner wrote:<br>
</div>
<blockquote
cite="mid:9FFC43BD-3061-4CC8-A62D-CED1E175597E@apple.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Mar 17, 2017, at 6:12 PM, David Majnemer via
llvm-dev <<a moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">Honestly, I'm not a huge fan of this
change as-is. The set of transforms that were added behind
ExpensiveChecks seems awfully strange and many would not
lead the reader to believe that they are expensive at all
(the SimplifyDemandedInstructionBits and
foldICmpUsingKnownBits calls being the obvious expensive
routines).
<div class=""><br class="">
</div>
<div class="">The purpose of many of InstCombine's xforms
is to canonicalize the IR to make life easier for
downstream passes and analyses.</div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
As we get further along with compile-time improvements one
question we need to ask ourselves more frequently is about the
effectiveness of optimizations/passes. For example - in this
case - how can we make an educated assessment that running the
combiner N times is a good cost/benefit investment of compute
resources? The questions below are meant to figure out what
technologies/instrumentations/etc could help towards a more
data-driven decision process when it comes to the effectiveness
of optimizations. Instcombiner might just be an inspirational
use case to see what is possible in that direction.<br class="">
<div><br class="">
</div>
The combiner is invoked in full multiple times. But is it really
necessary to run all of it for that purpose? After instcombine
is run once is there a mapping from transformation ->
combines? I suspect most transformations could invoke a subset
of combines to re-canonicalize. Or, if there was a (cheap)
verifier for canonical IR, it could invoke a specific
canonicalization routine. Instrumenting the instcombiner and
checking which patterns actually kick in (for different
invocations) might give insight into how the combiner could be
structured and so that only a subset of pattern need to be
checked.<br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class=""><br class="">
</div>
<div class="">InstCombine internally relies on knowing
what transforms it may or may not perform. This is
important: canonicalizations may duel endlessly if we
get this wrong; the order of the combines is also
important for exactly the same reason (SelectionDAG
deals with this problem in a different way with its
pattern complexity field).</div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
Can you elaborate on this “duel endlessly” with specific
examples? This is out of curiosity. There must be verifiers that
check that this cannot happen. Or an implementation strategy
that guarantees that. Global isel will run into the same/similar
question when it gets far enough to replace SD.<br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class=""><br class="">
</div>
<div class="">Another concern with moving seemingly
arbitrary combines under ExpensiveCombines is that it
will make it that much harder to understand what is and
is not canonical at a given point during the execution
of the optimizer.</div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class=""><br class="">
</div>
<div class="">I'd be much more interested in a patch which
caches the result of frequently called ValueTracking
functionality like ComputeKnownBits, ComputeSignBit,
etc. which often doesn't change but is not intelligently
reused. I imagine that the performance win might be
quite comparable.</div>
</div>
</div>
</blockquote>
<div><br class="">
</div>
Can you back this up with measurements? Caching schemes are
tricky. Is there a way to evaluate when the results of
ComputeKnownBits etc is actually effective meaining the result
is used and gives faster instructions? E.g. it might well be
that only the first instance of inst_combine benefits from
computing the bits. <br>
</div>
</blockquote>
<br>
To (hopefully) make it easier to answer this question, I've posted
my (work-in-progress) patch which adds a known-bits cache to
InstCombine. I rebased it yesterday, so it should be fairly easy to
apply: <a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D31239">https://reviews.llvm.org/D31239</a> - Seeing what this does to
the performance of the benchmarks mentioned in this thread (among
others) would certainly be interesting.<br>
<br>
-Hal<br>
<br>
<blockquote
cite="mid:9FFC43BD-3061-4CC8-A62D-CED1E175597E@apple.com"
type="cite">
<div><br class="">
</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">
<div dir="ltr" class="">
<div class=""> Such a patch would have the benefit of
keeping the set of available transforms constant
throughout the pipeline while bringing execution time
down; I wouldn't be at all surprised if caching the
ValueTracking functions resulted in a bigger time
savings.</div>
</div>
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On Fri, Mar 17, 2017 at 5:49 PM,
Hal Finkel via llvm-dev <span dir="ltr" class=""><<a
moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" class="">llvm-dev@lists.llvm.org</a>></span>
wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000" class=""><span
class="">
<p class=""><br class="">
</p>
<div class="m_-5665328138797978423moz-cite-prefix">On
03/17/2017 04:30 PM, Mehdi Amini via llvm-dev
wrote:<br class="">
</div>
<blockquote type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Mar 17, 2017, at 11:50 AM,
Mikhail Zolotukhin via llvm-dev <<a
moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" class="">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br
class="m_-5665328138797978423Apple-interchange-newline">
<div class="">
<div style="word-wrap:break-word" class="">
<div dir="auto"
style="word-wrap:break-word" class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div class="">Hi,</div>
<div class=""><br class="">
</div>
<div class="">One of the most
time-consuming passes in LLVM
middle-end is InstCombine (see
e.g. [1]). It is a very powerful
pass capable of doing all the
crazy stuff, and new patterns
are being constantly introduced
there. The problem is that we
often use it just as a clean-up
pass: it's scheduled 6 times in
the current pass pipeline, and
each time it's invoked it checks
all known patterns. It sounds ok
for O3, where we try to squeeze
as much performance as possible,
but it is too excessive for
other opt-levels. InstCombine
has an ExpensiveCombines
parameter to address that - but
I think it's underused at the
moment.</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Yes, the “ExpensiveCombines” has
been added recently (4.0? 3.9?) but I
believe has always been intended to be
extended the way you’re doing it. So I
support this effort :)</div>
</div>
</blockquote>
<br class="">
</span> +1<br class="">
<br class="">
Also, did your profiling reveal why the other
combines are expensive? Among other things, I'm
curious if the expensive ones tend to spend a lot of
time in ValueTracking (getting known bits and
similar)?<br class="">
<br class="">
-Hal
<div class="">
<div class="h5"><br class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">CC: David for the general
direction on InstCombine though.</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">— </div>
<div class="">Mehdi</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div class=""><br class="">
</div>
<div class="">Trying to find
out, which patterns are
important, and which are rare,
I profiled clang using CTMark
and got the following coverage
report:</div>
</div>
</div>
</div>
</div>
<span
id="m_-5665328138797978423cid:CEA3012D-A9E2-4318-AA90-C372667C70A9@apple.com"
class=""><InstCombine_covreport.html></span>
<div style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div class="">
<div class="">(beware, the
file is ~6MB).</div>
</div>
<div class=""><br class="">
</div>
<div class="">Guided by this
profile I moved some patterns
under the "if
(ExpensiveCombines)" check,
which expectedly happened to
be neutral for runtime
performance, but improved
compile-time. The testing
results are below (measured
for Os).</div>
<div class=""><br class="">
</div>
<div class="">
<table
style="font-family:Helvetica,sans-serif;font-size:9pt;border-spacing:0px;border:1px
solid black" class="">
<thead class=""><tr class="">
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px
8px;width:500px"
class="">Performance
Improvements - Compile
Time</th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">Δ </th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">Previous</th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">Current</th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">σ </th>
</tr>
</thead><tbody
class="m_-5665328138797978423searchable">
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2"
target="_blank"
class="">CTMark/sqlite3/sqlite3</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(200,255,200)"
class="">-1.55%</td>
<td style="padding:5px
5px 5px 8px" class="">6.8155</td>
<td style="padding:5px
5px 5px 8px" class="">6.7102</td>
<td style="padding:5px
5px 5px 8px" class="">0.0081</td>
</tr>
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.1=2"
target="_blank"
class="">CTMark/mafft/pairlocalalign</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(209,255,209)"
class="">-1.05%</td>
<td style="padding:5px
5px 5px 8px" class="">8.0407</td>
<td style="padding:5px
5px 5px 8px" class="">7.9559</td>
<td style="padding:5px
5px 5px 8px" class="">0.0193</td>
</tr>
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2"
target="_blank"
class="">CTMark/ClamAV/clamscan</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(210,255,210)"
class="">-1.02%</td>
<td style="padding:5px
5px 5px 8px" class="">11.3893</td>
<td style="padding:5px
5px 5px 8px" class="">11.2734</td>
<td style="padding:5px
5px 5px 8px" class="">0.0081</td>
</tr>
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.10=2"
target="_blank"
class="">CTMark/lencod/lencod</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(210,255,210)"
class="">-1.01%</td>
<td style="padding:5px
5px 5px 8px" class="">12.8763</td>
<td style="padding:5px
5px 5px 8px" class="">12.7461</td>
<td style="padding:5px
5px 5px 8px" class="">0.0244</td>
</tr>
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.5=2"
target="_blank"
class="">CTMark/SPASS/SPASS</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(210,255,210)"
class="">-1.01%</td>
<td style="padding:5px
5px 5px 8px" class="">12.5048</td>
<td style="padding:5px
5px 5px 8px" class="">12.3791</td>
<td style="padding:5px
5px 5px 8px" class="">0.0340</td>
</tr>
</tbody>
</table>
<div class=""><br class="">
</div>
<table
style="font-family:Helvetica,sans-serif;font-size:9pt;border-spacing:0px;border:1px
solid black" class="">
<thead class=""><tr class="">
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px
8px;width:500px"
class="">Performance
Improvements - Compile
Time</th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">Δ </th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">Previous</th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">Current</th>
<th
style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px
5px 5px 8px" class="">σ </th>
</tr>
</thead><tbody
class="m_-5665328138797978423searchable">
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.14=2"
target="_blank"
class="">External/SPEC/CINT2006/403.<wbr
class="">gcc/403.gcc</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(199,255,199)"
class="">-1.64%</td>
<td style="padding:5px
5px 5px 8px" class="">54.0801</td>
<td style="padding:5px
5px 5px 8px" class="">53.1930</td>
<td style="padding:5px
5px 5px 8px" class="">-</td>
</tr>
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2"
target="_blank"
class="">External/SPEC/CINT2006/400.<wbr
class="">perlbench/400.perlbench</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(205,255,205)"
class="">-1.25%</td>
<td style="padding:5px
5px 5px 8px" class="">19.1481</td>
<td style="padding:5px
5px 5px 8px" class="">18.9091</td>
<td style="padding:5px
5px 5px 8px" class="">-</td>
</tr>
<tr class="">
<td
class="m_-5665328138797978423benchmark-name"
style="padding:5px 5px
5px 8px"><a
moz-do-not-send="true"
href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2"
target="_blank"
class="">External/SPEC/CINT2006/445.<wbr
class="">gobmk/445.gobmk</a></td>
<td style="padding:5px
5px 5px
8px;background-color:rgb(210,255,210)"
class="">-1.01%</td>
<td style="padding:5px
5px 5px 8px" class="">15.2819</td>
<td style="padding:5px
5px 5px 8px" class="">15.1274</td>
<td style="padding:5px
5px 5px 8px" class="">-</td>
</tr>
</tbody>
</table>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">Do such changes
make sense? The patch
doesn't change O3, but it
does change Os and
potentially can change
performance there (though I
didn't see any changes in my
tests).</div>
</div>
<div class=""><br class="">
</div>
<div class="">The patch is
attached for the reference, if
we decide to go for it, I'll
upload it to phab:</div>
<div class=""><br class="">
</div>
</div>
</div>
</div>
</div>
<span
id="m_-5665328138797978423cid:2A77C0D9-EE12-4C99-99A0-7A0CF5DF758A@apple.com"
class=""><0001-InstCombine-Move-some-<wbr
class="">infrequent-patterns-under-if-<wbr
class="">E.patch></span>
<div style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div dir="auto"
style="word-wrap:break-word"
class="">
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Michael</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">[1]: <a
moz-do-not-send="true"
href="http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html"
target="_blank" class="">http://lists.llvm.org/<wbr
class="">pipermail/llvm-dev/2016-<wbr
class="">December/108279.html</a></div>
</div>
<div class=""><br class="">
</div>
</div>
</div>
</div>
</div>
______________________________<wbr
class="">_________________<br class="">
LLVM Developers mailing list<br class="">
<a moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" class="">llvm-dev@lists.llvm.org</a><br
class="">
<a moz-do-not-send="true"
class="m_-5665328138797978423moz-txt-link-freetext"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
target="_blank">http://lists.llvm.org/cgi-bin/<wbr
class="">mailman/listinfo/llvm-dev</a><br
class="">
</div>
</blockquote>
</div>
<br class="">
<br class="">
<fieldset
class="m_-5665328138797978423mimeAttachmentHeader"></fieldset>
<br class="">
<pre class="">______________________________<wbr class="">_________________
LLVM Developers mailing list
<a moz-do-not-send="true" class="m_-5665328138797978423moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a moz-do-not-send="true" class="m_-5665328138797978423moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
</div></div><span class="HOEnZb"><font class="" color="#888888"><pre class="m_-5665328138797978423moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
______________________________<wbr class="">_________________
LLVM Developers mailing list
<a moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>
<a moz-do-not-send="true" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a>
</blockquote></div>
</div>
_______________________________________________
LLVM Developers mailing list
<a moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</div></blockquote></div>
</blockquote>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre></body></html>