<div dir="ltr"><div><div>> To (hopefully) make it easier to answer this question, I've posted
my (work-in-progress) patch which adds a known-bits cache to
InstCombine. <br>> I rebased it yesterday, so it should be fairly easy to
apply: <a class="m_-3075486418410273754gmail-m_-1855108276665198409moz-txt-link-freetext" href="https://reviews.llvm.org/D31239" target="_blank">https://reviews.llvm.org/D3123<wbr>9</a> - Seeing what this does to
the performance of the <br>> benchmarks mentioned in this thread (among
others) would certainly be interesting.<br>
<br></div>Thanks! I only have the one rough data point based on PR32037, but caching does good things for compile-time on that example.<br></div><br>Trunk r298514 compiled release on macOS running on Haswell 4GHz:<br>$ time ./opt -O2 row_common.bc -S -o /dev/null<br><br>user 0m0.302s<br>user 0m0.300s<br>user 0m0.296s<br>user 0m0.299s<br>user 0m0.296s<br><br><div>With your patch applied:<br></div><div><div><br>user 0m0.264s<br>user 0m0.269s<br>user 0m0.269s<br>user 0m0.268s<br>user 0m0.268s<br><br></div><div>So the time for all of -O2 has dropped to 89.6% of the baseline (improvement of 11.5%). <br>A profile shows time spent in InstCombine->computeKnownBits dropped from 58 ms to 15 ms (lines up with the overall time drop), so we're about 4x faster in ValueTracking with the caching.<br></div><div>
<div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 22, 2017 at 7:36 AM, Hal Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="m_-3075486418410273754gmail-">
<p><br>
</p>
<div class="m_-3075486418410273754gmail-m_-1855108276665198409moz-cite-prefix">On 03/20/2017 11:51 PM, Gerolf
Hoflehner wrote:<br>
</div>
<blockquote type="cite">
<br>
<div>
<blockquote type="cite">
<div>On Mar 17, 2017, at 6:12 PM, David Majnemer via
llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br class="m_-3075486418410273754gmail-m_-1855108276665198409Apple-interchange-newline">
<div>
<div dir="ltr">Honestly, I'm not a huge fan of this
change as-is. The set of transforms that were added behind
ExpensiveChecks seems awfully strange and many would not
lead the reader to believe that they are expensive at all
(the SimplifyDemandedInstructi<wbr>onBits and
foldICmpUsingKnownBits calls being the obvious expensive
routines).
<div><br>
</div>
<div>The purpose of many of InstCombine's xforms
is to canonicalize the IR to make life easier for
downstream passes and analyses.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
As we get further along with compile-time improvements one
question we need to ask ourselves more frequently is about the
effectiveness of optimizations/passes. For example - in this
case - how can we make an educated assessment that running the
combiner N times is a good cost/benefit investment of compute
resources? The questions below are meant to figure out what
technologies/instrumentations/<wbr>etc could help towards a more
data-driven decision process when it comes to the effectiveness
of optimizations. Instcombiner might just be an inspirational
use case to see what is possible in that direction.<br>
<div><br>
</div>
The combiner is invoked in full multiple times. But is it really
necessary to run all of it for that purpose? After instcombine
is run once is there a mapping from transformation ->
combines? I suspect most transformations could invoke a subset
of combines to re-canonicalize. Or, if there was a (cheap)
verifier for canonical IR, it could invoke a specific
canonicalization routine. Instrumenting the instcombiner and
checking which patterns actually kick in (for different
invocations) might give insight into how the combiner could be
structured and so that only a subset of pattern need to be
checked.<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div><br>
</div>
<div>InstCombine internally relies on knowing
what transforms it may or may not perform. This is
important: canonicalizations may duel endlessly if we
get this wrong; the order of the combines is also
important for exactly the same reason (SelectionDAG
deals with this problem in a different way with its
pattern complexity field).</div>
</div>
</div>
</blockquote>
<div><br>
</div>
Can you elaborate on this “duel endlessly” with specific
examples? This is out of curiosity. There must be verifiers that
check that this cannot happen. Or an implementation strategy
that guarantees that. Global isel will run into the same/similar
question when it gets far enough to replace SD.<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div><br>
</div>
<div>Another concern with moving seemingly
arbitrary combines under ExpensiveCombines is that it
will make it that much harder to understand what is and
is not canonical at a given point during the execution
of the optimizer.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<blockquote type="cite">
<div>
<div dir="ltr">
<div><br>
</div>
<div>I'd be much more interested in a patch which
caches the result of frequently called ValueTracking
functionality like ComputeKnownBits, ComputeSignBit,
etc. which often doesn't change but is not intelligently
reused. I imagine that the performance win might be
quite comparable.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
Can you back this up with measurements? Caching schemes are
tricky. Is there a way to evaluate when the results of
ComputeKnownBits etc is actually effective meaining the result
is used and gives faster instructions? E.g. it might well be
that only the first instance of inst_combine benefits from
computing the bits. <br>
</div>
</blockquote>
<br></span>
To (hopefully) make it easier to answer this question, I've posted
my (work-in-progress) patch which adds a known-bits cache to
InstCombine. I rebased it yesterday, so it should be fairly easy to
apply: <a class="m_-3075486418410273754gmail-m_-1855108276665198409moz-txt-link-freetext" href="https://reviews.llvm.org/D31239" target="_blank">https://reviews.llvm.org/D3123<wbr>9</a> - Seeing what this does to
the performance of the benchmarks mentioned in this thread (among
others) would certainly be interesting.<br>
<br>
-Hal<div><div class="m_-3075486418410273754gmail-h5"><br>
<br>
<blockquote type="cite">
<div><br>
</div>
<div><br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div> Such a patch would have the benefit of
keeping the set of available transforms constant
throughout the pipeline while bringing execution time
down; I wouldn't be at all surprised if caching the
ValueTracking functions resulted in a bigger time
savings.</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Mar 17, 2017 at 5:49 PM,
Hal Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span>
<p><br>
</p>
<div class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423moz-cite-prefix">On
03/17/2017 04:30 PM, Mehdi Amini via llvm-dev
wrote:<br>
</div>
<blockquote type="cite"> <br>
<div>
<blockquote type="cite">
<div>On Mar 17, 2017, at 11:50 AM,
Mikhail Zolotukhin via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423Apple-interchange-newline">
<div>
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto">
<div>Hi,</div>
<div><br>
</div>
<div>One of the most
time-consuming passes in LLVM
middle-end is InstCombine (see
e.g. [1]). It is a very powerful
pass capable of doing all the
crazy stuff, and new patterns
are being constantly introduced
there. The problem is that we
often use it just as a clean-up
pass: it's scheduled 6 times in
the current pass pipeline, and
each time it's invoked it checks
all known patterns. It sounds ok
for O3, where we try to squeeze
as much performance as possible,
but it is too excessive for
other opt-levels. InstCombine
has an ExpensiveCombines
parameter to address that - but
I think it's underused at the
moment.</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Yes, the “ExpensiveCombines” has
been added recently (4.0? 3.9?) but I
believe has always been intended to be
extended the way you’re doing it. So I
support this effort :)</div>
</div>
</blockquote>
<br>
</span> +1<br>
<br>
Also, did your profiling reveal why the other
combines are expensive? Among other things, I'm
curious if the expensive ones tend to spend a lot of
time in ValueTracking (getting known bits and
similar)?<br>
<br>
-Hal
<div>
<div class="m_-3075486418410273754gmail-m_-1855108276665198409h5"><br>
<br>
<blockquote type="cite">
<div>
<div><br>
</div>
<div>CC: David for the general
direction on InstCombine though.</div>
<div><br>
</div>
<div><br>
</div>
<div>— </div>
<div>Mehdi</div>
<div><br>
</div>
<div><br>
</div>
<br>
<blockquote type="cite">
<div>
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto">
<div><br>
</div>
<div>Trying to find
out, which patterns are
important, and which are rare,
I profiled clang using CTMark
and got the following coverage
report:</div>
</div>
</div>
</div>
</div>
<span id="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423cid:CEA3012D-A9E2-4318-AA90-C372667C70A9@apple.com"><InstCombine_covreport.html></span>
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto">
<div>
<div>(beware, the
file is ~6MB).</div>
</div>
<div><br>
</div>
<div>Guided by this
profile I moved some patterns
under the "if
(ExpensiveCombines)" check,
which expectedly happened to
be neutral for runtime
performance, but improved
compile-time. The testing
results are below (measured
for Os).</div>
<div><br>
</div>
<div>
<table style="font-family:helvetica,sans-serif;font-size:9pt;border-spacing:0px;border-width:1px;border-style:solid;border-color:black">
<thead><tr>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px;width:500px">Performance
Improvements - Compile
Time</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">Δ </th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">Previous</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">Current</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">σ </th>
</tr>
</thead><tbody class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423searchable">
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2" target="_blank">CTMark/sqlite3/sqlite3</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(200,255,200)">-1.55%</td>
<td style="padding:5px 5px 5px 8px">6.8155</td>
<td style="padding:5px 5px 5px 8px">6.7102</td>
<td style="padding:5px 5px 5px 8px">0.0081</td>
</tr>
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.1=2" target="_blank">CTMark/mafft/pairlocalalign</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(209,255,209)">-1.05%</td>
<td style="padding:5px 5px 5px 8px">8.0407</td>
<td style="padding:5px 5px 5px 8px">7.9559</td>
<td style="padding:5px 5px 5px 8px">0.0193</td>
</tr>
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2" target="_blank">CTMark/ClamAV/clamscan</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.02%</td>
<td style="padding:5px 5px 5px 8px">11.3893</td>
<td style="padding:5px 5px 5px 8px">11.2734</td>
<td style="padding:5px 5px 5px 8px">0.0081</td>
</tr>
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.10=2" target="_blank">CTMark/lencod/lencod</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.01%</td>
<td style="padding:5px 5px 5px 8px">12.8763</td>
<td style="padding:5px 5px 5px 8px">12.7461</td>
<td style="padding:5px 5px 5px 8px">0.0244</td>
</tr>
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.5=2" target="_blank">CTMark/SPASS/SPASS</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.01%</td>
<td style="padding:5px 5px 5px 8px">12.5048</td>
<td style="padding:5px 5px 5px 8px">12.3791</td>
<td style="padding:5px 5px 5px 8px">0.0340</td>
</tr>
</tbody>
</table>
<div><br>
</div>
<table style="font-family:helvetica,sans-serif;font-size:9pt;border-spacing:0px;border-width:1px;border-style:solid;border-color:black">
<thead><tr>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px;width:500px">Performance
Improvements - Compile
Time</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">Δ </th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">Previous</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">Current</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:verdana;padding:5px 5px 5px 8px">σ </th>
</tr>
</thead><tbody class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423searchable">
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.14=2" target="_blank">External/SPEC/CINT2006/403.gcc<wbr>/403.gcc</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(199,255,199)">-1.64%</td>
<td style="padding:5px 5px 5px 8px">54.0801</td>
<td style="padding:5px 5px 5px 8px">53.1930</td>
<td style="padding:5px 5px 5px 8px">-</td>
</tr>
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2" target="_blank">External/SPEC/CINT2006/400.per<wbr>lbench/400.perlbench</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(205,255,205)">-1.25%</td>
<td style="padding:5px 5px 5px 8px">19.1481</td>
<td style="padding:5px 5px 5px 8px">18.9091</td>
<td style="padding:5px 5px 5px 8px">-</td>
</tr>
<tr>
<td class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2" target="_blank">External/SPEC/CINT2006/445.gob<wbr>mk/445.gobmk</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.01%</td>
<td style="padding:5px 5px 5px 8px">15.2819</td>
<td style="padding:5px 5px 5px 8px">15.1274</td>
<td style="padding:5px 5px 5px 8px">-</td>
</tr>
</tbody>
</table>
<div><br>
</div>
<div><br>
</div>
<div>Do such changes
make sense? The patch
doesn't change O3, but it
does change Os and
potentially can change
performance there (though I
didn't see any changes in my
tests).</div>
</div>
<div><br>
</div>
<div>The patch is
attached for the reference, if
we decide to go for it, I'll
upload it to phab:</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<span id="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423cid:2A77C0D9-EE12-4C99-99A0-7A0CF5DF758A@apple.com"><0001-InstCombine-Move-some-in<wbr>frequent-patterns-under-if-E.p<wbr>atch></span>
<div>
<div dir="auto">
<div dir="auto">
<div dir="auto">
<div><br>
</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Michael</div>
<div><br>
</div>
<div>
<div>[1]: <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html" target="_blank">http://lists.llvm.org/pip<wbr>ermail/llvm-dev/2016-December/<wbr>108279.html</a></div>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
</div>
</blockquote>
</div>
<br>
<br>
<fieldset class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
LLVM Developers mailing list
<a class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
</div></div><span class="m_-3075486418410273754gmail-m_-1855108276665198409HOEnZb"><font color="#888888"><pre class="m_-3075486418410273754gmail-m_-1855108276665198409m_-5665328138797978423moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
______________________________<wbr>_________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</blockquote></div>
</div>
______________________________<wbr>_________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a class="m_-3075486418410273754gmail-m_-1855108276665198409moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</div></blockquote></div>
</blockquote>
<pre class="m_-3075486418410273754gmail-m_-1855108276665198409moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre></div></div></div><br>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div></div></div></div>