<div dir="ltr"><div><div><div><div><div>The goal is to improve compile-time, so I think we should measure what's taking the most time and work from there. I'm not sure how the code coverage report was made, but that seems like it is based on execution frequency rather than time?<br><br></div>Here are some Instruments (macOS) screenshots of a time profile of a release build of opt running on a Haswell machine:<br>$ opt -O2 row_common.bc -S -o /dev/null<br><br>The input is derived from the Chrome source file attached to:<br><a href="https://bugs.llvm.org//show_bug.cgi?id=32037" target="_blank">https://bugs.llvm.org//show_bu<wbr>g.cgi?id=32037</a> <br><br>It's an implicit assumption in this experiment that 'opt' time is important to overall compile-time, but as you can see in the bug report, that's not really true for this case. Also, this profile may not be representative of other programs or important vs. other programs, but to summarize:<br><br></div><div>1. 123 ms out of 326 ms (38%) of the -O2 opt time is going into InstCombine. So yes, InstCombine is a big chunk of opt time.<br></div></div>2. Focusing on computeKnownBits() within the InstCombine subtree: > 48% of the time consumed by InstCombine is spent in ValueTracking.<br></div><div><br></div>We know that most InstCombines don't use ValueTracking, so I think transforms that use computeKnownBits are the ones that should be grouped under "ExpensiveCombines". Maybe there's another category for "RareCombines", or we split InstCombine into multiple passes to further reduce compile-time?<br><br>In any case, the profile lines up with David's comment about simplifyDemanded / computeKnownBits and previous anecdotes in the "llvm is getting slower" threads. Simple matchers are relatively short strands of compare and branch spaghetti. It's the few transforms that use ValueTracking that have a high relative cost. If we attack that, the profile says we could recover much more than 1% if -O2 middle-end time is significant for most compiles.<br></div><div><div><div><div><div><div><br></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Mar 18, 2017 at 8:43 AM, Hal Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span>
<p><br>
</p>
<div class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782moz-cite-prefix">On 03/17/2017 08:12 PM, David Majnemer
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Honestly, I'm not a huge fan of this change as-is.
The set of transforms that were added behind ExpensiveChecks
seems awfully strange and many would not lead the reader to
believe that they are expensive at all
(the SimplifyDemandedInstructi<wbr>onBits and foldICmpUsingKnownBits
calls being the obvious expensive routines).
<div><br>
</div>
<div>The purpose of many of InstCombine's xforms is to
canonicalize the IR to make life easier for downstream passes
and analyses.</div>
<div><br>
</div>
<div>InstCombine internally relies on knowing what transforms it
may or may not perform. This is important: canonicalizations
may duel endlessly if we get this wrong; the order of the
combines is also important for exactly the same reason
(SelectionDAG deals with this problem in a different way with
its pattern complexity field).</div>
<div><br>
</div>
<div>Another concern with moving seemingly arbitrary combines
under ExpensiveCombines is that it will make it that much
harder to understand what is and is not canonical at a given
point during the execution of the optimizer.</div>
</div>
</blockquote>
<br></span>
I agree with this up to a point. If we have these kinds of
canonicalization dependencies that depend on ValueTracking's depth,
this seems very fragile. Even if we introduce caching, thus making
the depth often infinite, if it will still be finite in the face of
updates then we still need to be careful (plus, if we're worried
about being able to understand the canonical form, then depending on
known-bits analysis can make defining this form subtle).<span><br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I'd be much more interested in a patch which caches the
result of frequently called ValueTracking functionality like
ComputeKnownBits, ComputeSignBit, etc. which often doesn't
change but is not intelligently reused. I imagine that the
performance win might be quite comparable. Such a patch would
have the benefit of keeping the set of available transforms
constant throughout the pipeline while bringing execution time
down; I wouldn't be at all surprised if caching the
ValueTracking functions resulted in a bigger time savings.</div>
</div>
</blockquote>
<br></span>
I'd started working on this a few months ago; I didn't finish the
patch (mostly because I discovered that there's also a need to
invalidate the cache whenever to perform a transformation that drops
nsw/nuw flags and I've never got to that part), but I'm happy to
provide my work-in-progress to anyone interested. cc'ing Davide who
had also expressed an interest in this.<br>
<br>
-Hal<div><div class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264h5"><br>
<br>
<blockquote type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Mar 17, 2017 at 5:49 PM, Hal
Finkel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span>
<p><br>
</p>
<div class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423moz-cite-prefix">On
03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:<br>
</div>
<blockquote type="cite"> <br>
<div>
<blockquote type="cite">
<div>On Mar 17, 2017, at 11:50 AM, Mikhail
Zolotukhin via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423Apple-interchange-newline">
<div>
<div style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div>Hi,</div>
<div><br>
</div>
<div>One of the most time-consuming
passes in LLVM middle-end is
InstCombine (see e.g. [1]). It is a
very powerful pass capable of doing
all the crazy stuff, and new patterns
are being constantly introduced there.
The problem is that we often use it
just as a clean-up pass: it's
scheduled 6 times in the current pass
pipeline, and each time it's invoked
it checks all known patterns. It
sounds ok for O3, where we try to
squeeze as much performance as
possible, but it is too excessive for
other opt-levels. InstCombine has an
ExpensiveCombines parameter to address
that - but I think it's underused at
the moment.</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Yes, the “ExpensiveCombines” has been added
recently (4.0? 3.9?) but I believe has always been
intended to be extended the way you’re doing it.
So I support this effort :)</div>
</div>
</blockquote>
<br>
</span> +1<br>
<br>
Also, did your profiling reveal why the other combines are
expensive? Among other things, I'm curious if the
expensive ones tend to spend a lot of time in
ValueTracking (getting known bits and similar)?<br>
<br>
-Hal
<div>
<div class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782h5"><br>
<br>
<blockquote type="cite">
<div>
<div><br>
</div>
<div>CC: David for the general direction on
InstCombine though.</div>
<div><br>
</div>
<div><br>
</div>
<div>— </div>
<div>Mehdi</div>
<div><br>
</div>
<div><br>
</div>
<br>
<blockquote type="cite">
<div>
<div style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div><br>
</div>
<div>Trying to find out, which
patterns are important, and which
are rare, I profiled clang using
CTMark and got the following
coverage report:</div>
</div>
</div>
</div>
</div>
<span id="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423cid:CEA3012D-A9E2-4318-AA90-C372667C70A9@apple.com"><InstCombine_covreport.html></span>
<div style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div>
<div>(beware, the file is ~6MB).</div>
</div>
<div><br>
</div>
<div>Guided by this profile I moved
some patterns under the "if
(ExpensiveCombines)" check, which
expectedly happened to be neutral
for runtime performance, but
improved compile-time. The testing
results are below (measured for Os).</div>
<div><br>
</div>
<div>
<table style="font-family:Helvetica,sans-serif;font-size:9pt;border-spacing:0px;border:1px solid black">
<thead><tr>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px;width:500px">Performance
Improvements - Compile Time</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">Δ </th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">Previous</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">Current</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">σ </th>
</tr>
</thead><tbody class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423searchable">
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2" target="_blank">CTMark/sqlite3/sqlite3</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(200,255,200)">-1.55%</td>
<td style="padding:5px 5px 5px 8px">6.8155</td>
<td style="padding:5px 5px 5px 8px">6.7102</td>
<td style="padding:5px 5px 5px 8px">0.0081</td>
</tr>
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.1=2" target="_blank">CTMark/mafft/pairlocalalign</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(209,255,209)">-1.05%</td>
<td style="padding:5px 5px 5px 8px">8.0407</td>
<td style="padding:5px 5px 5px 8px">7.9559</td>
<td style="padding:5px 5px 5px 8px">0.0193</td>
</tr>
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2" target="_blank">CTMark/ClamAV/clamscan</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.02%</td>
<td style="padding:5px 5px 5px 8px">11.3893</td>
<td style="padding:5px 5px 5px 8px">11.2734</td>
<td style="padding:5px 5px 5px 8px">0.0081</td>
</tr>
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.10=2" target="_blank">CTMark/lencod/lencod</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.01%</td>
<td style="padding:5px 5px 5px 8px">12.8763</td>
<td style="padding:5px 5px 5px 8px">12.7461</td>
<td style="padding:5px 5px 5px 8px">0.0244</td>
</tr>
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.5=2" target="_blank">CTMark/SPASS/SPASS</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.01%</td>
<td style="padding:5px 5px 5px 8px">12.5048</td>
<td style="padding:5px 5px 5px 8px">12.3791</td>
<td style="padding:5px 5px 5px 8px">0.0340</td>
</tr>
</tbody>
</table>
<div><br>
</div>
<table style="font-family:Helvetica,sans-serif;font-size:9pt;border-spacing:0px;border:1px solid black">
<thead><tr>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px;width:500px">Performance
Improvements - Compile Time</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">Δ </th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">Previous</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">Current</th>
<th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px">σ </th>
</tr>
</thead><tbody class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423searchable">
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.14=2" target="_blank">External/SPEC/CINT2006/403.gcc<wbr>/403.gcc</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(199,255,199)">-1.64%</td>
<td style="padding:5px 5px 5px 8px">54.0801</td>
<td style="padding:5px 5px 5px 8px">53.1930</td>
<td style="padding:5px 5px 5px 8px">-</td>
</tr>
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2" target="_blank">External/SPEC/CINT2006/400.per<wbr>lbench/400.perlbench</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(205,255,205)">-1.25%</td>
<td style="padding:5px 5px 5px 8px">19.1481</td>
<td style="padding:5px 5px 5px 8px">18.9091</td>
<td style="padding:5px 5px 5px 8px">-</td>
</tr>
<tr>
<td class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2" target="_blank">External/SPEC/CINT2006/445.gob<wbr>mk/445.gobmk</a></td>
<td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)">-1.01%</td>
<td style="padding:5px 5px 5px 8px">15.2819</td>
<td style="padding:5px 5px 5px 8px">15.1274</td>
<td style="padding:5px 5px 5px 8px">-</td>
</tr>
</tbody>
</table>
<div><br>
</div>
<div><br>
</div>
<div>Do such changes make sense? The
patch doesn't change O3, but it
does change Os and potentially can
change performance there (though I
didn't see any changes in my
tests).</div>
</div>
<div><br>
</div>
<div>The patch is attached for the
reference, if we decide to go for
it, I'll upload it to phab:</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<span id="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423cid:2A77C0D9-EE12-4C99-99A0-7A0CF5DF758A@apple.com"><0001-InstCombine-Move-some-in<wbr>frequent-patterns-under-if-E.p<wbr>atch></span>
<div style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div dir="auto" style="word-wrap:break-word">
<div><br>
</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Michael</div>
<div><br>
</div>
<div>
<div>[1]: <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html" target="_blank">http://lists.llvm.org/pip<wbr>ermail/llvm-dev/2016-December/<wbr>108279.html</a></div>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
</div>
</blockquote>
</div>
<br>
<br>
<fieldset class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
LLVM Developers mailing list
<a class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
</div></div><span class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782HOEnZb"><font color="#888888"><pre class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782m_-5665328138797978423moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</font></span></div>
______________________________<wbr>_________________
LLVM Developers mailing list
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a>
</blockquote></div>
</div>
</blockquote>
<pre class="m_-6764615056291505949m_-7239098240301919299m_-2421039467585570264m_-2192062852936131782moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre></div></div></div><br>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div></div>