<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 17, 2017, at 6:12 PM, David Majnemer via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Honestly, I'm not a huge fan of this change as-is. The set of transforms that were added behind ExpensiveChecks seems awfully strange and many would not lead the reader to believe that they are expensive at all (the SimplifyDemandedInstructionBits and foldICmpUsingKnownBits calls being the obvious expensive routines).<div class=""><br class=""></div><div class="">The purpose of many of InstCombine's xforms is to canonicalize the IR to make life easier for downstream passes and analyses.</div></div></div></blockquote><div><br class=""></div>As we get further along with compile-time improvements, one question we need to ask ourselves more often is how effective individual optimizations and passes are. In this case, for example: how can we make an educated assessment of whether running the combiner N times is a good cost/benefit investment of compute resources? The questions below are meant to figure out which technologies, instrumentation, etc. could help us move towards a more data-driven decision process about the effectiveness of optimizations. InstCombine might just be an inspirational use case to see what is possible in that direction.<br class=""><div><br class=""></div>The combiner is invoked in full multiple times. But is it really necessary to run all of it each time just to canonicalize the IR? After InstCombine has run once, is there a mapping from transformation -&gt; combines? I suspect most transformations could invoke a subset of combines to re-canonicalize. 
Or, if there were a (cheap) verifier for canonical IR, it could invoke a specific canonicalization routine. Instrumenting InstCombine and checking which patterns actually fire (in each invocation) might give insight into how the combiner could be structured so that only a subset of patterns needs to be checked.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">InstCombine internally relies on knowing what transforms it may or may not perform. This is important: canonicalizations may duel endlessly if we get this wrong; the order of the combines is also important for exactly the same reason (SelectionDAG deals with this problem in a different way with its pattern complexity field).</div></div></div></blockquote><div><br class=""></div>Can you elaborate on this “duel endlessly” problem with specific examples? I ask out of curiosity. There must be verifiers that check that this cannot happen, or an implementation strategy that guarantees it. GlobalISel will run into the same or a similar question once it gets far enough to replace SelectionDAG.<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Another concern with moving seemingly arbitrary combines under ExpensiveCombines is that it will make it that much harder to understand what is and is not canonical at a given point during the execution of the optimizer.</div></div></div></blockquote><div><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">I'd be much more interested in a patch which caches the result of frequently called ValueTracking functionality like ComputeKnownBits, ComputeSignBit, etc. which often doesn't change but is not intelligently reused. 
I imagine that the performance win might be quite comparable.</div></div></div></blockquote><div><br class=""></div>Can you back this up with measurements? Caching schemes are tricky. Is there a way to evaluate when the result of ComputeKnownBits etc. is actually effective, meaning that the result is used and yields faster instructions? E.g., it might well be that only the first InstCombine run benefits from computing the bits. </div><div><br class=""></div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""> Such a patch would have the benefit of keeping the set of available transforms constant throughout the pipeline while bringing execution time down; I wouldn't be at all surprised if caching the ValueTracking functions resulted in a bigger time savings.</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Fri, Mar 17, 2017 at 5:49 PM, Hal Finkel via llvm-dev <span dir="ltr" class="">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000" class=""><span class=""><p class=""><br class="">
    </p>
    <div class="m_-5665328138797978423moz-cite-prefix">On 03/17/2017 04:30 PM, Mehdi Amini via
      llvm-dev wrote:<br class="">
    </div>
    <blockquote type="cite" class="">
      
      <br class="">
      <div class="">
        <blockquote type="cite" class="">
          <div class="">On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin
            via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>&gt;
            wrote:</div>
          <br class="m_-5665328138797978423Apple-interchange-newline">
          <div class="">
            <div style="word-wrap:break-word" class="">
              <div dir="auto" style="word-wrap:break-word" class="">
                <div dir="auto" style="word-wrap:break-word" class="">
                  <div dir="auto" style="word-wrap:break-word" class="">
                    <div class="">Hi,</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">One of the most time-consuming passes
                      in the LLVM middle-end is InstCombine (see e.g. [1]).
                      It is a very powerful pass capable of doing all
                      the crazy stuff, and new patterns are constantly
                      being introduced there. The problem is that
                      we often use it just as a clean-up pass: it's
                      scheduled 6 times in the current pass pipeline,
                      and each time it's invoked it checks all known
                      patterns. That sounds OK for O3, where we try to
                      squeeze out as much performance as possible, but it is
                      excessive for other opt-levels. InstCombine
                      has an ExpensiveCombines parameter to address that,
                      but I think it's underused at the moment.</div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </blockquote>
        <div class=""><br class="">
        </div>
        <div class="">Yes, “ExpensiveCombines” was added recently (4.0?
          3.9?), but I believe it was always intended to be extended
          the way you’re doing it. So I support this effort :)</div>
      </div>
    </blockquote>
    <br class=""></span>
    +1<br class="">
    <br class="">
    Also, did your profiling reveal why the other combines are
    expensive? Among other things, I'm curious if the expensive ones
    tend to spend a lot of time in ValueTracking (getting known bits and
    similar)?<br class="">
    <br class="">
     -Hal<div class=""><div class="h5"><br class="">
    <br class="">
    <blockquote type="cite" class="">
      <div class="">
        <div class=""><br class="">
        </div>
        <div class="">CC: David for the general direction on InstCombine though.</div>
        <div class=""><br class="">
        </div>
        <div class=""><br class="">
        </div>
        <div class="">— </div>
        <div class="">Mehdi</div>
        <div class=""><br class="">
        </div>
        <div class=""><br class="">
        </div>
        <br class="">
        <blockquote type="cite" class="">
          <div class="">
            <div style="word-wrap:break-word" class="">
              <div dir="auto" style="word-wrap:break-word" class="">
                <div dir="auto" style="word-wrap:break-word" class="">
                  <div dir="auto" style="word-wrap:break-word" class="">
                    <div class=""><br class="">
                    </div>
                    <div class="">To find out which patterns are
                      important and which are rare, I profiled clang
                      using CTMark and got the following coverage
                      report:</div>
                  </div>
                </div>
              </div>
            </div>
            <span id="m_-5665328138797978423cid:CEA3012D-A9E2-4318-AA90-C372667C70A9@apple.com" class="">&lt;InstCombine_covreport.html&gt;</span>
            <div style="word-wrap:break-word" class="">
              <div dir="auto" style="word-wrap:break-word" class="">
                <div dir="auto" style="word-wrap:break-word" class="">
                  <div dir="auto" style="word-wrap:break-word" class="">
                    <div class="">
                      <div class="">(beware, the file is ~6MB).</div>
                    </div>
                    <div class=""><br class="">
                    </div>
                    <div class="">Guided by this profile, I moved some
                      patterns under the "if (ExpensiveCombines)" check.
                      As expected, this was neutral for
                      runtime performance but improved compile time.
                      The test results are below (measured for Os).</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">
                      <table style="font-family:Helvetica,sans-serif;font-size:9pt;border-spacing:0px;border:1px solid black" class="">
                        <thead class=""><tr class="">
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px;width:500px" class="">Performance Improvements
                              - Compile Time</th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">Δ </th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">Previous</th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">Current</th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">σ </th>
                          </tr>
                        </thead><tbody class="m_-5665328138797978423searchable">
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2" target="_blank" class="">CTMark/sqlite3/sqlite3</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(200,255,200)" class="">-1.55%</td>
                            <td style="padding:5px 5px 5px 8px" class="">6.8155</td>
                            <td style="padding:5px 5px 5px 8px" class="">6.7102</td>
                            <td style="padding:5px 5px 5px 8px" class="">0.0081</td>
                          </tr>
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.1=2" target="_blank" class="">CTMark/mafft/pairlocalalign</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(209,255,209)" class="">-1.05%</td>
                            <td style="padding:5px 5px 5px 8px" class="">8.0407</td>
                            <td style="padding:5px 5px 5px 8px" class="">7.9559</td>
                            <td style="padding:5px 5px 5px 8px" class="">0.0193</td>
                          </tr>
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2" target="_blank" class="">CTMark/ClamAV/clamscan</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)" class="">-1.02%</td>
                            <td style="padding:5px 5px 5px 8px" class="">11.3893</td>
                            <td style="padding:5px 5px 5px 8px" class="">11.2734</td>
                            <td style="padding:5px 5px 5px 8px" class="">0.0081</td>
                          </tr>
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.10=2" target="_blank" class="">CTMark/lencod/lencod</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)" class="">-1.01%</td>
                            <td style="padding:5px 5px 5px 8px" class="">12.8763</td>
                            <td style="padding:5px 5px 5px 8px" class="">12.7461</td>
                            <td style="padding:5px 5px 5px 8px" class="">0.0244</td>
                          </tr>
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.5=2" target="_blank" class="">CTMark/SPASS/SPASS</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)" class="">-1.01%</td>
                            <td style="padding:5px 5px 5px 8px" class="">12.5048</td>
                            <td style="padding:5px 5px 5px 8px" class="">12.3791</td>
                            <td style="padding:5px 5px 5px 8px" class="">0.0340</td>
                          </tr>
                        </tbody>
                      </table>
                      <div class=""><br class="">
                      </div>
                      <table style="font-family:Helvetica,sans-serif;font-size:9pt;border-spacing:0px;border:1px solid black" class="">
                        <thead class=""><tr class="">
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px;width:500px" class="">Performance Improvements
                              - Compile Time</th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">Δ </th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">Previous</th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">Current</th>
                            <th style="background-color:rgb(238,238,238);color:rgb(102,102,102);text-align:center;font-family:Verdana;padding:5px 5px 5px 8px" class="">σ </th>
                          </tr>
                        </thead><tbody class="m_-5665328138797978423searchable">
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.14=2" target="_blank" class="">External/SPEC/CINT2006/403.<wbr class="">gcc/403.gcc</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(199,255,199)" class="">-1.64%</td>
                            <td style="padding:5px 5px 5px 8px" class="">54.0801</td>
                            <td style="padding:5px 5px 5px 8px" class="">53.1930</td>
                            <td style="padding:5px 5px 5px 8px" class="">-</td>
                          </tr>
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2" target="_blank" class="">External/SPEC/CINT2006/400.<wbr class="">perlbench/400.perlbench</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(205,255,205)" class="">-1.25%</td>
                            <td style="padding:5px 5px 5px 8px" class="">19.1481</td>
                            <td style="padding:5px 5px 5px 8px" class="">18.9091</td>
                            <td style="padding:5px 5px 5px 8px" class="">-</td>
                          </tr>
                          <tr class="">
                            <td class="m_-5665328138797978423benchmark-name" style="padding:5px 5px 5px 8px"><a href="http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2" target="_blank" class="">External/SPEC/CINT2006/445.<wbr class="">gobmk/445.gobmk</a></td>
                            <td style="padding:5px 5px 5px 8px;background-color:rgb(210,255,210)" class="">-1.01%</td>
                            <td style="padding:5px 5px 5px 8px" class="">15.2819</td>
                            <td style="padding:5px 5px 5px 8px" class="">15.1274</td>
                            <td style="padding:5px 5px 5px 8px" class="">-</td>
                          </tr>
                        </tbody>
                      </table>
                      <div class=""><br class="">
                      </div>
                      <div class=""><br class="">
                      </div>
                      <div class="">Do such changes make sense? The
                        patch doesn't change O3, but it does change Os
                        and could potentially change performance there
                        (though I didn't see any changes in my tests).</div>
                    </div>
                    <div class=""><br class="">
                    </div>
                    <div class="">The patch is attached for
                      reference. If we decide to go for it, I'll upload
                      it to phab:</div>
                    <div class=""><br class="">
                    </div>
                  </div>
                </div>
              </div>
            </div>
            <span id="m_-5665328138797978423cid:2A77C0D9-EE12-4C99-99A0-7A0CF5DF758A@apple.com" class="">&lt;0001-InstCombine-Move-some-<wbr class="">infrequent-patterns-under-if-<wbr class="">E.patch&gt;</span>
            <div style="word-wrap:break-word" class="">
              <div dir="auto" style="word-wrap:break-word" class="">
                <div dir="auto" style="word-wrap:break-word" class="">
                  <div dir="auto" style="word-wrap:break-word" class="">
                    <div class=""><br class="">
                    </div>
                    <div class=""><br class="">
                    </div>
                    <div class="">Thanks,</div>
                    <div class="">Michael</div>
                    <div class=""><br class="">
                    </div>
                    <div class="">
                      <div class="">[1]: <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html" target="_blank" class="">http://lists.llvm.org/<wbr class="">pipermail/llvm-dev/2016-<wbr class="">December/108279.html</a></div>
                    </div>
                    <div class=""><br class="">
                    </div>
                  </div>
                </div>
              </div>
            </div>
            ______________________________<wbr class="">_________________<br class="">
            LLVM Developers mailing list<br class="">
            <a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class="">
            <a class="m_-5665328138797978423moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class="">
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br class="">
    </div></div><span class="HOEnZb"><font color="#888888" class=""><pre class="m_-5665328138797978423moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </font></span></div>

<br class=""></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></body></html>