<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Aug 15, 2018, at 10:57 PM, Hal Finkel via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
  
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
  
  <div text="#000000" bgcolor="#FFFFFF" class="">
    <br class="">
    <div class="moz-cite-prefix">On 08/15/2018 02:38 PM, Philip Reames
      via llvm-dev wrote:<br class="">
    </div>
    <blockquote type="cite" cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com" class="">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><p class="">I'm concerned that we're focusing on one side of this.  Let me
        point out a few concerns w/changing the canonical form here:</p>
      <ol class="">
        <li class="">LICM does not know how to hoist or sink regions.  It does
          know how to hoist and sink selects.</li></ol></blockquote></div></div></blockquote>I think we should teach LICM to do this eventually.<br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class=""><blockquote type="cite" cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com" class=""><ol class="" start="2">
        <li class="">InstCombine has limited support for triangles/diamonds, but
          fairly extensive support for selects.</li>
        <li class="">EarlyCSE and GVN do not know how to eliminate fully
          redundant triangles/diamonds.  PRE *may* get some of these
          cases, but that is by no means guaranteed or likely to be
          robust.</li></ol></blockquote></div></div></blockquote>Agreed, we’ll need a plan to deal with these issues.<br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class=""><blockquote type="cite" cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com" class=""><ol class="" start="2">
      </ol><p class="">My personal opinion is that selects are the appropriate
        canonical form.  </p><p class="">For the one of the specific cases mentioned, teaching GVN to do
        FRE and PRE for loads from selects of pointers just doesn't seem
        that painful.  I would be really tempted to do that instead. 
        Similarly, walking facts back from select uses in CVP seems
        generally useful since we have use dependent facts in a bunch of
        contexts, not just selects.  (Call arguments for instance, 
        non-null from unconditional deref, etc..)<br class="">
      </p><p class="">To be clear, I am raising concerns, not actively objecting to
        this.  If you want to move forward and commit to work through
        all of the issues identified I wont actively stand in the way.</p>
    </blockquote>
    <br class="">
    As I've expressed in the past, I think that not using select as part
    of our canonical form is potentially a superior design. However, it
    would be a major change. In addition to the issues that Philip
    mentions, there's also the fact that we'll just have more basic
    blocks and that, in itself, could lead to an increase in compile
    time. However, working through these issues will likely leave us
    with a more-robust optimizer.<br class="">
    <br class="">
    See also: <a href="https://bugs.llvm.org/show_bug.cgi?id=34603#c19" class="">https://bugs.llvm.org/show_bug.cgi?id=34603#c19</a><br class=""></div></div></blockquote>Canonicalizing to phis is my preference too.<br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class="">
    <br class="">
     -Hal<br class="">
    <br class="">
    <blockquote type="cite" cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com" class=""><p class="">Philip<br class="">
      </p>
      <br class="">
      <div class="moz-cite-prefix">On 08/14/2018 12:39 PM, Sanjay Patel
        via llvm-dev wrote:<br class="">
      </div>
      <blockquote type="cite" cite="mid:CA+wODitU1d+Fme24EhhwNkX6tcQzpiRnrKCpeej11ELgt5heig@mail.gmail.com" class="">
        <div dir="ltr" class="">
          <div class="">I didn't look closely at the new patch, but this appears
            to be a small extension to:</div>
          <div class=""><a href="https://reviews.llvm.org/D38566" moz-do-not-send="true" class="">https://reviews.llvm.org/D38566</a></div>
          <div class="">...and the GVN-based reasons for delaying transformation
            to 'select' are discussed in detail in the motivating bug
            for that patch:</div>
          <div class=""><a href="https://bugs.llvm.org/show_bug.cgi?id=34603" moz-do-not-send="true" class="">https://bugs.llvm.org/show_bug.cgi?id=34603</a></div>
          <div class=""><br class="">
          </div>
          <div class="">So this sounds like the right direction to me. Note that
            there was objection to the implementation (a pile of pass
            options vs. uniquely named passes).<br class="">
          </div>
          <div class=""><br class="">
          </div>
          <div class="">Here's another motivating bug where early transform to
            select prevents optimization:</div>
          <div class=""><a href="https://bugs.llvm.org/show_bug.cgi?id=36760" moz-do-not-send="true" class="">https://bugs.llvm.org/show_bug.cgi?id=36760</a><br class="">
          </div>
          <div class=""><br class="">
          </div>
          <div class="">Is that case affected by this patch?<br class="">
          </div>
          <div class=""><br class="">
          </div>
        </div>
        <div class="gmail_extra"><br class="">
          <div class="gmail_quote">On Tue, Aug 14, 2018 at 11:17 AM,
            John Brawn via llvm-dev <span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" moz-do-not-send="true" class="">llvm-dev@lists.llvm.org</a>></span>
            wrote:<br class="">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">Summary<br class="">
              =======<br class="">
              <br class="">
              I'm planning on adjusting SimplifyCFG so that it doesn't
              turn two-entry phi<br class="">
              nodes into selects until later in the pass pipeline, to
              give passes which can<br class="">
              understand phis but not selects more opportunity to
              optimize. The thing I'm<br class="">
              trying to do which made me think of doing this is
              described below, but from the<br class="">
              benchmarking I've done it looks like this is overall a
              good idea regardless of<br class="">
              if I manage to get that done or not.<br class="">
              <br class="">
              Motivation<br class="">
              ==========<br class="">
              <br class="">
              My goal is to get clang to optimize some code containing a
              call to<br class="">
              std::min_element which is dereferenced, so something like:<br class="">
              <br class="">
                float min_element_example(float *data, int size)<br class="">
                {<br class="">
                  return *std::min_element(data, data+size);<br class="">
                }<br class="">
              <br class="">
              which, after inlining a specialization, looks like:<br class="">
              <br class="">
                float min_element_example_inlined(<wbr class="">float *first,
              float *last)<br class="">
                {<br class="">
                  for (float *p = first; p != last; ++p)<br class="">
                  {<br class="">
                    if (*p < *first)<br class="">
                      first = p;<br class="">
                  }<br class="">
                  return *first;<br class="">
                }<br class="">
              <br class="">
              There are two loads in the loop, *p and *first, but
              actually the load *p can be<br class="">
              eliminated by using either the previous load *p or the
              previous *first,<br class="">
              depending on if the if-condition was taken or not. However
              the<br class="">
              "if (*p < *first) first = p" gets turned by simplifycfg
              into a select and this<br class="">
              makes optimizing this a lot harder because you no longer
              have distinct paths<br class="">
              through the CFG.<br class="">
              <br class="">
              I have some ideas on how to do the optimization (see my
              previous RFC "Making GVN<br class="">
              able to visit the same block more than once" posted in
              April, though I've<br class="">
              decided that the specific idea presented there isn't the
              right way to do it),<br class="">
              but I think the first step is to make sure we don't have a
              select when we try<br class="">
              to optimise this.<br class="">
              <br class="">
              Approach<br class="">
              ========<br class="">
              <br class="">
              I've posted a patch to <a href="https://reviews.llvm.org/D50723" rel="noreferrer" target="_blank" moz-do-not-send="true" class="">https://reviews.llvm.org/<wbr class="">D50723</a>
              showing what I'm<br class="">
              intending to do. An extra parameter is added to
              SimplifyCFG to control whether<br class="">
              two-entry phi nodes are converted into select, and this is
              set to false in all<br class="">
              instances before the end of module simplification. At the
              end of module<br class="">
              simplification we do SimplifyCFG, then Instcombine to
              optimise the selects that<br class="">
              are introduced, then EarlyCSE to eliminate common
              subexpressions introduced by<br class="">
              instcombine.<br class=""></blockquote></div></div></blockquote></blockquote></div></div></blockquote>Is scheduling another simplifycfg, instcombine + earlycse pass really necessary? I’m concerned about the compile time impact.</div><div><br class=""></div><div>Cheers,</div><div>Amara<br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class=""><blockquote type="cite" cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com" class=""><blockquote type="cite" cite="mid:CA+wODitU1d+Fme24EhhwNkX6tcQzpiRnrKCpeej11ELgt5heig@mail.gmail.com" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <br class="">
              Benchmark Results<br class="">
              =================<br class="">
              <br class="">
              These are performance differences reported by LNT when
              running llvm-test-suite,<br class="">
              spec2000, and spec2006 at -O3 with and without the patch
              linked above (using<br class="">
              trunk llvm from a week or so ago).<br class="">
              <br class="">
              AArch64 results on ARM Cortex-A72:<br class="">
              <br class="">
              Performance Regressions - execution_time                 
                          Change<br class="">
              SingleSource/Benchmarks/<wbr class="">Shootout/Shootout-ary3       
                               9.48%<br class="">
              MultiSource/Benchmarks/TSVC/<wbr class="">Packing-flt/Packing-flt 
                                3.79%<br class="">
              SingleSource/Benchmarks/<wbr class="">CoyoteBench/huffbench       
                                1.40%<br class="">
              <br class="">
              Performance Improvements - execution_time                 
                         Change<br class="">
              MultiSource/Benchmarks/TSVC/<wbr class="">Searching-dbl/Searching-dbl 
                          -23.74%<br class="">
              External/SPEC/CINT2000/256.<wbr class="">bzip2/256.bzip2           
                              -9.82%<br class="">
              MultiSource/Benchmarks/TSVC/<wbr class="">Searching-flt/Searching-flt 
                           -9.57%<br class="">
              MultiSource/Benchmarks/TSVC/<wbr class="">Equivalencing-flt/<wbr class="">Equivalencing-flt 
                   -4.38%<br class="">
              MultiSource/Benchmarks/TSVC/<wbr class="">LinearDependence-flt/<wbr class="">LinearDependence-flt
              -3.94%<br class="">
              MultiSource/Benchmarks/TSVC/<wbr class="">Packing-dbl/Packing-dbl 
                               -3.44%<br class="">
              External/SPEC/CFP2006/453.<wbr class="">povray/453.povray         
                               -2.50%<br class="">
              SingleSource/Benchmarks/Adobe-<wbr class="">C++/stepanov_vector   
                               -1.49%<br class="">
              <br class="">
              X86_64 results on Intel Xeon E5-2690:<br class="">
              <br class="">
              Performance Regressions - execution_time           Change<br class="">
              MultiSource/Benchmarks/<wbr class="">Ptrdist/yacr2/yacr2         
              5.62%<br class="">
              <br class="">
              Performance Improvements - execution_time          Change<br class="">
              SingleSource/Benchmarks/Misc-<wbr class="">C++/Large/sphereflake
              -4.43%<br class="">
              External/SPEC/CINT2006/456.<wbr class="">hmmer/456.hmmer       
               -2.50%<br class="">
              External/SPEC/CINT2006/464.<wbr class="">h264ref/464.h264ref   
               -1.60%<br class="">
              MultiSource/Benchmarks/nbench/<wbr class="">nbench             
               -1.19%<br class="">
              SingleSource/Benchmarks/Adobe-<wbr class="">C++/functionobjects 
              -1.07%<br class="">
              <br class="">
              I had a brief look at the regressions and they all look to
              be caused by<br class="">
              getting bad luck with branch mispredictions: I looked into
              the Shootout-ary3 and<br class="">
              yacr2 cases and in both the hot code path was the same
              with and without the<br class="">
              patch, but with more mispredictions probably caused by
              changes elsewhere.<br class="">
              <br class="">
              John<br class="">
              <br class="">
              ______________________________<wbr class="">_________________<br class="">
              LLVM Developers mailing list<br class="">
              <a href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true" class="">llvm-dev@lists.llvm.org</a><br class="">
              <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" moz-do-not-send="true" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class="">
            </blockquote>
          </div>
          <br class="">
        </div>
        <br class="">
        <fieldset class="mimeAttachmentHeader"></fieldset>
        <br class="">
        <pre wrap="" class="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
      </blockquote>
      <br class="">
      <!--'"--><br class="">
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br class="">
      <pre wrap="" class="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <br class="">
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </div>

_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></div></blockquote></div><br class=""></body></html>