<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <div class="moz-cite-prefix">On 08/15/2018 02:38 PM, Philip Reames
      via llvm-dev wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <p>I'm concerned that we're focusing on one side of this.  Let me
        point out a few concerns w/changing the canonical form here:</p>
      <ol>
        <li>LICM does not know how to hoist or sink regions.  It does
          know how to hoist and sink selects.</li>
        <li>InstCombine has limited support for triangles/diamonds, but
          fairly extensive support for selects.</li>
        <li>EarlyCSE and GVN do not know how to eliminate fully
          redundant triangles/diamonds.  PRE *may* get some of these
          cases, but that is by no means guaranteed or likely to be
          robust.</li>
      </ol>
      <p>My personal opinion is that selects are the appropriate
        canonical form.  </p>
      <p>For the one of the specific cases mentioned, teaching GVN to do
        FRE and PRE for loads from selects of pointers just doesn't seem
        that painful.  I would be really tempted to do that instead. 
        Similarly, walking facts back from select uses in CVP seems
        generally useful since we have use dependent facts in a bunch of
        contexts, not just selects.  (Call arguments for instance, 
        non-null from unconditional deref, etc..)<br>
      </p>
      <p>To be clear, I am raising concerns, not actively objecting to
        this.  If you want to move forward and commit to work through
        all of the issues identified I wont actively stand in the way.</p>
    </blockquote>
    <br>
    As I've expressed in the past, I think that not using select as part
    of our canonical form is potentially a superior design. However, it
    would be a major change. In addition to the issues that Philip
    mentions, there's also the fact that we'll just have more basic
    blocks and that, in itself, could lead to an increase in compile
    time. However, working through these issues will likely leave us
    with a more-robust optimizer.<br>
    <br>
    See also: <a class="moz-txt-link-freetext" href="https://bugs.llvm.org/show_bug.cgi?id=34603#c19">https://bugs.llvm.org/show_bug.cgi?id=34603#c19</a><br>
    <br>
     -Hal<br>
    <br>
    <blockquote type="cite"
      cite="mid:7b7d457d-dbf4-d64b-428d-7e47f84c32ef@philipreames.com">
      <p>Philip<br>
      </p>
      <br>
      <div class="moz-cite-prefix">On 08/14/2018 12:39 PM, Sanjay Patel
        via llvm-dev wrote:<br>
      </div>
      <blockquote type="cite"
cite="mid:CA+wODitU1d+Fme24EhhwNkX6tcQzpiRnrKCpeej11ELgt5heig@mail.gmail.com">
        <div dir="ltr">
          <div>I didn't look closely at the new patch, but this appears
            to be a small extension to:</div>
          <div><a href="https://reviews.llvm.org/D38566"
              moz-do-not-send="true">https://reviews.llvm.org/D38566</a></div>
          <div>...and the GVN-based reasons for delaying transformation
            to 'select' are discussed in detail in the motivating bug
            for that patch:</div>
          <div><a href="https://bugs.llvm.org/show_bug.cgi?id=34603"
              moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=34603</a></div>
          <div><br>
          </div>
          <div>So this sounds like the right direction to me. Note that
            there was objection to the implementation (a pile of pass
            options vs. uniquely named passes).<br>
          </div>
          <div><br>
          </div>
          <div>Here's another motivating bug where early transform to
            select prevents optimization:</div>
          <div><a href="https://bugs.llvm.org/show_bug.cgi?id=36760"
              moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=36760</a><br>
          </div>
          <div><br>
          </div>
          <div>Is that case affected by this patch?<br>
          </div>
          <div><br>
          </div>
        </div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Aug 14, 2018 at 11:17 AM,
            John Brawn via llvm-dev <span dir="ltr"><<a
                href="mailto:llvm-dev@lists.llvm.org" target="_blank"
                moz-do-not-send="true">llvm-dev@lists.llvm.org</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">Summary<br>
              =======<br>
              <br>
              I'm planning on adjusting SimplifyCFG so that it doesn't
              turn two-entry phi<br>
              nodes into selects until later in the pass pipeline, to
              give passes which can<br>
              understand phis but not selects more opportunity to
              optimize. The thing I'm<br>
              trying to do which made me think of doing this is
              described below, but from the<br>
              benchmarking I've done it looks like this is overall a
              good idea regardless of<br>
              if I manage to get that done or not.<br>
              <br>
              Motivation<br>
              ==========<br>
              <br>
              My goal is to get clang to optimize some code containing a
              call to<br>
              std::min_element which is dereferenced, so something like:<br>
              <br>
                float min_element_example(float *data, int size)<br>
                {<br>
                  return *std::min_element(data, data+size);<br>
                }<br>
              <br>
              which, after inlining a specialization, looks like:<br>
              <br>
                float min_element_example_inlined(<wbr>float *first,
              float *last)<br>
                {<br>
                  for (float *p = first; p != last; ++p)<br>
                  {<br>
                    if (*p < *first)<br>
                      first = p;<br>
                  }<br>
                  return *first;<br>
                }<br>
              <br>
              There are two loads in the loop, *p and *first, but
              actually the load *p can be<br>
              eliminated by using either the previous load *p or the
              previous *first,<br>
              depending on if the if-condition was taken or not. However
              the<br>
              "if (*p < *first) first = p" gets turned by simplifycfg
              into a select and this<br>
              makes optimizing this a lot harder because you no longer
              have distinct paths<br>
              through the CFG.<br>
              <br>
              I have some ideas on how to do the optimization (see my
              previous RFC "Making GVN<br>
              able to visit the same block more than once" posted in
              April, though I've<br>
              decided that the specific idea presented there isn't the
              right way to do it),<br>
              but I think the first step is to make sure we don't have a
              select when we try<br>
              to optimise this.<br>
              <br>
              Approach<br>
              ========<br>
              <br>
              I've posted a patch to <a
                href="https://reviews.llvm.org/D50723" rel="noreferrer"
                target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D50723</a>
              showing what I'm<br>
              intending to do. An extra parameter is added to
              SimplifyCFG to control whether<br>
              two-entry phi nodes are converted into select, and this is
              set to false in all<br>
              instances before the end of module simplification. At the
              end of module<br>
              simplification we do SimplifyCFG, then Instcombine to
              optimise the selects that<br>
              are introduced, then EarlyCSE to eliminate common
              subexpressions introduced by<br>
              instcombine.<br>
              <br>
              Benchmark Results<br>
              =================<br>
              <br>
              These are performance differences reported by LNT when
              running llvm-test-suite,<br>
              spec2000, and spec2006 at -O3 with and without the patch
              linked above (using<br>
              trunk llvm from a week or so ago).<br>
              <br>
              AArch64 results on ARM Cortex-A72:<br>
              <br>
              Performance Regressions - execution_time                 
                          Change<br>
              SingleSource/Benchmarks/<wbr>Shootout/Shootout-ary3       
                               9.48%<br>
              MultiSource/Benchmarks/TSVC/<wbr>Packing-flt/Packing-flt 
                                3.79%<br>
              SingleSource/Benchmarks/<wbr>CoyoteBench/huffbench       
                                1.40%<br>
              <br>
              Performance Improvements - execution_time                 
                         Change<br>
              MultiSource/Benchmarks/TSVC/<wbr>Searching-dbl/Searching-dbl 
                          -23.74%<br>
              External/SPEC/CINT2000/256.<wbr>bzip2/256.bzip2           
                              -9.82%<br>
              MultiSource/Benchmarks/TSVC/<wbr>Searching-flt/Searching-flt 
                           -9.57%<br>
              MultiSource/Benchmarks/TSVC/<wbr>Equivalencing-flt/<wbr>Equivalencing-flt 
                   -4.38%<br>
              MultiSource/Benchmarks/TSVC/<wbr>LinearDependence-flt/<wbr>LinearDependence-flt
              -3.94%<br>
              MultiSource/Benchmarks/TSVC/<wbr>Packing-dbl/Packing-dbl 
                               -3.44%<br>
              External/SPEC/CFP2006/453.<wbr>povray/453.povray         
                               -2.50%<br>
              SingleSource/Benchmarks/Adobe-<wbr>C++/stepanov_vector   
                               -1.49%<br>
              <br>
              X86_64 results on Intel Xeon E5-2690:<br>
              <br>
              Performance Regressions - execution_time           Change<br>
              MultiSource/Benchmarks/<wbr>Ptrdist/yacr2/yacr2         
              5.62%<br>
              <br>
              Performance Improvements - execution_time          Change<br>
              SingleSource/Benchmarks/Misc-<wbr>C++/Large/sphereflake
              -4.43%<br>
              External/SPEC/CINT2006/456.<wbr>hmmer/456.hmmer       
               -2.50%<br>
              External/SPEC/CINT2006/464.<wbr>h264ref/464.h264ref   
               -1.60%<br>
              MultiSource/Benchmarks/nbench/<wbr>nbench             
               -1.19%<br>
              SingleSource/Benchmarks/Adobe-<wbr>C++/functionobjects 
              -1.07%<br>
              <br>
              I had a brief look at the regressions and they all look to
              be caused by<br>
              getting bad luck with branch mispredictions: I looked into
              the Shootout-ary3 and<br>
              yacr2 cases and in both the hot code path was the same
              with and without the<br>
              patch, but with more mispredictions probably caused by
              changes elsewhere.<br>
              <br>
              John<br>
              <br>
              ______________________________<wbr>_________________<br>
              LLVM Developers mailing list<br>
              <a href="mailto:llvm-dev@lists.llvm.org"
                moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
              <a
                href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
            </blockquote>
          </div>
          <br>
        </div>
        <br>
        <fieldset class="mimeAttachmentHeader"></fieldset>
        <br>
        <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
      </blockquote>
      <br>
      <!--'"--><br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>