<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>I'm concerned that we're focusing on one side of this.  Let me
      point out a few concerns w/changing the canonical form here:</p>
    <ol>
      <li>LICM does not know how to hoist or sink regions.  It does know
        how to hoist and sink selects.</li>
      <li>InstCombine has limited support for triangles/diamonds, but
        fairly extensive support for selects.</li>
      <li>EarlyCSE and GVN do not know how to eliminate fully redundant
        triangles/diamonds.  PRE *may* get some of these cases, but that
        is by no means guaranteed or likely to be robust.</li>
    </ol>
    <p>My personal opinion is that selects are the appropriate canonical
      form.  </p>
    <p>For the one of the specific cases mentioned, teaching GVN to do
      FRE and PRE for loads from selects of pointers just doesn't seem
      that painful.  I would be really tempted to do that instead. 
      Similarly, walking facts back from select uses in CVP seems
      generally useful since we have use dependent facts in a bunch of
      contexts, not just selects.  (Call arguments for instance, 
      non-null from unconditional deref, etc..)<br>
    </p>
    <p>To be clear, I am raising concerns, not actively objecting to
      this.  If you want to move forward and commit to work through all
      of the issues identified I wont actively stand in the way.</p>
    <p>Philip<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 08/14/2018 12:39 PM, Sanjay Patel
      via llvm-dev wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CA+wODitU1d+Fme24EhhwNkX6tcQzpiRnrKCpeej11ELgt5heig@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <div dir="ltr">
        <div>I didn't look closely at the new patch, but this appears to
          be a small extension to:</div>
        <div><a href="https://reviews.llvm.org/D38566"
            moz-do-not-send="true">https://reviews.llvm.org/D38566</a></div>
        <div>...and the GVN-based reasons for delaying transformation to
          'select' are discussed in detail in the motivating bug for
          that patch:</div>
        <div><a href="https://bugs.llvm.org/show_bug.cgi?id=34603"
            moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=34603</a></div>
        <div><br>
        </div>
        <div>So this sounds like the right direction to me. Note that
          there was objection to the implementation (a pile of pass
          options vs. uniquely named passes).<br>
        </div>
        <div><br>
        </div>
        <div>Here's another motivating bug where early transform to
          select prevents optimization:</div>
        <div><a href="https://bugs.llvm.org/show_bug.cgi?id=36760"
            moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=36760</a><br>
        </div>
        <div><br>
        </div>
        <div>Is that case affected by this patch?<br>
        </div>
        <div><br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, Aug 14, 2018 at 11:17 AM, John
          Brawn via llvm-dev <span dir="ltr"><<a
              href="mailto:llvm-dev@lists.llvm.org" target="_blank"
              moz-do-not-send="true">llvm-dev@lists.llvm.org</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">Summary<br>
            =======<br>
            <br>
            I'm planning on adjusting SimplifyCFG so that it doesn't
            turn two-entry phi<br>
            nodes into selects until later in the pass pipeline, to give
            passes which can<br>
            understand phis but not selects more opportunity to
            optimize. The thing I'm<br>
            trying to do which made me think of doing this is described
            below, but from the<br>
            benchmarking I've done it looks like this is overall a good
            idea regardless of<br>
            if I manage to get that done or not.<br>
            <br>
            Motivation<br>
            ==========<br>
            <br>
            My goal is to get clang to optimize some code containing a
            call to<br>
            std::min_element which is dereferenced, so something like:<br>
            <br>
              float min_element_example(float *data, int size)<br>
              {<br>
                return *std::min_element(data, data+size);<br>
              }<br>
            <br>
            which, after inlining a specialization, looks like:<br>
            <br>
              float min_element_example_inlined(<wbr>float *first, float
            *last)<br>
              {<br>
                for (float *p = first; p != last; ++p)<br>
                {<br>
                  if (*p < *first)<br>
                    first = p;<br>
                }<br>
                return *first;<br>
              }<br>
            <br>
            There are two loads in the loop, *p and *first, but actually
            the load *p can be<br>
            eliminated by using either the previous load *p or the
            previous *first,<br>
            depending on if the if-condition was taken or not. However
            the<br>
            "if (*p < *first) first = p" gets turned by simplifycfg
            into a select and this<br>
            makes optimizing this a lot harder because you no longer
            have distinct paths<br>
            through the CFG.<br>
            <br>
            I have some ideas on how to do the optimization (see my
            previous RFC "Making GVN<br>
            able to visit the same block more than once" posted in
            April, though I've<br>
            decided that the specific idea presented there isn't the
            right way to do it),<br>
            but I think the first step is to make sure we don't have a
            select when we try<br>
            to optimise this.<br>
            <br>
            Approach<br>
            ========<br>
            <br>
            I've posted a patch to <a
              href="https://reviews.llvm.org/D50723" rel="noreferrer"
              target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D50723</a>
            showing what I'm<br>
            intending to do. An extra parameter is added to SimplifyCFG
            to control whether<br>
            two-entry phi nodes are converted into select, and this is
            set to false in all<br>
            instances before the end of module simplification. At the
            end of module<br>
            simplification we do SimplifyCFG, then Instcombine to
            optimise the selects that<br>
            are introduced, then EarlyCSE to eliminate common
            subexpressions introduced by<br>
            instcombine.<br>
            <br>
            Benchmark Results<br>
            =================<br>
            <br>
            These are performance differences reported by LNT when
            running llvm-test-suite,<br>
            spec2000, and spec2006 at -O3 with and without the patch
            linked above (using<br>
            trunk llvm from a week or so ago).<br>
            <br>
            AArch64 results on ARM Cortex-A72:<br>
            <br>
            Performance Regressions - execution_time                   
                      Change<br>
            SingleSource/Benchmarks/<wbr>Shootout/Shootout-ary3         
                           9.48%<br>
            MultiSource/Benchmarks/TSVC/<wbr>Packing-flt/Packing-flt   
                            3.79%<br>
            SingleSource/Benchmarks/<wbr>CoyoteBench/huffbench         
                            1.40%<br>
            <br>
            Performance Improvements - execution_time                   
                     Change<br>
            MultiSource/Benchmarks/TSVC/<wbr>Searching-dbl/Searching-dbl 
                        -23.74%<br>
            External/SPEC/CINT2000/256.<wbr>bzip2/256.bzip2             
                          -9.82%<br>
            MultiSource/Benchmarks/TSVC/<wbr>Searching-flt/Searching-flt 
                         -9.57%<br>
            MultiSource/Benchmarks/TSVC/<wbr>Equivalencing-flt/<wbr>Equivalencing-flt 
                 -4.38%<br>
            MultiSource/Benchmarks/TSVC/<wbr>LinearDependence-flt/<wbr>LinearDependence-flt
            -3.94%<br>
            MultiSource/Benchmarks/TSVC/<wbr>Packing-dbl/Packing-dbl   
                           -3.44%<br>
            External/SPEC/CFP2006/453.<wbr>povray/453.povray           
                           -2.50%<br>
            SingleSource/Benchmarks/Adobe-<wbr>C++/stepanov_vector     
                           -1.49%<br>
            <br>
            X86_64 results on Intel Xeon E5-2690:<br>
            <br>
            Performance Regressions - execution_time           Change<br>
            MultiSource/Benchmarks/<wbr>Ptrdist/yacr2/yacr2         
            5.62%<br>
            <br>
            Performance Improvements - execution_time          Change<br>
            SingleSource/Benchmarks/Misc-<wbr>C++/Large/sphereflake
            -4.43%<br>
            External/SPEC/CINT2006/456.<wbr>hmmer/456.hmmer       
             -2.50%<br>
            External/SPEC/CINT2006/464.<wbr>h264ref/464.h264ref   
             -1.60%<br>
            MultiSource/Benchmarks/nbench/<wbr>nbench             
             -1.19%<br>
            SingleSource/Benchmarks/Adobe-<wbr>C++/functionobjects 
            -1.07%<br>
            <br>
            I had a brief look at the regressions and they all look to
            be caused by<br>
            getting bad luck with branch mispredictions: I looked into
            the Shootout-ary3 and<br>
            yacr2 cases and in both the hot code path was the same with
            and without the<br>
            patch, but with more mispredictions probably caused by
            changes elsewhere.<br>
            <br>
            John<br>
            <br>
            ______________________________<wbr>_________________<br>
            LLVM Developers mailing list<br>
            <a href="mailto:llvm-dev@lists.llvm.org"
              moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
            <a
              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
              rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>