<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>I'm concerned that we're focusing on one side of this.  Let me

      point out a few concerns w/changing the canonical form here:</p>

    <ol>

      <li>LICM does not know how to hoist or sink regions.  It does know

        how to hoist and sink selects.</li>

      <li>InstCombine has limited support for triangles/diamonds, but

        fairly extensive support for selects.</li>

      <li>EarlyCSE and GVN do not know how to eliminate fully redundant

        triangles/diamonds.  PRE *may* get some of these cases, but that

        is by no means guaranteed or likely to be robust.</li>

    </ol>

    <p>My personal opinion is that selects are the appropriate canonical

      form.  </p>

    <p>For the one of the specific cases mentioned, teaching GVN to do

      FRE and PRE for loads from selects of pointers just doesn't seem

      that painful.  I would be really tempted to do that instead. 

      Similarly, walking facts back from select uses in CVP seems

      generally useful since we have use dependent facts in a bunch of

      contexts, not just selects.  (Call arguments for instance, 

      non-null from unconditional deref, etc..)<br>

    </p>

    <p>To be clear, I am raising concerns, not actively objecting to

      this.  If you want to move forward and commit to work through all

      of the issues identified I wont actively stand in the way.</p>

    <p>Philip<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 08/14/2018 12:39 PM, Sanjay Patel

      via llvm-dev wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CA+wODitU1d+Fme24EhhwNkX6tcQzpiRnrKCpeej11ELgt5heig@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=utf-8">

      <div dir="ltr">

        <div>I didn't look closely at the new patch, but this appears to

          be a small extension to:</div>

        <div><a href="https://reviews.llvm.org/D38566"

            moz-do-not-send="true">https://reviews.llvm.org/D38566</a></div>

        <div>...and the GVN-based reasons for delaying transformation to

          'select' are discussed in detail in the motivating bug for

          that patch:</div>

        <div><a href="https://bugs.llvm.org/show_bug.cgi?id=34603"

            moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=34603</a></div>

        <div><br>

        </div>

        <div>So this sounds like the right direction to me. Note that

          there was objection to the implementation (a pile of pass

          options vs. uniquely named passes).<br>

        </div>

        <div><br>

        </div>

        <div>Here's another motivating bug where early transform to

          select prevents optimization:</div>

        <div><a href="https://bugs.llvm.org/show_bug.cgi?id=36760"

            moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=36760</a><br>

        </div>

        <div><br>

        </div>

        <div>Is that case affected by this patch?<br>

        </div>

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, Aug 14, 2018 at 11:17 AM, John

          Brawn via llvm-dev <span dir="ltr"><<a

              href="mailto:llvm-dev@lists.llvm.org" target="_blank"

              moz-do-not-send="true">llvm-dev@lists.llvm.org</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Summary<br>

            =======<br>

            <br>

            I'm planning on adjusting SimplifyCFG so that it doesn't

            turn two-entry phi<br>

            nodes into selects until later in the pass pipeline, to give

            passes which can<br>

            understand phis but not selects more opportunity to

            optimize. The thing I'm<br>

            trying to do which made me think of doing this is described

            below, but from the<br>

            benchmarking I've done it looks like this is overall a good

            idea regardless of<br>

            if I manage to get that done or not.<br>

            <br>

            Motivation<br>

            ==========<br>

            <br>

            My goal is to get clang to optimize some code containing a

            call to<br>

            std::min_element which is dereferenced, so something like:<br>

            <br>

              float min_element_example(float *data, int size)<br>

              {<br>

                return *std::min_element(data, data+size);<br>

              }<br>

            <br>

            which, after inlining a specialization, looks like:<br>

            <br>

              float min_element_example_inlined(<wbr>float *first, float

            *last)<br>

              {<br>

                for (float *p = first; p != last; ++p)<br>

                {<br>

                  if (*p < *first)<br>

                    first = p;<br>

                }<br>

                return *first;<br>

              }<br>

            <br>

            There are two loads in the loop, *p and *first, but actually

            the load *p can be<br>

            eliminated by using either the previous load *p or the

            previous *first,<br>

            depending on if the if-condition was taken or not. However

            the<br>

            "if (*p < *first) first = p" gets turned by simplifycfg

            into a select and this<br>

            makes optimizing this a lot harder because you no longer

            have distinct paths<br>

            through the CFG.<br>

            <br>

            I have some ideas on how to do the optimization (see my

            previous RFC "Making GVN<br>

            able to visit the same block more than once" posted in

            April, though I've<br>

            decided that the specific idea presented there isn't the

            right way to do it),<br>

            but I think the first step is to make sure we don't have a

            select when we try<br>

            to optimise this.<br>

            <br>

            Approach<br>

            ========<br>

            <br>

            I've posted a patch to <a

              href="https://reviews.llvm.org/D50723" rel="noreferrer"

              target="_blank" moz-do-not-send="true">https://reviews.llvm.org/<wbr>D50723</a>

            showing what I'm<br>

            intending to do. An extra parameter is added to SimplifyCFG

            to control whether<br>

            two-entry phi nodes are converted into select, and this is

            set to false in all<br>

            instances before the end of module simplification. At the

            end of module<br>

            simplification we do SimplifyCFG, then Instcombine to

            optimise the selects that<br>

            are introduced, then EarlyCSE to eliminate common

            subexpressions introduced by<br>

            instcombine.<br>

            <br>

            Benchmark Results<br>

            =================<br>

            <br>

            These are performance differences reported by LNT when

            running llvm-test-suite,<br>

            spec2000, and spec2006 at -O3 with and without the patch

            linked above (using<br>

            trunk llvm from a week or so ago).<br>

            <br>

            AArch64 results on ARM Cortex-A72:<br>

            <br>

            Performance Regressions - execution_time                   

                      Change<br>

            SingleSource/Benchmarks/<wbr>Shootout/Shootout-ary3         

                           9.48%<br>

            MultiSource/Benchmarks/TSVC/<wbr>Packing-flt/Packing-flt   

                            3.79%<br>

            SingleSource/Benchmarks/<wbr>CoyoteBench/huffbench         

                            1.40%<br>

            <br>

            Performance Improvements - execution_time                   

                     Change<br>

            MultiSource/Benchmarks/TSVC/<wbr>Searching-dbl/Searching-dbl 

                        -23.74%<br>

            External/SPEC/CINT2000/256.<wbr>bzip2/256.bzip2             

                          -9.82%<br>

            MultiSource/Benchmarks/TSVC/<wbr>Searching-flt/Searching-flt 

                         -9.57%<br>

            MultiSource/Benchmarks/TSVC/<wbr>Equivalencing-flt/<wbr>Equivalencing-flt 

                 -4.38%<br>

            MultiSource/Benchmarks/TSVC/<wbr>LinearDependence-flt/<wbr>LinearDependence-flt

            -3.94%<br>

            MultiSource/Benchmarks/TSVC/<wbr>Packing-dbl/Packing-dbl   

                           -3.44%<br>

            External/SPEC/CFP2006/453.<wbr>povray/453.povray           

                           -2.50%<br>

            SingleSource/Benchmarks/Adobe-<wbr>C++/stepanov_vector     

                           -1.49%<br>

            <br>

            X86_64 results on Intel Xeon E5-2690:<br>

            <br>

            Performance Regressions - execution_time           Change<br>

            MultiSource/Benchmarks/<wbr>Ptrdist/yacr2/yacr2         

            5.62%<br>

            <br>

            Performance Improvements - execution_time          Change<br>

            SingleSource/Benchmarks/Misc-<wbr>C++/Large/sphereflake

            -4.43%<br>

            External/SPEC/CINT2006/456.<wbr>hmmer/456.hmmer       

             -2.50%<br>

            External/SPEC/CINT2006/464.<wbr>h264ref/464.h264ref   

             -1.60%<br>

            MultiSource/Benchmarks/nbench/<wbr>nbench             

             -1.19%<br>

            SingleSource/Benchmarks/Adobe-<wbr>C++/functionobjects 

            -1.07%<br>

            <br>

            I had a brief look at the regressions and they all look to

            be caused by<br>

            getting bad luck with branch mispredictions: I looked into

            the Shootout-ary3 and<br>

            yacr2 cases and in both the hot code path was the same with

            and without the<br>

            patch, but with more mispredictions probably caused by

            changes elsewhere.<br>

            <br>

            John<br>

            <br>

            ______________________________<wbr>_________________<br>

            LLVM Developers mailing list<br>

            <a href="mailto:llvm-dev@lists.llvm.org"

              moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>

            <a

              href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

              rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>