<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 12, 2015 at 12:48 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

  
  <div bgcolor="#FFFFFF" text="#000000"><span class="">

    <br>

    <div>On 01/11/2015 10:17 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Thu, Jan 8, 2015 at 3:33 PM,

            Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span><br>

                On 01/07/2015 05:33 PM, Chandler Carruth wrote:<br>

              </span><span>

                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">How

                  does this compare with classical approaches of loop

                  peeling, partitioning, fission, or whatever you might

                  call it?<br>

                </blockquote>

              </span>

              I'm still developing a good sense for this, but let me lay

              out some observations.  Fair warning, this is going to be

              long and rambling.<br>

              <br>

              Let's start with a toy example:<br>

              while(c) {<br>

                x = this->x;<br>

                y = this->y;<br>

                if (x == y) {<br>

                  rare();<br>

                }<br>

              }<br>

              <br>

            </blockquote>

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">If

              we could tell x and y were loop invariant, we could

              unswitch this loop.  However, the rare call clobbers our

              view of memory, so LICM fails, and thus unswitch fails.<br>

              <br>

            </blockquote>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We'd

              like to apply PRE here and push the reload into the loop

              preheader and the rare case.  This currently fails for two

              reasons: 1) We don't allow code duplication during PRE, </blockquote>

            <div><br>

            </div>

            <div>?????</div>

            <div>If we don't, we aren't doing real PRE. So i'm not

              entirely sure what you mean here.</div>

          </div>

        </div>

      </div>

    </blockquote></span>

    As I think you comment elsewhere in your response, we currently do

    not duplicate loads; we only move them.  The current structure is

    based around finding a legal point which doesn't introduce a load

    along any path where one didn't exist previously.  If we have a path

    which has two copies of the load, we try to find a place to move one

    of them such that all paths still have the available load and we've

    removed the extra load along the one path.  <br>

    <br>

    (Note: The above explanation is rough and does not parallel how the

    code is organized.)<div><div class="h5"><br>

    <br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">and

              2) the canonical loop-simplify form of having a single

              latch means that PRE doesn't actually see the rare block

              at all, it sees the preheader and the join point after the

              if block.</blockquote>

            <div><br>

            </div>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

              I think both of the previous are fixable:<br>

            </blockquote>

            <div><br>

            </div>

            <div><br>

            </div>

            <div>GCC's PRE already does the above.</div>

            <div>It doesn't do profile guided duplication.</div>

            <div>We aren't doing anything special with these blocks.</div>

            <div><br>

              here is the code I used to test:<br>

              <br>

            </div>

            <div>struct a{</div>

            <div><span style="white-space:pre-wrap"> </span>int x;</div>

            <div><span style="white-space:pre-wrap"> </span>int y;</div>

            <div>};</div>

            <div>extern void rare();</div>

            <div>int mai(int c, struct a *this)</div>

            <div>{</div>

            <div><span style="white-space:pre-wrap"> </span>int d =

              0;</div>

            <div><span style="white-space:pre-wrap"> </span>while(c)

              {</div>

            <div><span style="white-space:pre-wrap"> </span>int x =

              this->x;</div>

            <div><span style="white-space:pre-wrap"> </span>int y =

              this->y;</div>

            <div><span style="white-space:pre-wrap"> </span>d += x

              + y;</div>

            <div><span style="white-space:pre-wrap"> </span>if (x

              == y) {</div>

            <div><span style="white-space:pre-wrap"> </span>rare();</div>

            <div><span style="white-space:pre-wrap"> </span>}</div>

            <div><span style="white-space:pre-wrap"> </span>}</div>

            <div><span style="white-space:pre-wrap"> </span>return

              d;</div>

            <div>} </div>

            <div><br>

            </div>

            <div><br>

            </div>

            <div>It will do exactly what you expect, it is transformed

              into:<br>

            </div>

            <div><br>

            </div>

            <div>

              <div>struct a{</div>

              <div><span style="white-space:pre-wrap"> </span>int

                x;</div>

              <div><span style="white-space:pre-wrap"> </span>int

                y;</div>

              <div>};</div>

              <div>extern void rare();</div>

              <div>int mai(int c, struct a *this)</div>

              <div>{</div>

              <div><span style="white-space:pre-wrap"> </span>int d

                = 0;</div>

              <div>        int pretemp1 = this->x</div>

              <div>        int pretemp2 = this->y</div>

              <div>     </div>

              <div><span style="white-space:pre-wrap"> </span>while(c)

                {</div>

              <div>                pretemp1phi = phi(rare block

                pretemp1, preheader pretemp1).</div>

              <div>                pretemp2phi = phi(rare block

                pretemp2, preheader pretemp2)</div>

              <div><br>

              </div>

              <div><span style="white-space:pre-wrap"> </span>d +=

                pretemp1phi + pretemp2phi</div>

              <div><span style="white-space:pre-wrap"> </span>if (x

                == y) {</div>

              <div><span style="white-space:pre-wrap"> </span>rare();</div>

              <div>                        pretemp1 = this->x;</div>

              <div>                        pretemp2 = this->y;</div>

              <div><br>

              </div>

              <div><span style="white-space:pre-wrap"> </span>}</div>

              <div><span style="white-space:pre-wrap"> </span>}</div>

              <div><span style="white-space:pre-wrap"> </span>return

                d;</div>

              <div>} </div>

            </div>

            <div>I don't see why profile guided duplication is necessary

              here. This is a basic load PRE case.  It is handled by the

              first version of GVN-based Load PRE I wrote for GCC.  It

              is always a win.</div>

          </div>

        </div>

      </div>

    </blockquote></div></div>

    Well, to be pedantic, it is not always profitable.  If this loop

    runs exactly one iteration, this is both a code size pessimization

    and possibly a runtime execution pessimization. </div></blockquote><div><br></div><div>Sure, but it is a lifetime optimal PRE case :)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"> While in this

    particular case, you probably don't care, I'd worry that an

    algorithm which gets this might also have other pessimization cases

    which are more worrying.  <br></div></blockquote><div><br></div><div>Then you shouldn't use any basic PRE at all.</div><div>It will always make these decisions.  Profile guided PRE implementations are more complex.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    However, I think I've actually come to the same underlying

    conclusion.  This is a case that our PRE algorithm should handle

    without needing to resort to profiling data.  <br>

    <br>

    I have to admit that I am unfamiliar with the GCC implementation. 

    Could you outline the algorithm and it's important concepts?  (links

    are fine too)</div></blockquote><div><br></div><div>It uses an SCC based value numbering implementation(<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.1723">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.1723</a>), builds a value expressions, then performs GVNPRE on the value expressions (<a href="https://www.cs.purdue.edu/homes/hosking/papers/cc04.pdf">https://www.cs.purdue.edu/homes/hosking/papers/cc04.pdf</a>)</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class=""><br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>Looking at what LLVM does, the failing on the PRE side

              is that our PRE/GVN models are not strong enough to handle

              this. I'm not sure at all why we think anything else is

              necessary.  It's certainly not requiring special code

              duplication heuristics, etc.</div>

            <div><br>

            </div>

            <div>So either you are thinking of a case that is different

              from the above, or I am seriously confused :)</div>

          </div>

        </div>

      </div>

    </blockquote></span>

    Well, I think we've gotten to the same point here.  An improved PRE

    implementation should handle most of the cases I've actually

    encountered.  Having said that, I'm not yet willing to say that the

    profile guided loop splitting isn't *also* worth exploring.  From a

    practical perspective, a lot of our existing optimizations are

    structured on loops.  Letting those kick in without falling back to

    (expensive) GVN might be worthwhile from a practical perspective. 

    This is as much a pass ordering problem as anything else.  <br>

    <br>

    p.s. I've started to post patches which improve the results given by

    our current PRE to the llvm-commits list.  So far, it's just simple

    stuff, but that's the direction I'm current working in.  I'm going

    to defer looking into the loop nest splitting until I've pushed PRE

    as far as I easily can.  <br></div></blockquote><div><br></div><div>I truly think you would be better off coming up with a real PRE implementation than trying to push the current one, but it's your time :)</div><div>The current PRE is an iterative PRE with a lot of limitations.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    Philip<br>

    <br>

  </div>


</blockquote></div><br></div></div>