<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 12, 2015 at 1:08 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

  

    

  

  <div bgcolor="#FFFFFF" text="#000000"><span class="">

    <br>

    <div>On 01/11/2015 10:35 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div dir="ltr">

                <div class="gmail_extra">

                  <div class="gmail_quote">

                    <div><br>

                    </div>

                    <div>Looking at what LLVM does, the failing on the

                      PRE side is that our PRE/GVN models are not strong

                      enough to handle this. I'm not sure at all why we

                      think anything else is necessary.  </div>

                  </div>

                </div>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>By this i mean we don't actually perform real PRE for

              loads. It's a known deficiency and one that does not

              require any duplicate heuristics to solve.</div>

            <div>Right now we perform some availability based removal

              that happens be willing to do an insertion here or there

              to try to move a load in some cases (it will only do a

              single insertion).  It's not also based on real

              availability, but on finding nearest dependencies of loads

              and squinting at them (which is not even really the same

              as finding where a load stops being available) funny.</div>

            <div><br>

            </div>

            <div>It's also not real PRE, just some code that is willing

              to insert one load to move a load.</div>

          </div>

        </div>

      </div>

    </blockquote></span>

    Er, I think we have a terminology problem here.  From what I can

    gather, our current implementation *is* an implementation of classic

    PRE.</div></blockquote><div><br></div><div>Scalar yes, load PRE, no.</div><div><br></div><div>What load PRE does is not the same as availability.</div><div>It finds the nearest dependencies (be they load/load, store/load, or clobber) for a load walking backwards.</div><div>That is not availability.</div><div><br></div><div>Availability is a forward problem, not a backwards one :)</div><div><br></div><div>It tries to transform the data it gets from memdep into something like availability info, but you *will not get the same results* in some cases.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">  It's more or less directly organized around the classic

    "available" and "anticipated" concepts from the literature.  The

    *implementation* has some limitations, but the conceptual framework

    is there*. </div></blockquote><div><br></div><div>I agree the anticipation calculation is somewhat there, but even that is not a standard PRE one.</div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"> One of the key invariants in classic PRE is that you do

    not change program behaviour on any path. </div></blockquote><div>Well, it's more "do not increase the number of computations on any path", but even that gets violated in most standard PRE's because they do LICM out of loops.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"> As a result, moving (or

    at most duplicating) loads is pretty much all that happens.<br></div></blockquote><div><br></div><div>Sure.</div><div> <br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    * It took me a few hours of squinting at the code to recognize that

    weird backwards walk as being an implementation of "anctipated", but

    it is.  The "available" part comes from MemoryDependenceAnalysis and

    the forward dataflow algorithm based on those inputs.<br></div></blockquote><div><br></div><div>memorydependencyanalysis is a backwards walker, it does not compute availability of a load.</div><div>It computes the nearest dependencies.  We then try to take this and turn it into the result of a forward availability problem.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    If you want to start inserting loads on paths which didn't

    previously have them - for example the loop preheader in your

    example in the previous email - this is not classic PRE as I'm

    familiar with it.  It can be very profitable and worthwhile, but is

    not classic PRE.   I'm not sure what this variant is known as in the

    literature.  <br></div></blockquote><div><br></div><div>This is definitely a standard SSA based PRE algorithm. Every single one i am familiar with will all move code out of loops like this, be it scalar or loads, regardless of the fact that it does in fact, increase the number of possible computations by one if the loop does not execute.  </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>

    (If I'm being overly pedantic here, or just flat out wrong, please

    say so.  I'm still exploring related material and am trying to

    categorize things in my own head.)<br>

    <br></div></blockquote><div><br></div><div>I do not believe you will find an SSA based  PRE that will not do the above optimization by default.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    I think we actually in reasonable agreement that we do need to move

    beyond 'classic PRE' as I called it above.  If you have specific

    suggestions, positive or negative examples, or other thoughts to

    share, I'd very much value the input.  </div></blockquote><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class="">

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>If you want this case to work, what you need is a real

              Load PRE implementation, not one based on simple memdep

              based load moving.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>Trying to hack this one up with profile guided code

              duplication, etc, to get it to kinda sorta work seems to

              me to be a bad idea.</div>

          </div>

        </div>

      </div>

    </blockquote></span>

    Not sure I really agree here, but I'm open to pushing as far as we

    can with static heuristics first.  I suspect that's likely to be

    more stable w.r.t. optimization effect and applicable to more

    cases.  <br></div></blockquote><div><br></div><div>So, LLVM has been through three PRE implementations.</div><div>It had an implementation of e-path PRE, an implementation of GVNPRE (that only worked on scalar code i believe) a long long time ago, and then the current one.</div><div><br></div><div>The current one is used because it's fast and catches a lot of cases.</div><div>The others were eliminated because they were essentially experimental research-quality implementations, and were slow :)</div><div><br></div><div>Because the current PRE is not value based, and it really has no idea about memory, you will run into limitations on loads pretty quickly (particularly loads indexed by things like induction variables).</div><div>It already tries to work around some of them by doing some actual value analysis of the load (and doing forwarding), but this only works on basic cases.</div><div><br></div><div><div>GVNPRE can be implemented and made a lot faster than it was (In GCC, it's actually one of the fastest high level optimization passes) with very careful engineering, and would take care of all of these cases.</div></div><div><br></div><div>I'm also fairly positive you could take any good anticipation based PRE algorithm you can find, and i could make it value based and still fast.</div><div><br></div><div>But in the end, pushing the current one seems like a bad idea to me because it wasn't designed for what you are trying to do - get good PRE results, and do good load PRE, and we know of better ways to do that.</div><div><br></div><div>All pushing it will do is push it out of the sweet spot it was designed for.</div><div><br></div><div>If you want to get into profille-guided placement or speculative PRE, it involves solving min-cut problems on very large graphs in most cases.</div><div><br></div><div>The only implementation i'm aware of that is relatively sane is <a href="http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-horspool.pdf">http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-horspool.pdf</a></div><div>and</div><div><a href="https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1">https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1</a><br></div><div><br></div><div>(This does not require solving min-cut problems)</div><div><br></div><div><br></div><div><br></div><div>In the end though, you are doing the work, so you get to decide what you want to do :)</div><div><br></div></div></div></div>