<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 01/12/2015 01:11 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra">On Mon, Jan 12, 2015 at 1:08 PM, Philip

          Reames <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>

          wrote:<br>

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>

                  <div>On 01/11/2015 10:35 PM, Daniel Berlin wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                            <div dir="ltr">

                              <div class="gmail_extra">

                                <div class="gmail_quote">

                                  <div>Looking at what LLVM does, the

                                    failing on the PRE side is that our

                                    PRE/GVN models are not strong enough

                                    to handle this. I'm not sure at all

                                    why we think anything else is

                                    necessary.  </div>

                                </div>

                              </div>

                            </div>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>By this i mean we don't actually perform

                            real PRE for loads. It's a known deficiency

                            and one that does not require any duplicate

                            heuristics to solve.</div>

                          <div>Right now we perform some availability

                            based removal that happens be willing to do

                            an insertion here or there to try to move a

                            load in some cases (it will only do a single

                            insertion).  It's not also based on real

                            availability, but on finding nearest

                            dependencies of loads and squinting at them

                            (which is not even really the same as

                            finding where a load stops being available)

                            funny.</div>

                          <div><br>

                          </div>

                          <div>It's also not real PRE, just some code

                            that is willing to insert one load to move a

                            load.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </span> Er, I think we have a terminology problem here. 

                From what I can gather, our current implementation *is*

                an implementation of classic PRE.</div>

            </blockquote>

            <div><br>

            </div>

            <div>Scalar yes, load PRE, no.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Can you definite specifically what you mean by "load pre"?  To me,

    load pre is simply PRE on loads.  It sounds like you're saying

    something slightly different.  <br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>What load PRE does is not the same as availability.</div>

            <div>It finds the nearest dependencies (be they load/load,

              store/load, or clobber) for a load walking backwards.</div>

            <div>That is not availability.</div>

            <div><br>

            </div>

            <div>Availability is a forward problem, not a backwards one

              :)</div>

            <div><br>

            </div>

            <div>It tries to transform the data it gets from memdep into

              something like availability info, but you *will not get

              the same results* in some cases.</div>

          </div>

        </div>

      </div>

    </blockquote>

    I would argue that the current implementation is calculating

    availability.  It's simply doing in a conservative manner and thus

    gives up some quality in the results achieved.  The notion of

    availability (i.e. "there's a load available along all paths that

    reach this basic block") is the same.<br>

    <br>

    Just to make sure we're on the same page, the reason why the

    backwards calculation is potentially less useful than the forward

    one is that we unconditionally stop when encountering a def.  There

    might have been an earlier def along that path and thus possible a

    cheaper location to put a load.  Is that the only case?  Or is there

    something else I haven't seen yet?<br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote"><br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> * It took me a few

                hours of squinting at the code to recognize that weird

                backwards walk as being an implementation of

                "anctipated", but it is.  The "available" part comes

                from MemoryDependenceAnalysis and the forward dataflow

                algorithm based on those inputs.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>memorydependencyanalysis is a backwards walker, it does

              not compute availability of a load.</div>

            <div>It computes the nearest dependencies.  We then try to

              take this and turn it into the result of a forward

              availability problem.</div>

          </div>

        </div>

      </div>

    </blockquote>

    (See above comment)<br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> <br>

                If you want to start inserting loads on paths which

                didn't previously have them - for example the loop

                preheader in your example in the previous email - this

                is not classic PRE as I'm familiar with it.  It can be

                very profitable and worthwhile, but is not classic

                PRE.   I'm not sure what this variant is known as in the

                literature.  <br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>This is definitely a standard SSA based PRE algorithm.

              Every single one i am familiar with will all move code out

              of loops like this, be it scalar or loads, regardless of

              the fact that it does in fact, increase the number of

              possible computations by one if the loop does not

              execute.  <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    I'd written a response on the technicalities here, then realized it

    really didn't matter.  From a practical perspective, we definitely

    want our PRE implementation to pull loads out of loops when legal,

    even if that does introduce a load along a path it didn't exist

    previously.  (Note, there is that pesky legality issue.)  <br>

    <br>

    I've got a patch up for discussion right now in exactly this vein:

    <a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D7061">http://reviews.llvm.org/D7061</a><br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote"><br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"><span class="">

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div>If you want this case to work, what you

                            need is a real Load PRE implementation, not

                            one based on simple memdep based load

                            moving.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <div>Trying to hack this one up with profile

                            guided code duplication, etc, to get it to

                            kinda sorta work seems to me to be a bad

                            idea.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </span> Not sure I really agree here, but I'm open to

                pushing as far as we can with static heuristics first. 

                I suspect that's likely to be more stable w.r.t.

                optimization effect and applicable to more cases.  <br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>So, LLVM has been through three PRE implementations.</div>

            <div>It had an implementation of e-path PRE, an

              implementation of GVNPRE (that only worked on scalar code

              i believe) a long long time ago, and then the current one.</div>

            <div><br>

            </div>

            <div>The current one is used because it's fast and catches a

              lot of cases.</div>

            <div>The others were eliminated because they were

              essentially experimental research-quality implementations,

              and were slow :)</div>

          </div>

        </div>

      </div>

    </blockquote>

    After looking more at both the current implementation and the

    literature, I've largely accepted we need a more general PRE

    algorithm.  Unfortunately, I really don't have the background to

    implement such a thing today.  I can gain it, but that's an

    investment of time I'm not sure I can justify right now.  I'm slowly

    ramping up, but it's taking a while.  <br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>Because the current PRE is not value based, and it

              really has no idea about memory, you will run into

              limitations on loads pretty quickly (particularly loads

              indexed by things like induction variables).</div>

            <div>It already tries to work around some of them by doing

              some actual value analysis of the load (and doing

              forwarding), but this only works on basic cases.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Ok, again we've run into a terminology problem.  I don't understand

    why you mean by "is not value based" and "has no idea about

    memory".  <br>

    <br>

    Given that we're both in the Mountain View area, would you be

    interested in just meeting for lunch or a beer?  Trying to go back

    and forth in writing is going to be far slower than talking in

    person.  <br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>

              <div>GVNPRE can be implemented and made a lot faster than

                it was (In GCC, it's actually one of the fastest high

                level optimization passes) with very careful

                engineering, and would take care of all of these cases.</div>

            </div>

            <div><br>

            </div>

            <div>I'm also fairly positive you could take any good

              anticipation based PRE algorithm you can find, and i could

              make it value based and still fast.</div>

            <div><br>

            </div>

            <div>But in the end, pushing the current one seems like a

              bad idea to me because it wasn't designed for what you are

              trying to do - get good PRE results, and do good load PRE,

              and we know of better ways to do that.</div>

            <div><br>

            </div>

            <div>All pushing it will do is push it out of the sweet spot

              it was designed for.</div>

          </div>

        </div>

      </div>

    </blockquote>

    (As I said above, I've largely convinced myself you're right.  I do

    want to extend the current algorithm in a few, hopefully narrow,

    ways, but in the mid to long run, we'll need something better.)<br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>If you want to get into profille-guided placement or

              speculative PRE, it involves solving min-cut problems on

              very large graphs in most cases.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Yeah, I'd come to this conclusion myself.  <br>

    <br>

    Now, purely for the sake of argument, what's wrong with that?  Given

    you can use max flow to solve min-cut in O(V*E^2), that doesn't seem

    too bad.  I know we generally try to avoid extra linear growth, but

    it seems like you could a) reduce the graph substantially without

    much work, and b) cap the region you look at.   <br>

    <br>

    Here's one possible approach.  Don't take this too seriously, I'm

    just thinking through the implications.<br>

    <br>

    Start by computing forward availability for the location in

    question.  (In particular, I'm assuming availability starts at the

    earliest point on a path a load would be legal, not the first place

    it occurs.  This isn't(?) quite the standard definition.)<br>

    <br>

    Pose a weighted min-cut problem on this subset of the CFG identified

    as being fully available.  The source vertices are the nodes with no

    available predecessors, the sink vertices are the *uses* of the

    current loads, and the weights are profile weights + some factor to

    represent code size.<br>

    <br>

    Before running mincut, summarize any loop in the original CFG which

    is fully available (and thus completely in the refined graph) with a

    single node.  We statically know that placing the load in the loops

    preheader is at least as good as any other placement in the loop.  <br>

    <br>

    Before running mincut, reduce weights to the minimum of all weights

    in dominating and "obviously" post dominating nodes.  This is just

    to reduce the amount of work path finding might do in a subgraph. 

    It's not clear this would actually be worthwhile.  (If we had a post

    dom tree, we could remove the "obviously" part of that.)<br>

    <br>

    To cap the region, we could do something like only include the

    subgraph contained by the parent of the inner most loop which

    contains the loads from the location we're interested in.  We'd then

    have to run this iteratively (which is potentially faster due to

    loop summarization above), but we should reach the same final

    result...<br>

    <br>

    (The starting assumption in the above is that we already have both

    dom tree information and loop information.  We can exploit that in

    the max flow problem.)<br>

    <br>

    There might also be room for a "cheap pre" and an "expensive pre"

    here.  Particularly if the "cheap-pre" got things out of loops and

    the "expensive-pre" exploited the loop summarization trick...  Or

    vice versa possibly, getting loads out of loops is where it's worth

    spending time, so maybe we do an "expensive pre" only on loops and

    the "cheap" version outside of inner loops?  <br>

    <br>

    Again, this feels like a topic where a) I need to read a lot more

    papers, and b) it would be good to talk in person.  <br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>The only implementation i'm aware of that is relatively

              sane is <a moz-do-not-send="true"

href="http://webdocs.cs.ualberta.ca/%7Eamaral/cascon/CDP05/slides/CDP05-horspool.pdf">http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-horspool.pdf</a></div>

            <div>and</div>

            <div><a moz-do-not-send="true"

href="https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1">https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1</a><br>

            </div>

            <div><br>

            </div>

            <div>(This does not require solving min-cut problems)</div>

          </div>

        </div>

      </div>

    </blockquote>

    I haven't gotten through this yet, but I'll add it to my reading

    list.<br>

    <blockquote

cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>In the end though, you are doing the work, so you get

              to decide what you want to do :)</div>

          </div>

        </div>

      </div>

    </blockquote>

    Yes, but I need to get it accepted by reviewers too.  :)<br>

    <br>

    Philip<br>

    <br>

  </body>

</html>