<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 01/12/2015 01:11 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra">On Mon, Jan 12, 2015 at 1:08 PM, Philip
Reames <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class=""> <br>
<div>On 01/11/2015 10:35 PM, Daniel Berlin wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>Looking at what LLVM does, the
failing on the PRE side is that our
PRE/GVN models are not strong enough
to handle this. I'm not sure at all
why we think anything else is
necessary. </div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>By this i mean we don't actually perform
real PRE for loads. It's a known deficiency
and one that does not require any duplicate
heuristics to solve.</div>
<div>Right now we perform some availability
based removal that happens be willing to do
an insertion here or there to try to move a
load in some cases (it will only do a single
insertion). It's not also based on real
availability, but on finding nearest
dependencies of loads and squinting at them
(which is not even really the same as
finding where a load stops being available)
funny.</div>
<div><br>
</div>
<div>It's also not real PRE, just some code
that is willing to insert one load to move a
load.</div>
</div>
</div>
</div>
</blockquote>
</span> Er, I think we have a terminology problem here.
From what I can gather, our current implementation *is*
an implementation of classic PRE.</div>
</blockquote>
<div><br>
</div>
<div>Scalar yes, load PRE, no.</div>
</div>
</div>
</div>
</blockquote>
Can you definite specifically what you mean by "load pre"? To me,
load pre is simply PRE on loads. It sounds like you're saying
something slightly different. <br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>What load PRE does is not the same as availability.</div>
<div>It finds the nearest dependencies (be they load/load,
store/load, or clobber) for a load walking backwards.</div>
<div>That is not availability.</div>
<div><br>
</div>
<div>Availability is a forward problem, not a backwards one
:)</div>
<div><br>
</div>
<div>It tries to transform the data it gets from memdep into
something like availability info, but you *will not get
the same results* in some cases.</div>
</div>
</div>
</div>
</blockquote>
I would argue that the current implementation is calculating
availability. It's simply doing in a conservative manner and thus
gives up some quality in the results achieved. The notion of
availability (i.e. "there's a load available along all paths that
reach this basic block") is the same.<br>
<br>
Just to make sure we're on the same page, the reason why the
backwards calculation is potentially less useful than the forward
one is that we unconditionally stop when encountering a def. There
might have been an earlier def along that path and thus possible a
cheaper location to put a load. Is that the only case? Or is there
something else I haven't seen yet?<br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> * It took me a few
hours of squinting at the code to recognize that weird
backwards walk as being an implementation of
"anctipated", but it is. The "available" part comes
from MemoryDependenceAnalysis and the forward dataflow
algorithm based on those inputs.<br>
</div>
</blockquote>
<div><br>
</div>
<div>memorydependencyanalysis is a backwards walker, it does
not compute availability of a load.</div>
<div>It computes the nearest dependencies. We then try to
take this and turn it into the result of a forward
availability problem.</div>
</div>
</div>
</div>
</blockquote>
(See above comment)<br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
If you want to start inserting loads on paths which
didn't previously have them - for example the loop
preheader in your example in the previous email - this
is not classic PRE as I'm familiar with it. It can be
very profitable and worthwhile, but is not classic
PRE. I'm not sure what this variant is known as in the
literature. <br>
</div>
</blockquote>
<div><br>
</div>
<div>This is definitely a standard SSA based PRE algorithm.
Every single one i am familiar with will all move code out
of loops like this, be it scalar or loads, regardless of
the fact that it does in fact, increase the number of
possible computations by one if the loop does not
execute. <br>
</div>
</div>
</div>
</div>
</blockquote>
I'd written a response on the technicalities here, then realized it
really didn't matter. From a practical perspective, we definitely
want our PRE implementation to pull loads out of loops when legal,
even if that does introduce a load along a path it didn't exist
previously. (Note, there is that pesky legality issue.) <br>
<br>
I've got a patch up for discussion right now in exactly this vein:
<a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D7061">http://reviews.llvm.org/D7061</a><br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class="">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>If you want this case to work, what you
need is a real Load PRE implementation, not
one based on simple memdep based load
moving.</div>
</div>
</div>
</div>
</blockquote>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Trying to hack this one up with profile
guided code duplication, etc, to get it to
kinda sorta work seems to me to be a bad
idea.</div>
</div>
</div>
</div>
</blockquote>
</span> Not sure I really agree here, but I'm open to
pushing as far as we can with static heuristics first.
I suspect that's likely to be more stable w.r.t.
optimization effect and applicable to more cases. <br>
</div>
</blockquote>
<div><br>
</div>
<div>So, LLVM has been through three PRE implementations.</div>
<div>It had an implementation of e-path PRE, an
implementation of GVNPRE (that only worked on scalar code
i believe) a long long time ago, and then the current one.</div>
<div><br>
</div>
<div>The current one is used because it's fast and catches a
lot of cases.</div>
<div>The others were eliminated because they were
essentially experimental research-quality implementations,
and were slow :)</div>
</div>
</div>
</div>
</blockquote>
After looking more at both the current implementation and the
literature, I've largely accepted we need a more general PRE
algorithm. Unfortunately, I really don't have the background to
implement such a thing today. I can gain it, but that's an
investment of time I'm not sure I can justify right now. I'm slowly
ramping up, but it's taking a while. <br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Because the current PRE is not value based, and it
really has no idea about memory, you will run into
limitations on loads pretty quickly (particularly loads
indexed by things like induction variables).</div>
<div>It already tries to work around some of them by doing
some actual value analysis of the load (and doing
forwarding), but this only works on basic cases.</div>
</div>
</div>
</div>
</blockquote>
Ok, again we've run into a terminology problem. I don't understand
why you mean by "is not value based" and "has no idea about
memory". <br>
<br>
Given that we're both in the Mountain View area, would you be
interested in just meeting for lunch or a beer? Trying to go back
and forth in writing is going to be far slower than talking in
person. <br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>
<div>GVNPRE can be implemented and made a lot faster than
it was (In GCC, it's actually one of the fastest high
level optimization passes) with very careful
engineering, and would take care of all of these cases.</div>
</div>
<div><br>
</div>
<div>I'm also fairly positive you could take any good
anticipation based PRE algorithm you can find, and i could
make it value based and still fast.</div>
<div><br>
</div>
<div>But in the end, pushing the current one seems like a
bad idea to me because it wasn't designed for what you are
trying to do - get good PRE results, and do good load PRE,
and we know of better ways to do that.</div>
<div><br>
</div>
<div>All pushing it will do is push it out of the sweet spot
it was designed for.</div>
</div>
</div>
</div>
</blockquote>
(As I said above, I've largely convinced myself you're right. I do
want to extend the current algorithm in a few, hopefully narrow,
ways, but in the mid to long run, we'll need something better.)<br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>If you want to get into profille-guided placement or
speculative PRE, it involves solving min-cut problems on
very large graphs in most cases.</div>
</div>
</div>
</div>
</blockquote>
Yeah, I'd come to this conclusion myself. <br>
<br>
Now, purely for the sake of argument, what's wrong with that? Given
you can use max flow to solve min-cut in O(V*E^2), that doesn't seem
too bad. I know we generally try to avoid extra linear growth, but
it seems like you could a) reduce the graph substantially without
much work, and b) cap the region you look at. <br>
<br>
Here's one possible approach. Don't take this too seriously, I'm
just thinking through the implications.<br>
<br>
Start by computing forward availability for the location in
question. (In particular, I'm assuming availability starts at the
earliest point on a path a load would be legal, not the first place
it occurs. This isn't(?) quite the standard definition.)<br>
<br>
Pose a weighted min-cut problem on this subset of the CFG identified
as being fully available. The source vertices are the nodes with no
available predecessors, the sink vertices are the *uses* of the
current loads, and the weights are profile weights + some factor to
represent code size.<br>
<br>
Before running mincut, summarize any loop in the original CFG which
is fully available (and thus completely in the refined graph) with a
single node. We statically know that placing the load in the loops
preheader is at least as good as any other placement in the loop. <br>
<br>
Before running mincut, reduce weights to the minimum of all weights
in dominating and "obviously" post dominating nodes. This is just
to reduce the amount of work path finding might do in a subgraph.
It's not clear this would actually be worthwhile. (If we had a post
dom tree, we could remove the "obviously" part of that.)<br>
<br>
To cap the region, we could do something like only include the
subgraph contained by the parent of the inner most loop which
contains the loads from the location we're interested in. We'd then
have to run this iteratively (which is potentially faster due to
loop summarization above), but we should reach the same final
result...<br>
<br>
(The starting assumption in the above is that we already have both
dom tree information and loop information. We can exploit that in
the max flow problem.)<br>
<br>
There might also be room for a "cheap pre" and an "expensive pre"
here. Particularly if the "cheap-pre" got things out of loops and
the "expensive-pre" exploited the loop summarization trick... Or
vice versa possibly, getting loads out of loops is where it's worth
spending time, so maybe we do an "expensive pre" only on loops and
the "cheap" version outside of inner loops? <br>
<br>
Again, this feels like a topic where a) I need to read a lot more
papers, and b) it would be good to talk in person. <br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>The only implementation i'm aware of that is relatively
sane is <a moz-do-not-send="true"
href="http://webdocs.cs.ualberta.ca/%7Eamaral/cascon/CDP05/slides/CDP05-horspool.pdf">http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/CDP05-horspool.pdf</a></div>
<div>and</div>
<div><a moz-do-not-send="true"
href="https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1">https://dspace.library.uvic.ca/bitstream/handle/1828/292/Pereira_0336403_Thesis-Final_21Dec2007_.pdf?sequence=1</a><br>
</div>
<div><br>
</div>
<div>(This does not require solving min-cut problems)</div>
</div>
</div>
</div>
</blockquote>
I haven't gotten through this yet, but I'll add it to my reading
list.<br>
<blockquote
cite="mid:CAF4BwTWQL2Ln80TtuPzNxSBAQo8XRMUOjhvMb1SGLuJZMuhxgg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>In the end though, you are doing the work, so you get
to decide what you want to do :)</div>
</div>
</div>
</div>
</blockquote>
Yes, but I need to get it accepted by reviewers too. :)<br>
<br>
Philip<br>
<br>
</body>
</html>