<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 01/11/2015 10:17 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTV0mZ7_WoEF-3xqyvo1AU7xhppw8hpPXUQ07TRkmi06hg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 8, 2015 at 3:33 PM,
Philip Reames <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span
class=""><br>
On 01/07/2015 05:33 PM, Chandler Carruth wrote:<br>
</span><span class="">
<blockquote class="gmail_quote" style="margin:0px 0px
0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">How
does this compare with classical approaches of loop
peeling, partitioning, fission, or whatever you might
call it?<br>
</blockquote>
</span>
I'm still developing a good sense for this, but let me lay
out some observations. Fair warning, this is going to be
long and rambling.<br>
<br>
Let's start with a toy example:<br>
while(c) {<br>
x = this->x;<br>
y = this->y;<br>
if (x == y) {<br>
rare();<br>
}<br>
}<br>
<br>
</blockquote>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">If
we could tell x and y were loop invariant, we could
unswitch this loop. However, the rare call clobbers our
view of memory, so LICM fails, and thus unswitch fails.<br>
<br>
</blockquote>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We'd
like to apply PRE here and push the reload into the loop
preheader and the rare case. This currently fails for two
reasons: 1) We don't allow code duplication during PRE, </blockquote>
<div><br>
</div>
<div>?????</div>
<div>If we don't, we aren't doing real PRE. So i'm not
entirely sure what you mean here.</div>
</div>
</div>
</div>
</blockquote>
As I think you comment elsewhere in your response, we currently do
not duplicate loads; we only move them. The current structure is
based around finding a legal point which doesn't introduce a load
along any path where one didn't exist previously. If we have a path
which has two copies of the load, we try to find a place to move one
of them such that all paths still have the available load and we've
removed the extra load along the one path. <br>
<br>
(Note: The above explanation is rough and does not parallel how the
code is organized.)<br>
<br>
<br>
<blockquote
cite="mid:CAF4BwTV0mZ7_WoEF-3xqyvo1AU7xhppw8hpPXUQ07TRkmi06hg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">and
2) the canonical loop-simplify form of having a single
latch means that PRE doesn't actually see the rare block
at all, it sees the preheader and the join point after the
if block.</blockquote>
<div><br>
</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>
I think both of the previous are fixable:<br>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>GCC's PRE already does the above.</div>
<div>It doesn't do profile guided duplication.</div>
<div>We aren't doing anything special with these blocks.</div>
<div><br>
here is the code I used to test:<br>
<br>
</div>
<div>struct a{</div>
<div><span class="" style="white-space:pre"> </span>int x;</div>
<div><span class="" style="white-space:pre"> </span>int y;</div>
<div>};</div>
<div>extern void rare();</div>
<div>int mai(int c, struct a *this)</div>
<div>{</div>
<div><span class="" style="white-space:pre"> </span>int d =
0;</div>
<div><span class="" style="white-space:pre"> </span>while(c)
{</div>
<div><span class="" style="white-space:pre"> </span>int x =
this->x;</div>
<div><span class="" style="white-space:pre"> </span>int y =
this->y;</div>
<div><span class="" style="white-space:pre"> </span>d += x
+ y;</div>
<div><span class="" style="white-space:pre"> </span>if (x
== y) {</div>
<div><span class="" style="white-space:pre"> </span>rare();</div>
<div><span class="" style="white-space:pre"> </span>}</div>
<div><span class="" style="white-space:pre"> </span>}</div>
<div><span class="" style="white-space:pre"> </span>return
d;</div>
<div>} </div>
<div><br>
</div>
<div><br>
</div>
<div>It will do exactly what you expect, it is transformed
into:<br>
</div>
<div><br>
</div>
<div>
<div>struct a{</div>
<div><span class="" style="white-space:pre"> </span>int
x;</div>
<div><span class="" style="white-space:pre"> </span>int
y;</div>
<div>};</div>
<div>extern void rare();</div>
<div>int mai(int c, struct a *this)</div>
<div>{</div>
<div><span class="" style="white-space:pre"> </span>int d
= 0;</div>
<div> int pretemp1 = this->x</div>
<div> int pretemp2 = this->y</div>
<div> </div>
<div><span class="" style="white-space:pre"> </span>while(c)
{</div>
<div> pretemp1phi = phi(rare block
pretemp1, preheader pretemp1).</div>
<div> pretemp2phi = phi(rare block
pretemp2, preheader pretemp2)</div>
<div><br>
</div>
<div><span class="" style="white-space:pre"> </span>d +=
pretemp1phi + pretemp2phi</div>
<div><span class="" style="white-space:pre"> </span>if (x
== y) {</div>
<div><span class="" style="white-space:pre"> </span>rare();</div>
<div> pretemp1 = this->x;</div>
<div> pretemp2 = this->y;</div>
<div><br>
</div>
<div><span class="" style="white-space:pre"> </span>}</div>
<div><span class="" style="white-space:pre"> </span>}</div>
<div><span class="" style="white-space:pre"> </span>return
d;</div>
<div>} </div>
</div>
<div>I don't see why profile guided duplication is necessary
here. This is a basic load PRE case. It is handled by the
first version of GVN-based Load PRE I wrote for GCC. It
is always a win.</div>
</div>
</div>
</div>
</blockquote>
Well, to be pedantic, it is not always profitable. If this loop
runs exactly one iteration, this is both a code size pessimization
and possibly a runtime execution pessimization. While in this
particular case, you probably don't care, I'd worry that an
algorithm which gets this might also have other pessimization cases
which are more worrying. <br>
<br>
However, I think I've actually come to the same underlying
conclusion. This is a case that our PRE algorithm should handle
without needing to resort to profiling data. <br>
<br>
I have to admit that I am unfamiliar with the GCC implementation.
Could you outline the algorithm and it's important concepts? (links
are fine too)<br>
<br>
<blockquote
cite="mid:CAF4BwTV0mZ7_WoEF-3xqyvo1AU7xhppw8hpPXUQ07TRkmi06hg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Looking at what LLVM does, the failing on the PRE side
is that our PRE/GVN models are not strong enough to handle
this. I'm not sure at all why we think anything else is
necessary. It's certainly not requiring special code
duplication heuristics, etc.</div>
<div><br>
</div>
<div>So either you are thinking of a case that is different
from the above, or I am seriously confused :)</div>
</div>
</div>
</div>
</blockquote>
Well, I think we've gotten to the same point here. An improved PRE
implementation should handle most of the cases I've actually
encountered. Having said that, I'm not yet willing to say that the
profile guided loop splitting isn't *also* worth exploring. From a
practical perspective, a lot of our existing optimizations are
structured on loops. Letting those kick in without falling back to
(expensive) GVN might be worthwhile from a practical perspective.
This is as much a pass ordering problem as anything else. <br>
<br>
p.s. I've started to post patches which improve the results given by
our current PRE to the llvm-commits list. So far, it's just simple
stuff, but that's the direction I'm current working in. I'm going
to defer looking into the loop nest splitting until I've pushed PRE
as far as I easily can. <br>
<br>
Philip<br>
<br>
</body>
</html>