<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 28, 2016 at 7:22 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-">
<div class="gmail-m_-5492007076233974577moz-cite-prefix">On 12/28/2016 2:33 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Dec 28, 2016 at 1:18 PM,
Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><span class="gmail-m_-5492007076233974577gmail-">
<div class="gmail-m_-5492007076233974577gmail-m_-4041176510637431261moz-cite-prefix">On
12/28/2016 1:03 PM, Daniel Berlin via llvm-commits
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Dec 28, 2016 at
7:04 AM, Davide Italiano via Phabricator <span dir="ltr"><<a href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">davide
accepted this revision.<br>
davide added a comment.<br>
This revision is now accepted and ready to
land.<br>
<br>
Sorry for the slow response, I'm out('ish)
of the office these days. I took a close
look at your patch.<br>
</blockquote>
<div><br>
</div>
<div>No worries.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> I happen
to be lucky enough to hit a case in the wild
where this already matters. The number of
iterations goes down from hundreds to ~10,
which makes compile time/me happier.<br>
</blockquote>
<div><br>
</div>
<div>yay.</div>
<div><br>
</div>
<div>The current code, excepting super-weird
cases, should operate in O(d+3) iterations,
where d is the loop connectedness of the SSA
graph (not the CFG), i.e. the maximum number of
backedges in any acyclic path. This will change when
we move to equality propagation, but for
now, ...</div>
<div>We could calculate this number and see
if we are screwing up :)<br>
<br>
</div>
<div>For most programs, the loop connectedness
of the SSA graph is the same as or less than
that of the CFG.</div>
<div><br>
</div>
<div>However, IIRC, we allow dependent phis
in the same block (this is not strictly SSA,
since all phi nodes are supposed to be
evaluated simultaneously).</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I'm not sure what you're trying to say here?
PHI nodes for a given basic block are evaluated
simultaneously. From LangRef: "<span>For the purposes of
the SSA form, the use of each incoming value is deemed
to occur on the edge from the corresponding
predecessor block to the current block (but after any
definition of an ‘</span><code class="gmail-m_-5492007076233974577gmail-m_-4041176510637431261docutils gmail-m_-5492007076233974577gmail-m_-4041176510637431261literal"><span class="gmail-m_-5492007076233974577gmail-m_-4041176510637431261pre">invoke</span></code><span>‘
instruction’s return value on the same edge)."</span><span class="gmail-m_-5492007076233974577gmail-HOEnZb"><font color="#888888"><br>
<br>
</font></span></div>
</blockquote>
<div><br>
</div>
<div>I'm saying we've had mailing list arguments about this,
about whether there is any ordering among phi nodes in a
given block. The part you quote from the langref does not
actually definitively answer that (again, there is no
argument in theory. In the abstract, the answer is "there
is no ordering, it's undefined to have phis in the
same block depend on each other").</div>
<div><br>
</div>
<div>Given</div>
<div>
<div>b = phi(d, e)</div>
<div>a = phi(b, c)</div>
</div>
<div><br>
</div>
<div>Saying "is deemed to occur on the edge of the
corresponding predecessor block" does not help.</div>
</div>
</div>
</div>
</blockquote>
<br></span>
Consider the following function:<br>
<br>
void f(int a, int b, int g(int, int)) {<br>
while (g(a, b)) { int x = a; a = b; b = x; }<br>
}<br>
<br>
mem2reg produces this:<br>
<br>
define void @f(i32 %a, i32 %b, i32 (i32, i32)* %g) #0 {<br>
entry:<br>
br label %while.cond<br>
<br>
while.cond: ; preds = %while.body, %entry<br>
%a.addr.0 = phi i32 [ %a, %entry ], [ %b.addr.0, %while.body ]<br>
%b.addr.0 = phi i32 [ %b, %entry ], [ %a.addr.0, %while.body ]<br>
%call = call i32 %g(i32 %a.addr.0, i32 %b.addr.0)<br>
%tobool = icmp ne i32 %call, 0<br>
br i1 %tobool, label %while.body, label %while.end<br>
<br>
while.body: ; preds = %while.cond<br>
br label %while.cond<br>
<br>
while.end: ; preds = %while.cond<br>
ret void<br>
}<br>
<br>
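<div>To make that concrete, here is a small Python sketch (a toy model, not anything from LLVM; the function names are invented for illustration) comparing two readings of the phis in %while.cond: reading all incoming values first and then assigning, versus executing the phis top to bottom like ordinary instructions.</div>
<pre>
```python
# Hypothetical model of the two phis in %while.cond on the backedge.
# None of these names are LLVM APIs; they exist only for this sketch.

def c_loop_body(a, b):
    # The C source: int x = a; a = b; b = x;
    x = a
    a = b
    b = x
    return a, b

def step_simultaneous(a, b):
    # Both phis read their incoming values on the edge, i.e. the values
    # %a.addr.0 and %b.addr.0 held during the previous iteration.
    return b, a

def step_sequential(a, b):
    # Pretend the phis execute in IR order like ordinary instructions.
    a = b  # %a.addr.0 = phi [ %b.addr.0, %while.body ]
    b = a  # %b.addr.0 = phi [ %a.addr.0, %while.body ] sees the NEW a
    return a, b

print(c_loop_body(1, 2))        # (2, 1)
print(step_simultaneous(1, 2))  # (2, 1)
print(step_sequential(1, 2))    # (2, 2)
```
</pre>
<div>Only the simultaneous reading reproduces the swap; the top-to-bottom reading loses the old value of a.</div>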
A "phi" works in the only way which allows this IR to match the
semantics of the C code.<br></div></blockquote><div><br></div><div>I'm not sure i believe this, but i actually don't care enough to argue about it further :)<br>That said, i'm curious:<br><br>What are the expected semantics of the second case i presented?<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<br>
If you think LangRef isn't clear, suggestions are welcome.</div></blockquote><div>I would be explicit that phi nodes may depend on each other, and what the expected evaluation order actually is (if it's "as-they-appear-in-IR", say that.)</div><div>It looks something like "dependencies are allowed, any order is allowed despite dependencies".</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><span class="gmail-"><br>
<br>
<blockquote type="cite">
<div>Popping back up, regardless of resolution, this causes the
issue i mentioned above - it may require more iterations to
resolve because of the second case passing verification. If we
really want phi nodes to be executable, and want NewGVN to take the
minimum number of iterations to converge, we need to
process aa before t1.</div>
<div><br>
</div>
<div>Otherwise, we will process t1, get some value, *then* process
aa, and immediately mark t1 as needing to be reprocessed since
it is a use of aa. We effectively waste an iteration because
all of t1's uses are going to have the wrong value.</div>
</blockquote>
<br></span>
It looks like NewGVN creates one less congruence class if you
process them in the "right" order. </div></blockquote><div><br></div><div>Well, no. It's not about creating the congruence class, it's about making it a reverse post-order evaluation. In your example above, there is no single RPO order due to the two cycles.</div><div>In my example, any valid RPO order of the SSA graph must visit aa before t1.</div><div>This is *normally* taken care of by evaluating instructions in block order, and ordering the CFG in RPO. But in the case i gave, it is not enough. </div><div>You would have to explicitly sort the phi nodes separately, since we allow both orderings of the phi nodes in LLVM, despite only one being RPO (and only one having defs dominate uses).</div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"> I'm not sure there's any way to
usefully generalize that heuristic, though; you're only saving time
based on discovering the cycle one step faster.<span class="gmail-"><br>
<br></span></div></blockquote><div>I'm not sure what you are trying to say here.</div><div>You are saving time by not performing iterations that are unnecessary. The generalized heuristic is exactly "solve this problem by evaluating the SSA graph/CFG in RPO order". This is provable; it's the notion of a "rapid" problem.</div><div><br></div><div>If you place my aa/t1 example in the first block of a 10000-block function, you will process 10000 blocks (1 iteration) uselessly.</div><div>If you sort it into a valid RPO order of the SSA graph, you will not.</div><div><br></div><div>This generalizes to *any* "rapid" problem.</div><div><br></div></div></div></div>
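<div dir="ltr"><div>A rough way to see the wasted iteration, as a toy Python sketch. This is not NewGVN: the tiny graph, the transfer function, and the name count_passes are all assumptions made up for illustration. The only thing taken from the discussion above is that t1 is a use of aa. Each pass evaluates every node once in a fixed order, and we count passes until nothing changes.</div>
<pre>
```python
# Toy round-robin solver; hypothetical, not how NewGVN is implemented.
def count_passes(order, operands, inputs):
    # Every node starts out unknown (None); inputs are fixed constants.
    value = dict.fromkeys(order)
    value.update(inputs)
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for node in order:
            args = [value[op] for op in operands[node]]
            # Toy transfer function: unknown if any operand is unknown,
            # otherwise the sum of the operands.
            new = None if None in args else sum(args)
            if new != value[node]:
                value[node] = new
                changed = True
    return passes, value

# Assumed shape of the example: t1 uses aa, aa uses only inputs.
operands = {"aa": ["x", "y"], "t1": ["aa", "z"]}
inputs = {"x": 1, "y": 2, "z": 3}

print(count_passes(["aa", "t1"], operands, inputs)[0])  # 2
print(count_passes(["t1", "aa"], operands, inputs)[0])  # 3
```
</pre>
<div>With defs before uses, the values settle in the first pass and the second pass only confirms that nothing changed. With t1 first, t1 is evaluated against a still-unknown aa and has to be redone, which costs one more full pass over everything.</div></div>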