[PATCH] D28129: NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.

Wed Dec 28 19:22:19 PST 2016

On 12/28/2016 2:33 PM, Daniel Berlin wrote:
>
>
> On Wed, Dec 28, 2016 at 1:18 PM, Friedman, Eli 
> <efriedma at codeaurora.org <mailto:efriedma at codeaurora.org>> wrote:
>
>     On 12/28/2016 1:03 PM, Daniel Berlin via llvm-commits wrote:
>>
>>
>>     On Wed, Dec 28, 2016 at 7:04 AM, Davide Italiano via Phabricator
>>     <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>> wrote:
>>
>>         davide accepted this revision.
>>         davide added a comment.
>>         This revision is now accepted and ready to land.
>>
>>         Sorry for the slow response, I'm out('ish) of the office
>>         these days. I took a close look at your patch.
>>
>>
>>     No worries.
>>
>>         I happen to be lucky enough to hit a case in the wild where
>>         this already matters.  The number of iteration goes down from
>>         hundreds to ~10, which makes compile time/me happier.
>>
>>
>>     yay.
>>
>>     The current code, excepting super-weird cases, should operate in
>>     O(d+3) iterations, where d is the loop connectedness of the SSA
>>     graph (not the CFG), which is the number of backedges in any
>>     path. This will change when we move to equality propagation, but
>>     for now, ...
>>     We could  calculate this number and see if we are screwing up :)
>>
>>     For most programs, the loop connectedness of the SSA graph is the
>>     same or less than the CFG.
>>
>>     However,  IIRC, we allow  dependent phis in the same block (this
>>     is not strictly SSA, since all phi nodes are supposed to be
>>     evaluated simultaneously).
>
>     I'm not sure what you're trying to say here? PHI nodes for a given
>     basic block are evaluated simultaneously. From LangRef: "For the
>     purposes of the SSA form, the use of each incoming value is deemed
>     to occur on the edge from the corresponding predecessor block to
>     the current block (but after any definition of an ‘|invoke|‘
>     instruction’s return value on the same edge)."
>
>
> I'm saying we've had mailing list arguments about this, about whether 
> there is any ordering among phi nodes in a given block.  The part you 
> quote from the langref does not actually definitively answer that 
> (again, there is no argument in theory.  In the abstract, the answer 
> is "there is no ordering, it's undefined to have phis depend in the 
> same block depend on each other")
>
> Given
> b = phi(d, e)
> a = phi(b, c)
>
> Saying "is deemed to occur on the edge of the corresponding 
> predecessor block" does not help.

Consider the following function:

void f(int a, int b, int g(int, int)) {
   while (g(a, b)) { int x = a; a = b; b = x; }
}

mem2reg produces this:

define void @f(i32 %a, i32 %b, i32 (i32, i32)* %g) #0 {
entry:
   br label %while.cond

while.cond:                                       ; preds = %while.body, 
%entry
   %a.addr.0 = phi i32 [ %a, %entry ], [ %b.addr.0, %while.body ]
   %b.addr.0 = phi i32 [ %b, %entry ], [ %a.addr.0, %while.body ]
   %call = call i32 %g(i32 %a.addr.0, i32 %b.addr.0)
   %tobool = icmp ne i32 %call, 0
   br i1 %tobool, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
   br label %while.cond

while.end:                                        ; preds = %while.cond
   ret void
}

A "phi" works in the only way which allows this IR to match the 
semantics of the C code.

If you think LangRef isn't clear, suggestions are welcome.

> Popping back up, regardless of resolution, this causes the issue i 
> mentioned above - it may require more iterations to resolve because of 
> the second case passing verification.  If we really want phi nodes to 
> be executable , and want it to take the minimum number of iterations 
> to converge NewGVN, we need to process aa before t1.
>
> Otherwise, we will process t1, get some value, *then* process aa, and 
> immediately mark t1 as needing to be reprocessed since it is a use of 
> aa.  We effectively waste an iteration because all of t1's uses have 
> are going to have the wrong value.

It looks like NewGVN creates one less congruence class if you process 
them in the "right" order.  I'm not sure there's any way to usefully 
generalize that heuristic, though; you're only saving time based on 
discovering the cycle one step faster.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161228/ccf944e1/attachment.html>