[PATCH] D28129: NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 28 20:10:26 PST 2016


On Wed, Dec 28, 2016 at 8:09 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

>
>
> On Wed, Dec 28, 2016 at 7:22 PM, Friedman, Eli <efriedma at codeaurora.org>
> wrote:
>
>> On 12/28/2016 2:33 PM, Daniel Berlin wrote:
>>
>>
>>
>> On Wed, Dec 28, 2016 at 1:18 PM, Friedman, Eli <efriedma at codeaurora.org>
>> wrote:
>>
>>> On 12/28/2016 1:03 PM, Daniel Berlin via llvm-commits wrote:
>>>
>>>
>>>
>>> On Wed, Dec 28, 2016 at 7:04 AM, Davide Italiano via Phabricator <
>>> reviews at reviews.llvm.org> wrote:
>>>
>>>> davide accepted this revision.
>>>> davide added a comment.
>>>> This revision is now accepted and ready to land.
>>>>
>>>> Sorry for the slow response, I'm out('ish) of the office these days. I
>>>> took a close look at your patch.
>>>>
>>>
>>> No worries.
>>>
>>>
>>>> I happen to be lucky enough to hit a case in the wild where this
>>>> already matters.  The number of iteration goes down from hundreds to ~10,
>>>> which makes compile time/me happier.
>>>>
>>>
>>> yay.
>>>
>>> The current code, excepting super-weird cases, should operate in O(d+3)
>>> iterations, where d is the loop connectedness of the SSA graph (not the
>>> CFG), which is the number of backedges in any path. This will change when
>>> we move to equality propagation, but for now, ...
>>> We could  calculate this number and see if we are screwing up :)
>>>
>>> For most programs, the loop connectedness of the SSA graph is the same
>>> or less than the CFG.
>>>
>>> However,  IIRC, we allow  dependent phis in the same block (this is not
>>> strictly SSA, since all phi nodes are supposed to be evaluated
>>> simultaneously).
>>>
>>>
>>> I'm not sure what you're trying to say here?  PHI nodes for a given
>>> basic block are evaluated simultaneously. From LangRef: "For the
>>> purposes of the SSA form, the use of each incoming value is deemed to occur
>>> on the edge from the corresponding predecessor block to the current block
>>> (but after any definition of an ‘invoke‘ instruction’s return value on
>>> the same edge)."
>>>
>>>
>> I'm saying we've had mailing list arguments about this, about whether
>> there is any ordering among phi nodes in a given block.  The part you quote
>> from the langref does not actually definitively answer that (again, there
>> is no argument in theory.  In the abstract, the answer is "there is no
>> ordering, it's undefined to have phis depend in the same block depend on
>> each other")
>>
>> Given
>> b = phi(d, e)
>> a = phi(b, c)
>>
>> Saying "is deemed to occur on the edge of the corresponding predecessor
>> block" does not help.
>>
>>
>> Consider the following function:
>>
>> void f(int a, int b, int g(int, int)) {
>>   while (g(a, b)) { int x = a; a = b; b = x; }
>> }
>>
>> mem2reg produces this:
>>
>> define void @f(i32 %a, i32 %b, i32 (i32, i32)* %g) #0 {
>> entry:
>>   br label %while.cond
>>
>> while.cond:                                       ; preds = %while.body,
>> %entry
>>   %a.addr.0 = phi i32 [ %a, %entry ], [ %b.addr.0, %while.body ]
>>   %b.addr.0 = phi i32 [ %b, %entry ], [ %a.addr.0, %while.body ]
>>   %call = call i32 %g(i32 %a.addr.0, i32 %b.addr.0)
>>   %tobool = icmp ne i32 %call, 0
>>   br i1 %tobool, label %while.body, label %while.end
>>
>> while.body:                                       ; preds = %while.cond
>>   br label %while.cond
>>
>> while.end:                                        ; preds = %while.cond
>>   ret void
>> }
>>
>> A "phi" works in the only way which allows this IR to match the semantics
>> of the C code.
>>
>
> I'm not sure i believe this, but i actually don't care enough to argue
> about it further :)
> That said, i'm curious:
>
> What are the expected semantics of the second case i presented?
>
>
>>
>> If you think LangRef isn't clear, suggestions are welcome.
>>
> I would be explicit that phi nodes may depend on each other, and what the
> expected evaluation order actually is (if it's "as-they-appear-in-IR", say
> that.)
> It looks something like "dependencies are allowed, any order is allowed
> despites dependencies".
>
>
>>
>> Popping back up, regardless of resolution, this causes the issue i
>> mentioned above - it may require more iterations to resolve because of the
>> second case passing verification.  If we really want phi nodes to be
>> executable , and want it to take the minimum number of iterations to
>> converge NewGVN, we need to process aa before t1.
>>
>> Otherwise, we will process t1, get some value, *then* process aa, and
>> immediately mark t1 as needing to be reprocessed since it is a use of aa.
>> We effectively waste an iteration because all of t1's uses have are going
>> to have the wrong value.
>>
>>
>> It looks like NewGVN creates one less congruence class if you process
>> them in the "right" order.
>>
>
> Well, no. It's not about creating the congruence class, it's about making
> it a reverse post-order evaluation .  In your example above, there is no
> single RPO order due to the two cycles.
> In my example, any valid  RPO order of the SSA graph must visit aa before
> t1.
> This is *normally* taken care of by evaluating instructions in block
> order, and ordering the CFG in RPO.   But in the case i gave, it is not
> enough.
> You would have to explicitly sort the phi nodes separately, since we allow
> both orderings of the phi nodes in LLVM,despite only one being RPO (and
> only one having defs dominate uses).
>
>
>
>> I'm not sure there's any way to usefully generalize that heuristic,
>> though; you're only saving time based on discovering the cycle one step
>> faster.
>>
>> I'm not sure what you are trying to say here.
> You are saving time by not performing iterations that are unnecessary.
> The generalized heuristic is exactly "perform this problem by evaluating
> the SSA graph/CFG in RPO order".  This is provable, it's the notion of a
> "rapid" problem.
>

s/notion/number of iterations it takes to compute a/

>
> If you placed my aa/t1 example in the first block of a 10000 block
> function, you will process 10000 blocks (1 iteration) uselessly.
>

(assuming all values are dependent on previous values, ...)


> If you sort it into a valid RPO order of the SSA graph, you will not.
>
> This generalizes to *any* "rapid" problem.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161228/19312415/attachment.html>


More information about the llvm-commits mailing list