[PATCH] D28129: NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.
Eli Friedman via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 28 22:08:41 PST 2016
On 2016-12-28 20:09, Daniel Berlin wrote:
> On Wed, Dec 28, 2016 at 7:22 PM, Friedman, Eli
> <efriedma at codeaurora.org> wrote:
>
>> On 12/28/2016 2:33 PM, Daniel Berlin wrote:
>>
>> On Wed, Dec 28, 2016 at 1:18 PM, Friedman, Eli
>> <efriedma at codeaurora.org> wrote:
>>
>> On 12/28/2016 1:03 PM, Daniel Berlin via llvm-commits wrote:
>>
>> On Wed, Dec 28, 2016 at 7:04 AM, Davide Italiano via Phabricator
>> <reviews at reviews.llvm.org> wrote:
>> davide accepted this revision.
>> davide added a comment.
>> This revision is now accepted and ready to land.
>>
>> Sorry for the slow response, I'm out('ish) of the office these days.
>> I took a close look at your patch.
>>
>> No worries.
>> I happen to be lucky enough to hit a case in the wild where this
>> already matters. The number of iteration goes down from hundreds to
>> ~10, which makes compile time/me happier.
>>
>> yay.
>>
>> The current code, excepting super-weird cases, should operate in
>> O(d+3) iterations, where d is the loop connectedness of the SSA
>> graph (not the CFG), which is the number of backedges in any path.
>> This will change when we move to equality propagation, but for now,
>> ...
>> We could calculate this number and see if we are screwing up :)
>>
>> For most programs, the loop connectedness of the SSA graph is the
>> same or less than the CFG.
>>
>> However, IIRC, we allow dependent phis in the same block (this is
>> not strictly SSA, since all phi nodes are supposed to be evaluated
>> simultaneously).
>
> I'm not sure what you're trying to say here? PHI nodes for a given
> basic block are evaluated simultaneously. From LangRef: "For the
> purposes of the SSA form, the use of each incoming value is deemed to
> occur on the edge from the corresponding predecessor block to the
> current block (but after any definition of an ‘invoke‘
> instruction’s return value on the same edge)."
>
> I'm saying we've had mailing list arguments about this, about whether
> there is any ordering among phi nodes in a given block. The part you
> quote from the langref does not actually definitively answer that
> (again, there is no argument in theory. In the abstract, the answer
> is "there is no ordering, it's undefined to have phis depend in the
> same block depend on each other")
>
> Given
>
> b = phi(d, e)
> a = phi(b, c)
>
> Saying "is deemed to occur on the edge of the corresponding
> predecessor block" does not help.
> Consider the following function:
>
> void f(int a, int b, int g(int, int)) {
> while (g(a, b)) { int x = a; a = b; b = x; }
> }
>
> mem2reg produces this:
>
> define void @f(i32 %a, i32 %b, i32 (i32, i32)* %g) #0 {
> entry:
> br label %while.cond
>
> while.cond: ; preds =
> %while.body, %entry
> %a.addr.0 = phi i32 [ %a, %entry ], [ %b.addr.0, %while.body ]
> %b.addr.0 = phi i32 [ %b, %entry ], [ %a.addr.0, %while.body ]
> %call = call i32 %g(i32 %a.addr.0, i32 %b.addr.0)
> %tobool = icmp ne i32 %call, 0
> br i1 %tobool, label %while.body, label %while.end
>
> while.body: ; preds =
> %while.cond
> br label %while.cond
>
> while.end: ; preds =
> %while.cond
> ret void
> }
>
> A "phi" works in the only way which allows this IR to match the
> semantics of the C code.
>
> I'm not sure i believe this, but i actually don't care enough to argue
> about it further :)
>
> That said, i'm curious:
>
> What are the expected semantics of the second case i presented?
Both your functions are equivalent to something like this:
void widget() {
int aa = 5;
int t1 = 0;
while (true) {
// dosomething(aa, t1);
int x = aa; aa = t1 + 1; t1 = x;
}
}
They're also equivalent to this:
define void @widget() {
entry:
br label %bb2
bb2: ; preds = %bb4, %entry
%aa = phi i64 [ 5, %entry ], [ %t5, %bb4]
%t1 = phi i64 [ 0, %entry ], [ %aacopy, %bb4 ]
br label %bb4
bb4: ; preds = %bb3, %bb2
%t5 = add i64 %t1, 1
%aacopy = add i64 %aa, 0
br label %bb2
}
>> If you think LangRef isn't clear, suggestions are welcome.
>
> I would be explicit that phi nodes may depend on each other, and what
> the expected evaluation order actually is (if it's
> "as-they-appear-in-IR", say that.)
> It looks something like "dependencies are allowed, any order is
> allowed despites dependencies".
There is no evaluation order; alternatively, every possible evaluation
order is equivalent. If a PHI node refers to another PHI node in the
same basic block, it's actually referring to the value that PHI node had
in the predecessor.
>>> Popping back up, regardless of resolution, this causes the issue i
>>> mentioned above - it may require more iterations to resolve
>>> because of the second case passing verification. If we really
>>> want phi nodes to be executable , and want it to take the minimum
>>> number of iterations to converge NewGVN, we need to process aa
>>> before t1.
>>>
>>> Otherwise, we will process t1, get some value, *then* process aa,
>>> and immediately mark t1 as needing to be reprocessed since it is a
>>> use of aa. We effectively waste an iteration because all of t1's
>>> uses have are going to have the wrong value.
>>
>> It looks like NewGVN creates one less congruence class if you
>> process them in the "right" order.
>
> Well, no. It's not about creating the congruence class, it's about
> making it a reverse post-order evaluation . In your example above,
> there is no single RPO order due to the two cycles.
> In my example, any valid RPO order of the SSA graph must visit aa
> before t1.
> This is *normally* taken care of by evaluating instructions in block
> order, and ordering the CFG in RPO. But in the case i gave, it is
> not enough.
> You would have to explicitly sort the phi nodes separately, since we
> allow both orderings of the phi nodes in LLVM,despite only one being
> RPO (and only one having defs dominate uses).
This sort of goes back to the discussion above... fundamentally both
your example and mine are cyclical. You've just found a rule for
quickly processing a certain narrow class of cycles.
-----
Another slightly more complicated case:
define void @cyclical_adds() {
entry:
br label %bb2
bb2: ; preds = %bb4, %entry
%aa = phi i64 [ 5, %entry ], [ %t5, %bb4]
%t1 = phi i64 [ 0, %entry ], [ %aa1, %bb4 ]
br label %bb4
bb4: ; preds = %bb3, %bb2
; a bunch of code using aa and t1
%t5 = add i64 %t1, 1
%aa1 = add i64 %aa, 1
br label %bb2
}
If you can optimize the evaluation order here, I think the solution
would also cover your original example.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
More information about the llvm-commits
mailing list