[PATCH] D28129: NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.

Mon Jan 2 10:22:33 PST 2017

It's also interesting to note that this definition of "phi nodes"  is the
cause of PR 31501.
After careful analysis, i've come to the conclusion it also essentially
prevents any real fixpoint analysis of phi nodes, so i've disabled portions
of it in NewGVN (we now disable analysis whenever a phi node evaluates to
another phi node)
Due to allowing mutually dependent cycles, they will ping pong back and
forth, and never converge.
(SCCP only looks at constants. If it tried to evaluate even basic copy
equivalence, it would have already hit this).

This blocks some optimization cases.

This problem does not occur with any of the multiple block cases because
phi nodes can't be congruent to phi nodes in different blocks[1]

Even with the hacks above, we can't do proper elimination with phi nodes as
leader  without even more hacks due to what LLVM's definition allows.

IMHO, this representation offers no real advantage over using temporaries
in the mutually dependent and "operand defined after use" cases  and
several real disadvantages.

I'm sure more real issues will pop up as we attempt to actually optimize
better.

:(

[1] Without proving their predicate conditions, etc are the same, which
nobody does.

On Thu, Dec 29, 2016 at 12:22 AM, Daniel Berlin <dberlin at dberlin.org> wrote:

>
>>>
>> Both your functions are equivalent to something like this:
>>
>> void widget() {
>>   int aa = 5;
>>   int t1 = 0;
>>   while (true) {
>>     // dosomething(aa, t1);
>>     int x = aa; aa = t1 + 1; t1 = x;
>>   }
>> }
>>
>> They're also equivalent to this:
>>
>> define void @widget() {
>> entry:
>>   br label %bb2
>>
>> bb2:                                              ; preds = %bb4, %entry
>>   %aa = phi i64 [ 5, %entry ], [ %t5, %bb4]
>>   %t1 = phi i64 [ 0, %entry ], [ %aacopy, %bb4 ]
>>   br label %bb4
>>
>>
>> bb4:                                              ; preds = %bb3, %bb2
>>   %t5 = add i64 %t1, 1
>>   %aacopy = add i64 %aa, 0
>>   br label %bb2
>> }
>>
>> If you think LangRef isn't clear, suggestions are welcome.
>>>>
>>>
>>> I would be explicit that phi nodes may depend on each other, and what
>>> the expected evaluation order actually is (if it's
>>> "as-they-appear-in-IR", say that.)
>>> It looks something like "dependencies are allowed, any order is
>>> allowed despites dependencies".
>>>
>>
>> There is no evaluation order; alternatively, every possible evaluation
>> order is equivalent.  If a PHI node refers to another PHI node in the same
>> basic block, it's actually referring to the value that PHI node had in the
>> predecessor.
>
>
> Okay, if that's actually correct, and no matter what order they appear, no
> "updates" occur until after all the node are processed.
> (IE it literally *always* refers to the previous value) then we should
> write that.
>
>
> Note that gcc takes, the IMHO, better path, of just using explicit
> temporaries where necessary to avoid these kinds of "phi nodes":
>  f (int a, int b, int (*<T3ee>) (int, int) g)
>  {
>    int x;
>    int _9;
>
>    <bb 2>:
>    goto <bb 4>;
>
>    <bb 3>:
>    x_10 = a_1;
>    a_11 = b_2;
>    b_12 = x_10;
>
>    <bb 4>:
>    # a_1 = PHI <a_4(D)(2), a_11(3)>
>    # b_2 = PHI <b_5(D)(2), b_12(3)>
>    _9 = g_7(D) (a_1, b_2);
>    if (_9 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 5>;
>
>    <bb 5>:
>    return;
>
>  }
> It moves the necessary evaluation ordering/cycles out of the phi nodes and
> into the explicit parts of the IR.
>
>
>>
>> -----
>>
>> Another slightly more complicated case:
>>
>> define void @cyclical_adds() {
>> entry:
>>   br label %bb2
>>
>> bb2:                                              ; preds = %bb4, %entry
>>   %aa = phi i64 [ 5, %entry ], [ %t5, %bb4]
>>   %t1 = phi i64 [ 0, %entry ], [ %aa1, %bb4 ]
>>   br label %bb4
>>
>>
>> bb4:                                              ; preds = %bb3, %bb2
>>   ; a bunch of code using aa and t1
>>   %t5 = add i64 %t1, 1
>>   %aa1 = add i64 %aa, 1
>>   br label %bb2
>> }
>>
>> If you can optimize the evaluation order here, I think the solution would
>> also cover your original example.
>>
>>
> The best order in *all* of these cases is actually pretty easy, AFAIK:
>
> Generate RPO for SSA graph from def-use chains
> Generate RPO for CFG
>
> Iterate in CFG order for blocks, and inside each block, RPO order of SSA
> graph for instructions.
>
> You can't do better than this in general.
>
> Because of the defs dominate uses property, for llvm ir, the second part
> reduces to "evaluate phi nodes in whatever RPO of the SSA graph ended up,
> evaluate instructions in block order ".
>
> In gcc, it suffices simply to use the RPO for CFG + walk instructions in a
> block.
>
> Note that i contrived my example to make both incoming edges reachable at
> the same time.
> Otherwise you are guaranteed another iteration anyway until we discover
> the edge is reachable.
>
> Anyhoo, i'll generate the above ordering and run it on all my testcases
> and quantify how much better or worse it is.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170102/e1f347c5/attachment.html>