[PATCH] D28129: NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.

Thu Dec 29 00:22:54 PST 2016

> Both your functions are equivalent to something like this:
>
> void widget() {
>   int aa = 5;
>   int t1 = 0;
>   while (true) {
>     // dosomething(aa, t1);
>     int x = aa; aa = t1 + 1; t1 = x;
>   }
> }
>
> They're also equivalent to this:
>
> define void @widget() {
> entry:
>   br label %bb2
>
> bb2:                                              ; preds = %bb4, %entry
>   %aa = phi i64 [ 5, %entry ], [ %t5, %bb4]
>   %t1 = phi i64 [ 0, %entry ], [ %aacopy, %bb4 ]
>   br label %bb4
>
>
> bb4:                                              ; preds = %bb2
>   %t5 = add i64 %t1, 1
>   %aacopy = add i64 %aa, 0
>   br label %bb2
> }
>
> If you think LangRef isn't clear, suggestions are welcome.
>>>
>>
>> I would be explicit that phi nodes may depend on each other, and what
>> the expected evaluation order actually is (if it's
>> "as-they-appear-in-IR", say that.)
>> It looks something like "dependencies are allowed, any order is
>> allowed despite dependencies".
>>
>
> There is no evaluation order; alternatively, every possible evaluation
> order is equivalent.  If a PHI node refers to another PHI node in the same
> basic block, it's actually referring to the value that PHI node had in the
> predecessor.

Okay, if that's actually correct, and no matter what order they appear in,
no "updates" occur until after all the nodes are processed (i.e. a phi
literally *always* refers to the previous value), then we should
write that.
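
To make the "always refers to the previous value" reading concrete, here is a
minimal sketch in Python. It is not LLVM code: the two phis, the names a and
b, and the helper functions are all hypothetical, chosen so that each phi's
back-edge operand is the other phi. The "parallel" version reads every operand
from a snapshot taken on the incoming edge; the "sequential" version is the
naive in-order update, included only for contrast.

```python
from itertools import permutations


def eval_phis_parallel(phis, env):
    """All phis read the values from the predecessor; writes happen afterwards."""
    incoming = dict(env)  # snapshot of values on the incoming edge
    result = dict(env)
    for dest, src in phis:
        result[dest] = incoming[src]
    return result


def eval_phis_sequential(phis, env):
    """Naive in-order update, for contrast: later phis see earlier writes."""
    result = dict(env)
    for dest, src in phis:
        result[dest] = result[src]
    return result


# Hypothetical back-edge operands: a takes b's old value, b takes a's old value.
SWAP_PHIS = [("a", "b"), ("b", "a")]
START = {"a": 0, "b": 1}

# Try every textual order of the phis under both interpretations.
parallel_results = [
    eval_phis_parallel(list(order), START) for order in permutations(SWAP_PHIS)
]
sequential_results = [
    eval_phis_sequential(list(order), START) for order in permutations(SWAP_PHIS)
]
```

Under the parallel reading both orders give the swap (a = 1, b = 0). Under the
sequential reading the result depends on the order, which is exactly the
ambiguity the wording should rule out.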

Note that gcc takes the (IMHO) better path of just using explicit
temporaries where necessary to avoid these kinds of "phi nodes":
 f (int a, int b, int (*<T3ee>) (int, int) g)
 {
   int x;
   int _9;

   <bb 2>:
   goto <bb 4>;

   <bb 3>:
   x_10 = a_1;
   a_11 = b_2;
   b_12 = x_10;

   <bb 4>:
   # a_1 = PHI <a_4(D)(2), a_11(3)>
   # b_2 = PHI <b_5(D)(2), b_12(3)>
   _9 = g_7(D) (a_1, b_2);
   if (_9 != 0)
     goto <bb 3>;
   else
     goto <bb 5>;

   <bb 5>:
   return;

 }
It moves the necessary evaluation ordering/cycles out of the phi nodes and
into the explicit parts of the IR.
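
As a rough illustration of what the explicit temporaries buy, the following
Python sketch mirrors the statements in <bb 3> above one for one. The function
name and the trips parameter are made up for the example, and the call to g is
left out; only the data flow between a and b is modelled.

```python
def swap_loop_with_temps(a, b, trips):
    """Mirror <bb 3> from the dump: each statement defines a fresh name."""
    for _ in range(trips):
        x_10 = a     # x_10 = a_1
        a_11 = b     # a_11 = b_2
        b_12 = x_10  # b_12 = x_10
        # The PHIs in <bb 4> now only select a_11 and b_12, which are
        # ordinary definitions in the predecessor rather than other PHIs.
        a, b = a_11, b_12
    return a, b
```

With the swap spelled out as ordinary statements, the order of the two PHIs
in <bb 4> cannot matter, because neither PHI has the other as an operand.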

>
> -----
>
> Another slightly more complicated case:
>
> define void @cyclical_adds() {
> entry:
>   br label %bb2
>
> bb2:                                              ; preds = %bb4, %entry
>   %aa = phi i64 [ 5, %entry ], [ %t5, %bb4]
>   %t1 = phi i64 [ 0, %entry ], [ %aa1, %bb4 ]
>   br label %bb4
>
>
> bb4:                                              ; preds = %bb2
>   ; a bunch of code using aa and t1
>   %t5 = add i64 %t1, 1
>   %aa1 = add i64 %aa, 1
>   br label %bb2
> }
>
> If you can optimize the evaluation order here, I think the solution would
> also cover your original example.
>
>
The best order in *all* of these cases is actually pretty easy, AFAIK:

Generate RPO for SSA graph from def-use chains
Generate RPO for CFG

Iterate in CFG order for blocks, and inside each block, RPO order of SSA
graph for instructions.

You can't do better than this in general.

Because of the defs-dominate-uses property, for LLVM IR the second part
reduces to "evaluate phi nodes in whatever RPO of the SSA graph ended up,
evaluate instructions in block order".
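
A small Python sketch of the ordering described above, applied to the
@cyclical_adds example. This is not the NewGVN implementation: the rpo helper,
the dictionaries, and in particular the choice of DFS roots for the SSA graph
(the phis, in block order) are assumptions made for illustration. Edges in
DEF_USE run from a definition to its users.

```python
def rpo(successors, roots):
    """Reverse post-order of a depth-first walk started from each root in turn."""
    visited, post = set(), []

    def dfs(node):
        visited.add(node)
        for succ in successors.get(node, ()):
            if succ not in visited:
                dfs(succ)
        post.append(node)

    for root in roots:
        if root not in visited:
            dfs(root)
    return post[::-1]


# CFG of @cyclical_adds: entry -> bb2 -> bb4 -> bb2.
CFG = {"entry": ["bb2"], "bb2": ["bb4"], "bb4": ["bb2"]}

# Def-use chains: aa feeds aa1, aa1 feeds the t1 phi, t1 feeds t5, t5 feeds the aa phi.
DEF_USE = {"aa": ["aa1"], "aa1": ["t1"], "t1": ["t5"], "t5": ["aa"]}
BLOCK_OF = {"aa": "bb2", "t1": "bb2", "t5": "bb4", "aa1": "bb4"}

block_order = rpo(CFG, ["entry"])
value_rank = {v: i for i, v in enumerate(rpo(DEF_USE, ["aa", "t1"]))}

# Blocks in CFG RPO; inside each block, values sorted by their SSA-graph RPO rank.
schedule = [
    (block, sorted((v for v in BLOCK_OF if BLOCK_OF[v] == block), key=value_rank.get))
    for block in block_order
]
```

For this input the blocks come out as entry, bb2, bb4, with aa before t1 in
bb2. Since the SSA graph here is a single cycle, a different choice of root
would rotate the value order, so the sketch shows the mechanics rather than a
uniquely best answer.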

In gcc, it suffices simply to use the RPO for CFG + walk instructions in a
block.

Note that I contrived my example to make both incoming edges reachable at
the same time.
Otherwise you are guaranteed another iteration anyway, until we discover the
edge is reachable.

Anyhoo, I'll generate the above ordering, run it on all my testcases, and
quantify how much better or worse it is.