[PATCH] D52827: [LICM] Make LICM able to hoist phis

Tue Nov 6 08:39:30 PST 2018

john.brawn added inline comments.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:681
+    }
+    BasicBlock *BB = Worklist[idx]->getBlock();
+    Worklist.erase(Worklist.begin() + idx);
----------------
mkazantsev wrote:
> john.brawn wrote:
> > mkazantsev wrote:
> > > How do you ensure that you have found something? Is it correct to do it if nothing was found and `idx` has a default value?
> > It's actually possible to not find anything, in which case we fall back to picking the first element of the worklist. I've updated the comment to make this clear.
> I think I now understand the algorithm you're trying to apply.
> 
> Imagine the graph `A -> B -> C -> D -> E`, and the blocks happened to be arranged as `E, D, C, B, A` in the Worklist. Maybe I am missing something, but it seems that the algorithm inside `collectChildrenInLoop` is BFS-based and cannot protect us from this situation. It could, for example, happen if all these blocks have a common parent and arranged in the order `E, D, C, B, A` in this parent.
> 
> The downside if your algorithm is that it will traverse over all worklist to find `A`, then over all remaining worklist to find `B` and so on. Each `any_of` query will work `O(N)`, and you do N such queries, so overall algorithmic complexity is `O(N^2)`.
> 
> The correct way to efficiently process all blocks before their successors is processing them in Reverse Post-Order. This arrangement by definition gives you what you want: it sorts the blocks in order that all successors go after their predecessors. If the elements of Worklist are arranged in RPO, you can just process them in order without extra checks, and it will be `O(N)`.
> 
> You can see example how to build RPO in class `LoopBlocksDFS`, the methods you need are `beginRPO(), endRPO()`.
> 
> And aside from all that above, erasing the last element from the Worklist is `O(1)` and of the first one is `O(N)`. When you don't care what element to erase, you should do so to the last one. Erasing the first element is expensive. However once you process them in RPO order, you don't need to erase anything from the Worklist at all because you know that you've processed all predecessors before all successors.
> 
Yes it does look like reverse post-order gives me what I want (though I have to adjust some of the tests as it picks a different but equally good order). I'll modify this to do that.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:501
+      // If there's one common successor use that.
+      if (TrueDestSucc.size() == 1)
+        CommonSucc = *TrueDestSucc.begin();
----------------
mkazantsev wrote:
> The general algorithm for non-empty `TrueDestSucc` deals well with case of `size == 1`; do we really need separate processing for this case?
We could have one case for size >= 1 but that would mean iterating through potentially all of the blocks in the function to find the one element of the set, which seems rather wasteful when we can just take that one element directly.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:622
+      HoistFalseDest->moveBefore(HoistCommonSucc);
+      BranchInst::Create(HoistCommonSucc, HoistFalseDest);
+    }
----------------
mkazantsev wrote:
> Maybe I am missing something, but where do we notify DomTree of existence of these new edges?
Inserting the edge is never needed. insertEdge only has an effect when the source and destination blocks have different immediate dominators (see InsertReachable in GenericDomTreeConstruction.h) and that never happens - HoistTrueDest and HoistFalseDest always have the same immediate dominator as HoistCommonSucc, and for the edge from HoistCommonSucc to TargetSucc it may be that HoistCommonSucc is now the immediate dominator of TargetSucc but that's handled in the loop at line 632.

Repository:
  rL LLVM

https://reviews.llvm.org/D52827