[PATCH] D52827: [LICM] Make LICM able to hoist phis

Wed Oct 31 23:01:47 PDT 2018

mkazantsev requested changes to this revision.
mkazantsev added inline comments.
This revision now requires changes to proceed.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:681
+    }
+    BasicBlock *BB = Worklist[idx]->getBlock();
+    Worklist.erase(Worklist.begin() + idx);
----------------
john.brawn wrote:
> mkazantsev wrote:
> > How do you ensure that you have found something? Is it correct to do it if nothing was found and `idx` has a default value?
> It's actually possible to not find anything, in which case we fall back to picking the first element of the worklist. I've updated the comment to make this clear.
I think I now understand the algorithm you're trying to apply.

Imagine the graph `A -> B -> C -> D -> E`, and the blocks happened to be arranged as `E, D, C, B, A` in the Worklist. Maybe I am missing something, but it seems that the algorithm inside `collectChildrenInLoop` is BFS-based and cannot protect us from this situation. It could, for example, happen if all these blocks have a common parent and arranged in the order `E, D, C, B, A` in this parent.

The downside if your algorithm is that it will traverse over all worklist to find `A`, then over all remaining worklist to find `B` and so on. Each `any_of` query will work `O(N)`, and you do N such queries, so overall algorithmic complexity is `O(N^2)`.

The correct way to efficiently process all blocks before their successors is processing them in Reverse Post-Order. This arrangement by definition gives you what you want: it sorts the blocks in order that all successors go after their predecessors. If the elements of Worklist are arranged in RPO, you can just process them in order without extra checks, and it will be `O(N)`.

You can see example how to build RPO in class `LoopBlocksDFS`, the methods you need are `beginRPO(), endRPO()`.

And aside from all that above, erasing the last element from the Worklist is `O(1)` and of the first one is `O(N)`. When you don't care what element to erase, you should do so to the last one. Erasing the first element is expensive. However once you process them in RPO order, you don't need to erase anything from the Worklist at all because you know that you've processed all predecessors before all successors.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:501
+      // If there's one common successor use that.
+      if (TrueDestSucc.size() == 1)
+        CommonSucc = *TrueDestSucc.begin();
----------------
The general algorithm for non-empty `TrueDestSucc` deals well with case of `size == 1`; do we really need separate processing for this case?

================
Comment at: lib/Transforms/Scalar/LICM.cpp:622
+      HoistFalseDest->moveBefore(HoistCommonSucc);
+      BranchInst::Create(HoistCommonSucc, HoistFalseDest);
+    }
----------------
Maybe I am missing something, but where do we notify DomTree of existence of these new edges?

================
Comment at: lib/Transforms/Scalar/LICM.cpp:683
+  SmallPtrSet<BasicBlock *, 16> WorklistBlocks;
+  for (DomTreeNode *Node : Worklist)
+    WorklistBlocks.insert(Node->getBlock());
----------------
I guess you could use `insert(Worklist.begin(), Worklist.end())`, it's tad faster.

Repository:
  rL LLVM

https://reviews.llvm.org/D52827