[PATCH] D34200: [PM/unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple.

Tue Jul 4 22:18:24 PDT 2017

sanjoy added a comment.

In this episode I've reviewed `buildClonedLoops` and `cloneLoopNest`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1461
+  SmallPtrSet<BasicBlock *, 4> SkippedLoopAndExitBlocks;
+  if (ContinueSuccBB->getUniquePredecessor() ||
+      llvm::all_of(predecessors(ContinueSuccBB), [&](BasicBlock *PredBB) {
----------------
chandlerc wrote:
> sanjoy wrote:
> > Again, I think you can just directly ask the domtree for this.
> There is a BasicBlockEdge query to the domtree, but I'm not sure its a good idea...
> 
> For a branch instruction, it would be fine. But when this becomes a switch, I think it'll do the wrong thing. There, we'll be fine if there are multiple "continue" edges from the switch's parent BB to the continue successor BB, but because there are multiple edges, the edge-based dominates query will return false.
> 
> A different way of looking at this is that this is actually materially different from the dominates query -- this doesn't check if there is *a* dominating edge, it checks if there are any edges entering this successor from a block other than the parent of the unswitched terminator. Which is also what the code is somewhat obviously doing.
> 
> Anyways, happy to rewrite this if you still prefer a method query, but curious whaht exact query you would prefer.
I see -- this sounds fine assuming you're going to generalize this code to switches (as opposed to having a different code path for them altogether).  However, I'd suggest pulling out a helper function.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:759
+static Loop *cloneLoopNest(Loop &OrigRootL, Loop *RootParentL,
+                           ValueToValueMapTy &VMap, LoopInfo &LI) {
+  auto CloneLoopBlocks = [&](Loop &OrigL, Loop &ClonedL) {
----------------
Can `VMap` be a const reference?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:760
+                           ValueToValueMapTy &VMap, LoopInfo &LI) {
+  auto CloneLoopBlocks = [&](Loop &OrigL, Loop &ClonedL) {
+    assert(ClonedL.getBlocks().empty() && "Must start with an empty loop!");
----------------
I'd call this `AddClonedBlocksToLoop`, `CloneLoopBlocks` sounds like you're cloning the blocks themselves.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:767
+      if (LI.getLoopFor(BB) == &OrigL)
+        LI.changeLoopFor(ClonedBB, &ClonedL);
+    }
----------------
Add an assert that `ClonedBB` does not have a containing loop yet.

edit: I think it is better to write this as:

```
auto *ClonedBB = cast<BasicBlock>(VMap.lookup(BB));
if (LI.getLoopFor(BB) == &OrigL)
  ClonedL.addBasicBlockToLoop(ClonedBB, LI);
```

which has the assert, and avoids the slightly odd looking (since it is unconditional) `ClonedL.addBlockEntry(ClonedBB);`, and to remove the

```
      for (Loop *PL = ClonedL; PL; PL = PL->getParentLoop())
        PL->addBlockEntry(ClonedBB);
```

in `buildClonedLoops`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:806
+///
+/// Due to unswitching simplifying the CFG of the loop, this isn't a trivial
+/// operation. We need to re-verify that there even is a loop (as the backedge
----------------
(Minor nit) How about s/Due to unswitching simplifying the CFG of the loop,/Because unswitching simplifies the CFG of the loop,/

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:818
+static Loop *buildClonedLoops(Loop &OrigL, ArrayRef<BasicBlock *> ExitBlocks,
+                              ValueToValueMapTy &VMap, LoopInfo &LI,
+                              SmallVectorImpl<Loop *> &NonChildClonedLoops) {
----------------
Can `VMap` be a const reference?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:844
+      }
+
+  // We build the set of blocks dominated by the cloned header from the set of
----------------
I'd add an assert here that `ParentL` contains or is equal to the previous parent of the original loop.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:849
+  // aren't in unreachable cycles, etc.
+  SmallPtrSet<BasicBlock *, 16> PossibleClonedLoopBlockSet;
+  SmallVector<BasicBlock *, 16> ClonedLoopBlocks;
----------------
How about using a `SmallSetVector` here?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:862
+  SmallVector<BasicBlock *, 16> Worklist;
+  SmallPtrSet<BasicBlock *, 16> ClonedLoopBlockSet;
+  for (auto *Pred : predecessors(ClonedHeader)) {
----------------
(Minor) I'd call this `BlocksInClonedLoop`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:867
+    // is in the possible set.
+    if (Pred == ClonedPH || !PossibleClonedLoopBlockSet.count(Pred))
+      continue;
----------------
When will `PossibleClonedLoopBlockSet.count(Pred)` be false (assuming `Pred` is not `ClonedPH`)?  I thought you don't clone the dead blocks (dominated by the edge not cloned into the cloned loop) at all?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:888
+
+      // Insert any predecessors that are in the posible set into the cloned
+      // set, and if the insert is successful, add them to the worklist.
----------------
s/posible/possible/

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:891
+      for (auto *Pred : predecessors(BB))
+        if (PossibleClonedLoopBlockSet.count(Pred) &&
+            ClonedLoopBlockSet.insert(Pred).second)
----------------
Like above, I don't understand the use of `PossibleClonedLoopBlockSet` here -- since you don't clone blocks dead due to the edge removal in `unswitchInvariantBranch`, you shouldn't really have a predecessor of a cloned block that hasn't been cloned.

If I'm wrong here, I think the rationale should be documented.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:896
+
+    ClonedL = new Loop();
+    if (ParentL) {
----------------
Can you move the declaration of `ClonedL` here?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:898
+    if (ParentL) {
+      ParentL->addBasicBlockToLoop(ClonedPH, LI);
+      ParentL->addChildLoop(ClonedL);
----------------
Why not remove the `ParentL->addBasicBlockToLoop(ClonedPH, LI);` here and add the preheader using the generic `UnloopedBlockSet` logic at the end?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:907
+    // discovered them. The original order of blocks was carefully built in
+    // a way that is stable w.r.t. predecessor ordering. Rather than re-invent
+    // that logic, we just re-walk the original blocks (and those of the child
----------------
Not sure what you mean by "stable w.r.t. predecessor ordering".

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:924
+      // loop for this block.
+      for (Loop *PL = ClonedL; PL; PL = PL->getParentLoop())
+        PL->addBlockEntry(ClonedBB);
----------------
I've added a comment regarding this in `cloneLoopNest`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:956
+
+  // Figure out what blocks are left to place within the loop nest. If we never
+  // formed a loop, the cloned PH is one of them.
----------------
By "the loop nest" do you mean "the loop nest containing the loop being unswitched"?  If so, changing the comment to say that may make things clearer.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:989
+      // We can stop recursing at the cloned preheader (if we get there).
+      if (BB == ClonedPH)
+        continue;
----------------
When will we hit this?  Won't all of the paths "die out" once they hit `UnloopedBlockSet.erase(PredBB)`?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1002
+
+        // We just insert into the loop set here. We'll add these blocks to the
+        // exit loop after we build up the set in a deterministic order rather
----------------
I thought `predecessors` was deterministic too (since the use-list has deterministic order)?

https://reviews.llvm.org/D34200