[PATCH] D34200: [PM/unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple.

Wed Jul 19 01:29:15 PDT 2017

chandlerc added a comment.

Most of the comments addressed or responded to here... Still working on the actual refactoring ,sorry for the delay there.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:767
+      if (LI.getLoopFor(BB) == &OrigL)
+        LI.changeLoopFor(ClonedBB, &ClonedL);
+    }
----------------
sanjoy wrote:
> Add an assert that `ClonedBB` does not have a containing loop yet.
> 
> edit: I think it is better to write this as:
> 
> ```
> auto *ClonedBB = cast<BasicBlock>(VMap.lookup(BB));
> if (LI.getLoopFor(BB) == &OrigL)
>   ClonedL.addBasicBlockToLoop(ClonedBB, LI);
> ```
> 
> which has the assert, and avoids the slightly odd looking (since it is unconditional) `ClonedL.addBlockEntry(ClonedBB);`, and to remove the
> 
> ```
>       for (Loop *PL = ClonedL; PL; PL = PL->getParentLoop())
>         PL->addBlockEntry(ClonedBB);
> ```
> 
> in `buildClonedLoops`.
I used to use `addBasicBlockToLoop` but it proved very error prone. It added the blocks to too many parent loops and changed the *order* in which the blocks were added to parent loops. I'd much rather keep this unless there is a functional issue here.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:867
+    // is in the possible set.
+    if (Pred == ClonedPH || !PossibleClonedLoopBlockSet.count(Pred))
+      continue;
----------------
sanjoy wrote:
> When will `PossibleClonedLoopBlockSet.count(Pred)` be false (assuming `Pred` is not `ClonedPH`)?  I thought you don't clone the dead blocks (dominated by the edge not cloned into the cloned loop) at all?
Unreachable code within the loop that gets cloned, but when we do our backwards walk is not found and associated with the loop.

This is a somewhat hypothetical (I don't have a test case) but it seemed principled.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:891
+      for (auto *Pred : predecessors(BB))
+        if (PossibleClonedLoopBlockSet.count(Pred) &&
+            ClonedLoopBlockSet.insert(Pred).second)
----------------
sanjoy wrote:
> Like above, I don't understand the use of `PossibleClonedLoopBlockSet` here -- since you don't clone blocks dead due to the edge removal in `unswitchInvariantBranch`, you shouldn't really have a predecessor of a cloned block that hasn't been cloned.
> 
> If I'm wrong here, I think the rationale should be documented.
Same answer as above, tried to add rationale. Happy to update this if I'm missing something, I've not thought deeply here.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:896
+
+    ClonedL = new Loop();
+    if (ParentL) {
----------------
sanjoy wrote:
> Can you move the declaration of `ClonedL` here?
Not trivially. We don't always clone a loop and we return the cloned loop if any. Suggestions?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:898
+    if (ParentL) {
+      ParentL->addBasicBlockToLoop(ClonedPH, LI);
+      ParentL->addChildLoop(ClonedL);
----------------
sanjoy wrote:
> Why not remove the `ParentL->addBasicBlockToLoop(ClonedPH, LI);` here and add the preheader using the generic `UnloopedBlockSet` logic at the end?
Because this seemed like a more precise and definitive result?

Essentially, *if* we establish a loop here, we can reason about the header and preheader using that information alone. This in turn should prune the UnloopedBlockSet logic below. I'm not sure why sinking this would be more clear?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:907
+    // discovered them. The original order of blocks was carefully built in
+    // a way that is stable w.r.t. predecessor ordering. Rather than re-invent
+    // that logic, we just re-walk the original blocks (and those of the child
----------------
sanjoy wrote:
> Not sure what you mean by "stable w.r.t. predecessor ordering".
I tried to clarify this comment, not sure how successful I was.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:924
+      // loop for this block.
+      for (Loop *PL = ClonedL; PL; PL = PL->getParentLoop())
+        PL->addBlockEntry(ClonedBB);
----------------
sanjoy wrote:
> I've added a comment regarding this in `cloneLoopNest`.
Continuing discussion there.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:989
+      // We can stop recursing at the cloned preheader (if we get there).
+      if (BB == ClonedPH)
+        continue;
----------------
sanjoy wrote:
> When will we hit this?  Won't all of the paths "die out" once they hit `UnloopedBlockSet.erase(PredBB)`?
No?

The UnloopedBlockSet might have the ClonedPH in it, and if it does, we could erase it once, and put it into the worklist. We'd then visit it and not want to walk the predecessors of that cloned PH.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1002
+
+        // We just insert into the loop set here. We'll add these blocks to the
+        // exit loop after we build up the set in a deterministic order rather
----------------
sanjoy wrote:
> I thought `predecessors` was deterministic too (since the use-list has deterministic order)?
So, LoopInfo goes to some trouble to avoid its block order relying on use-list order via the predecessors order. Regardless of whether that's important or not, I think it is reasonable to try and continue here as unswitch isn't the thing that should walk that back.

I also personally agree with that call. I don't like use-list order dependencies anywhere we can avoid them. I understand that they're nearly impossible to avoid today, but I think we would benefit greatly from moving away from that any and everywhere we can.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1263
+          : std::stable_partition(
+                Blocks.begin(), Blocks.end(),
+                [&](BasicBlock *BB) { return LoopBlockSet.count(BB); });
----------------
sanjoy wrote:
> Why do you care about a stable partition here?
Because the whole idea is that the loop blocks list should have a stable ordering, and so I don't want to permute it while partitioning? Lots of transforms' order depends on the loop block order.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1280
+                   [&](BasicBlock *LHS, BasicBlock *RHS) {
+                     return LI.getLoopDepth(LHS) < LI.getLoopDepth(RHS);
+                   });
----------------
sanjoy wrote:
> Why do we care about the sort being stable?
Same answer as the rest of these... My default is stable. If there is a reason why the order of this *cannot* impact the order of IR data structures that impact the order of transforms downstream, that's worth optimizing, but until then we should preserve incoming order of things IMO.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1287
+
+  auto RemoveUnloopedBlocksFromLoop = [&](Loop &L) {
+    for (auto *BB : UnloopedSet)
----------------
sanjoy wrote:
> There already is a variable named `L` in scope, so please rename the argument to something else.
> 
> Perhaps clang should warn about this? :)
There really is no good name here. But the capture is just confusing. I think this can and should be a function w/o any capture that just operates on its arguments, removing the shadowing. Does taht work for you?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1314
+      if (BB == PH)
+        continue;
+
----------------
sanjoy wrote:
> When would we ever get to the preheader?
See above about preheader. I think the same logic applies to both routines.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1343
+      if (Loop *BBL = LI.getLoopFor(BB))
+        if (BBL == &L || !L.contains(BBL))
+          LI.changeLoopFor(BB, &ExitL);
----------------
sanjoy wrote:
> I don't see when the `!L.contains(BBL)` bit will kick in -- won't all of the blocks in `ExitLoopSet` either be exit blocks of the original L, or previously interior blocks that are no longer interior?
I'm not following ... exit blocks of the original L are exactly what would be identified by this check? My brain is fuzzy on this, so maybe I'm just misunderstanding the point you're making.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1357
+    if (Loop *BBL = LI.getLoopFor(BB))
+      if (BBL == &L || !L.contains(BBL))
+        LI.changeLoopFor(BB, nullptr);
----------------
sanjoy wrote:
> Again, I can't see when `!L.contains(BBL)` will kick in.
Hoping that these two issues are the same.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1589
+    // accurate, we want sub-trees within the original loop to be
+    // conservatively correct and that requires moving the merge block out of
+    // that subtree.
----------------
sanjoy wrote:
> Define the direction of "conservative" here -- does it mean "A conservative-dominates B implies A actually-dominates B" or the converse?
This whole thing is kinda garbage.

Anyways, tried to rephrase it. But not sure this really addresses the concern. Maybe I should nuke all usage of domtree blewo this to avoid any concerns.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1640
+          cast_or_null<BasicBlock>(VMap.lookup(ContinueSuccBB)))
+    ClonedContinueSuccBB->removePredecessor(ClonedParentBB,
+                                            /*DontDeleteUselessPHIs*/ true);
----------------
sanjoy wrote:
> Why is `DontDeleteUselessPHIs` false?
Because they could be LCSSA PHIs?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1696
+
+  // We can change which blocks are exit blocks of all the cloned sibling
+  // loops, the current loop, and any parent loops which shared exit blocks
----------------
sanjoy wrote:
> When you say "cloned sibling loop", do you include all of the loops that used to be subloops of L and have now been reparented?  I can't think of a case in which the exit block set for those will change.
Yes, I do include those.

We can change which loop an exit block fo a cloned sibling belongs to. Not sure if we can actually change the exit set. But is there a meaningful distinction here?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1719
+  // and we can do it in any order as they don't nest relative to each other.
+  for (Loop *UpdatedL : llvm::concat<Loop *>(NonChildClonedLoops, HoistedLoops))
+    UpdateLCSSA(*UpdatedL);
----------------
sanjoy wrote:
> As I said above, I'm not convinced that we need this bit.
Which bit? i'm not sure I'm following you here.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1726
+    SmallVector<Loop *, 4> OuterLoops;
+    // The top of our stack will be the cloned loop and the current loop if
+    // they are loops. ALso, if either the cloned loop or the current loop have
----------------
sanjoy wrote:
> Instead of commenting on what's at the top of the stack, how about phrasing the comment as on the order in which we will process the loops?
> 
> s/ALso/Also/
I don't think there is an order that I can reasonably document other than the first one?

https://reviews.llvm.org/D34200