[PATCH] D34200: [PM/unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple.

Fri Jul 28 14:20:21 PDT 2017

sanjoy added inline comments.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:767
+      if (LI.getLoopFor(BB) == &OrigL)
+        LI.changeLoopFor(ClonedBB, &ClonedL);
+    }
----------------
chandlerc wrote:
> sanjoy wrote:
> > Add an assert that `ClonedBB` does not have a containing loop yet.
> > 
> > edit: I think it is better to write this as:
> > 
> > ```
> > auto *ClonedBB = cast<BasicBlock>(VMap.lookup(BB));
> > if (LI.getLoopFor(BB) == &OrigL)
> >   ClonedL.addBasicBlockToLoop(ClonedBB, LI);
> > ```
> > 
> > which has the assert, and avoids the slightly odd looking (since it is unconditional) `ClonedL.addBlockEntry(ClonedBB);`, and to remove the
> > 
> > ```
> >       for (Loop *PL = ClonedL; PL; PL = PL->getParentLoop())
> >         PL->addBlockEntry(ClonedBB);
> > ```
> > 
> > in `buildClonedLoops`.
> I used to use `addBasicBlockToLoop` but it proved very error prone. It added the blocks to too many parent loops and changed the *order* in which the blocks were added to parent loops. I'd much rather keep this unless there is a functional issue here.
SGTM

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:867
+    // is in the possible set.
+    if (Pred == ClonedPH || !PossibleClonedLoopBlockSet.count(Pred))
+      continue;
----------------
chandlerc wrote:
> sanjoy wrote:
> > When will `PossibleClonedLoopBlockSet.count(Pred)` be false (assuming `Pred` is not `ClonedPH`)?  I thought you don't clone the dead blocks (dominated by the edge not cloned into the cloned loop) at all?
> Unreachable code within the loop that gets cloned, but when we do our backwards walk is not found and associated with the loop.
> 
> This is a somewhat hypothetical (I don't have a test case) but it seemed principled.
It looks like LoopInfo filters out unreachable blocks (and does not insert them into `llvm::Loop`s): https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/LoopInfoImpl.h#L360 so IIUC that situation exactly won't happen.  Are there any other such cases?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:896
+
+    ClonedL = new Loop();
+    if (ParentL) {
----------------
chandlerc wrote:
> sanjoy wrote:
> > Can you move the declaration of `ClonedL` here?
> Not trivially. We don't always clone a loop and we return the cloned loop if any. Suggestions?
Never mind then. :)

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:898
+    if (ParentL) {
+      ParentL->addBasicBlockToLoop(ClonedPH, LI);
+      ParentL->addChildLoop(ClonedL);
----------------
chandlerc wrote:
> sanjoy wrote:
> > Why not remove the `ParentL->addBasicBlockToLoop(ClonedPH, LI);` here and add the preheader using the generic `UnloopedBlockSet` logic at the end?
> Because this seemed like a more precise and definitive result?
> 
> Essentially, *if* we establish a loop here, we can reason about the header and preheader using that information alone. This in turn should prune the UnloopedBlockSet logic below. I'm not sure why sinking this would be more clear?
I was thinking about this as -- the preheader is just another block that did not get included into `ClonedL` so it should be handled by the generic logic that handles such blocks.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:907
+    // discovered them. The original order of blocks was carefully built in
+    // a way that is stable w.r.t. predecessor ordering. Rather than re-invent
+    // that logic, we just re-walk the original blocks (and those of the child
----------------
chandlerc wrote:
> sanjoy wrote:
> > Not sure what you mean by "stable w.r.t. predecessor ordering".
> I tried to clarify this comment, not sure how successful I was.
I might use "not arbitrary" instead of "stable".  "stable" seems to imply "deterministic" which is slightly confusing since predecesor ordering should be deterministic as well.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:989
+      // We can stop recursing at the cloned preheader (if we get there).
+      if (BB == ClonedPH)
+        continue;
----------------
chandlerc wrote:
> sanjoy wrote:
> > When will we hit this?  Won't all of the paths "die out" once they hit `UnloopedBlockSet.erase(PredBB)`?
> No?
> 
> The UnloopedBlockSet might have the ClonedPH in it, and if it does, we could erase it once, and put it into the worklist. We'd then visit it and not want to walk the predecessors of that cloned PH.
Sorry, I forgot the case where the loop disappears because we delete all of the backedges.  However, I think we should be able to assert `BlocksInClonedLoop.empty()` under `if (BB == ClonedPH)` (since the header of the cloned loop should still dominate the exit, and at least that header won't be in `UnloopedBlockSet`?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1002
+
+        // We just insert into the loop set here. We'll add these blocks to the
+        // exit loop after we build up the set in a deterministic order rather
----------------
chandlerc wrote:
> sanjoy wrote:
> > I thought `predecessors` was deterministic too (since the use-list has deterministic order)?
> So, LoopInfo goes to some trouble to avoid its block order relying on use-list order via the predecessors order. Regardless of whether that's important or not, I think it is reasonable to try and continue here as unswitch isn't the thing that should walk that back.
> 
> I also personally agree with that call. I don't like use-list order dependencies anywhere we can avoid them. I understand that they're nearly impossible to avoid today, but I think we would benefit greatly from moving away from that any and everywhere we can.
In that case, the problem is that not that the use list order is not deterministic but that it is nonsensical, so I'd change the comment to reflect that.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1280
+                   [&](BasicBlock *LHS, BasicBlock *RHS) {
+                     return LI.getLoopDepth(LHS) < LI.getLoopDepth(RHS);
+                   });
----------------
chandlerc wrote:
> sanjoy wrote:
> > Why do we care about the sort being stable?
> Same answer as the rest of these... My default is stable. If there is a reason why the order of this *cannot* impact the order of IR data structures that impact the order of transforms downstream, that's worth optimizing, but until then we should preserve incoming order of things IMO.
That ("prefer stable unless you can prove it does not matter") SGTM.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1287
+
+  auto RemoveUnloopedBlocksFromLoop = [&](Loop &L) {
+    for (auto *BB : UnloopedSet)
----------------
chandlerc wrote:
> sanjoy wrote:
> > There already is a variable named `L` in scope, so please rename the argument to something else.
> > 
> > Perhaps clang should warn about this? :)
> There really is no good name here. But the capture is just confusing. I think this can and should be a function w/o any capture that just operates on its arguments, removing the shadowing. Does taht work for you?
SGTM.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1314
+      if (BB == PH)
+        continue;
+
----------------
chandlerc wrote:
> sanjoy wrote:
> > When would we ever get to the preheader?
> See above about preheader. I think the same logic applies to both routines.
I've replied above on this.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1343
+      if (Loop *BBL = LI.getLoopFor(BB))
+        if (BBL == &L || !L.contains(BBL))
+          LI.changeLoopFor(BB, &ExitL);
----------------
chandlerc wrote:
> sanjoy wrote:
> > I don't see when the `!L.contains(BBL)` bit will kick in -- won't all of the blocks in `ExitLoopSet` either be exit blocks of the original L, or previously interior blocks that are no longer interior?
> I'm not following ... exit blocks of the original L are exactly what would be identified by this check? My brain is fuzzy on this, so maybe I'm just misunderstanding the point you're making.
I meant to say:

 - `ExitLoopSet` contains `ExitBB` and a set of blocks that were internal to `L`.
 - The loop for `ExitBB` is already `ExitL` so we don't need to touch it.
 - For the blocks that were internal to `L`, `L.contains(BBL)` will never never be false except for the case when `BBL == &L`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1357
+    if (Loop *BBL = LI.getLoopFor(BB))
+      if (BBL == &L || !L.contains(BBL))
+        LI.changeLoopFor(BB, nullptr);
----------------
chandlerc wrote:
> sanjoy wrote:
> > Again, I can't see when `!L.contains(BBL)` will kick in.
> Hoping that these two issues are the same.
Replied above.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1589
+    // accurate, we want sub-trees within the original loop to be
+    // conservatively correct and that requires moving the merge block out of
+    // that subtree.
----------------
chandlerc wrote:
> sanjoy wrote:
> > Define the direction of "conservative" here -- does it mean "A conservative-dominates B implies A actually-dominates B" or the converse?
> This whole thing is kinda garbage.
> 
> Anyways, tried to rephrase it. But not sure this really addresses the concern. Maybe I should nuke all usage of domtree blewo this to avoid any concerns.
> Maybe I should nuke all usage of domtree blewo this to avoid any concerns.

If that's possible, I'd certainly prefer that.

I'd phrase the comment as "we know that the split point definitely dominates the merge block" or something like that.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1640
+          cast_or_null<BasicBlock>(VMap.lookup(ContinueSuccBB)))
+    ClonedContinueSuccBB->removePredecessor(ClonedParentBB,
+                                            /*DontDeleteUselessPHIs*/ true);
----------------
chandlerc wrote:
> sanjoy wrote:
> > Why is `DontDeleteUselessPHIs` false?
> Because they could be LCSSA PHIs?
I see -- can `DontDeleteUselessPHIs` be `!L.contains(ContinueSuccBB)` in that case?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1719
+  // and we can do it in any order as they don't nest relative to each other.
+  for (Loop *UpdatedL : llvm::concat<Loop *>(NonChildClonedLoops, HoistedLoops))
+    UpdateLCSSA(*UpdatedL);
----------------
chandlerc wrote:
> sanjoy wrote:
> > As I said above, I'm not convinced that we need this bit.
> Which bit? i'm not sure I'm following you here.
Updating LCSSA for `NonChildClonedLoops` and `HoistedLoops`.  Since we didn't break apart these loops, we should not have new uses of values defined inside these loops outside these loops (as opposed to the `L`, `ClonedL` and their parents, which we did split up and thus uses for interior values that were interior before may be exterior now).

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1726
+    SmallVector<Loop *, 4> OuterLoops;
+    // The top of our stack will be the cloned loop and the current loop if
+    // they are loops. ALso, if either the cloned loop or the current loop have
----------------
chandlerc wrote:
> sanjoy wrote:
> > Instead of commenting on what's at the top of the stack, how about phrasing the comment as on the order in which we will process the loops?
> > 
> > s/ALso/Also/
> I don't think there is an order that I can reasonably document other than the first one?
I meant that you start with `L` and `ClonedL` (if they still exist) and move up the loop tree till `OuterExitL`.

https://reviews.llvm.org/D34200