[PATCH] D34200: [PM/unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple.

Mon Oct 16 00:52:38 PDT 2017

chandlerc marked 38 inline comments as done.
chandlerc added a comment.

Updates and/or responses. Largely addressed all the comments with the next patch upload.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:867
+    // is in the possible set.
+    if (Pred == ClonedPH || !PossibleClonedLoopBlockSet.count(Pred))
+      continue;
----------------
sanjoy wrote:
> chandlerc wrote:
> > sanjoy wrote:
> > > When will `PossibleClonedLoopBlockSet.count(Pred)` be false (assuming `Pred` is not `ClonedPH`)?  I thought you don't clone the dead blocks (dominated by the edge not cloned into the cloned loop) at all?
> > Unreachable code within the loop that gets cloned, but when we do our backwards walk is not found and associated with the loop.
> > 
> > This is a somewhat hypothetical (I don't have a test case) but it seemed principled.
> It looks like LoopInfo filters out unreachable blocks (and does not insert them into `llvm::Loop`s): https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Analysis/LoopInfoImpl.h#L360 so IIUC that situation exactly won't happen.  Are there any other such cases?
I've turned this into an assert.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:907
+    // discovered them. The original order of blocks was carefully built in
+    // a way that is stable w.r.t. predecessor ordering. Rather than re-invent
+    // that logic, we just re-walk the original blocks (and those of the child
----------------
sanjoy wrote:
> chandlerc wrote:
> > sanjoy wrote:
> > > Not sure what you mean by "stable w.r.t. predecessor ordering".
> > I tried to clarify this comment, not sure how successful I was.
> I might use "not arbitrary" instead of "stable".  "stable" seems to imply "deterministic" which is slightly confusing since predecesor ordering should be deterministic as well.
Reworded to be even less confusing I think by just directly stating the constraint: don't rely on predecessor order.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1343
+      if (Loop *BBL = LI.getLoopFor(BB))
+        if (BBL == &L || !L.contains(BBL))
+          LI.changeLoopFor(BB, &ExitL);
----------------
sanjoy wrote:
> chandlerc wrote:
> > sanjoy wrote:
> > > I don't see when the `!L.contains(BBL)` bit will kick in -- won't all of the blocks in `ExitLoopSet` either be exit blocks of the original L, or previously interior blocks that are no longer interior?
> > I'm not following ... exit blocks of the original L are exactly what would be identified by this check? My brain is fuzzy on this, so maybe I'm just misunderstanding the point you're making.
> I meant to say:
> 
>  - `ExitLoopSet` contains `ExitBB` and a set of blocks that were internal to `L`.
>  - The loop for `ExitBB` is already `ExitL` so we don't need to touch it.
>  - For the blocks that were internal to `L`, `L.contains(BBL)` will never never be false except for the case when `BBL == &L`.
Sorry, yes, this is trivially true. Sorry for not seeing this the first time around.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1589
+    // accurate, we want sub-trees within the original loop to be
+    // conservatively correct and that requires moving the merge block out of
+    // that subtree.
----------------
sanjoy wrote:
> chandlerc wrote:
> > sanjoy wrote:
> > > Define the direction of "conservative" here -- does it mean "A conservative-dominates B implies A actually-dominates B" or the converse?
> > This whole thing is kinda garbage.
> > 
> > Anyways, tried to rephrase it. But not sure this really addresses the concern. Maybe I should nuke all usage of domtree blewo this to avoid any concerns.
> > Maybe I should nuke all usage of domtree blewo this to avoid any concerns.
> 
> If that's possible, I'd certainly prefer that.
> 
> I'd phrase the comment as "we know that the split point definitely dominates the merge block" or something like that.
I think the comment is better now, but let me know if it still doesn't make you happy.

I've checked, and we *only* use the dominator tree on blocks in the original loop, so these nodes aren't important. I've also documented what is going on there.

I think the only realistic solution in the end will be to teach this to accurately update the dominator tree, but that should be a separate patch.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1640
+          cast_or_null<BasicBlock>(VMap.lookup(ContinueSuccBB)))
+    ClonedContinueSuccBB->removePredecessor(ClonedParentBB,
+                                            /*DontDeleteUselessPHIs*/ true);
----------------
sanjoy wrote:
> chandlerc wrote:
> > sanjoy wrote:
> > > Why is `DontDeleteUselessPHIs` false?
> > Because they could be LCSSA PHIs?
> I see -- can `DontDeleteUselessPHIs` be `!L.contains(ContinueSuccBB)` in that case?
Couldn't there still be PHIs that are necessary for LCSSA form of some nested loop?

Generally, this seems risky and low value. Why waste time proving that it could be false?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1719
+  // and we can do it in any order as they don't nest relative to each other.
+  for (Loop *UpdatedL : llvm::concat<Loop *>(NonChildClonedLoops, HoistedLoops))
+    UpdateLCSSA(*UpdatedL);
----------------
sanjoy wrote:
> chandlerc wrote:
> > sanjoy wrote:
> > > As I said above, I'm not convinced that we need this bit.
> > Which bit? i'm not sure I'm following you here.
> Updating LCSSA for `NonChildClonedLoops` and `HoistedLoops`.  Since we didn't break apart these loops, we should not have new uses of values defined inside these loops outside these loops (as opposed to the `L`, `ClonedL` and their parents, which we did split up and thus uses for interior values that were interior before may be exterior now).
Maybe I'm being overly conservative here, but I'm having a very hard time convincing myself that we have a constructive reason why LCSSA will be correct here.

We are fundamentally changing the CFG when cloning. I feel like we will keep finding devilish cases where that CFG change subtly changes the LCSSA solution set. Maybe not, but I'm quite paranoid here and not able to really convince myself through either a proof or testing that this is correct without this. Maybe you can? Maybe this should be a follow-up patch?

https://reviews.llvm.org/D34200