[llvm] r335553 - [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial

Mon Jun 25 16:32:55 PDT 2018

Author: chandlerc
Date: Mon Jun 25 16:32:54 2018
New Revision: 335553

URL: http://llvm.org/viewvc/llvm-project?rev=335553&view=rev
Log:
[PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.
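
As a source-level picture of the transformation described above, here is a
minimal sketch in plain C++. It is not taken from the patch and does not use
LLVM IR: the function names, the `mode` parameter and the loop bodies are all
hypothetical. It only illustrates the shape of the result, with the switch
hoisted out of the loop, one loop clone per case, and the default destination
keeping the original loop.

```cpp
#include <cassert>
#include <vector>

// Before: the switch on the loop-invariant `mode` executes on every iteration.
long sumOriginal(const std::vector<int> &xs, int mode) {
  long acc = 0;
  for (int x : xs) {
    switch (mode) {
    case 0:
      acc += x;
      break;
    case 1:
      acc -= x;
      break;
    default:
      acc += 2L * x;
      break;
    }
  }
  return acc;
}

// After: the switch is evaluated once, outside the loop. Each case jumps to
// its own specialized copy of the loop; the default keeps the "original" one.
long sumUnswitched(const std::vector<int> &xs, int mode) {
  long acc = 0;
  switch (mode) {
  case 0:
    for (int x : xs)
      acc += x;
    break;
  case 1:
    for (int x : xs)
      acc -= x;
    break;
  default:
    for (int x : xs)
      acc += 2L * x;
    break;
  }
  return acc;
}

// Hypothetical helper: both forms must compute the same value for any mode.
void checkEquivalent(const std::vector<int> &xs, int mode) {
  assert(sumOriginal(xs, mode) == sumUnswitched(xs, mode));
}
```

In this toy each clone contains only the code for its own case, which is the
effect the dead-code pruning discussed below aims for in the real pass.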

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.
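
The early pruning can be pictured with a toy model of the `DominatingSucc`
lookup that appears in the diff below. This is only a sketch under stated
assumptions: blocks are plain strings, the map is a `std::unordered_map`
rather than LLVM's `SmallDenseMap`, and `blocksToClone` is a hypothetical
helper that does not exist in the patch. Only the skip predicate mirrors the
patch's `SkipBlock` lambda.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using Block = std::string; // stand-in for BasicBlock* (assumption)
using DominatingSuccMap = std::unordered_map<Block, Block>;

// Mirrors the SkipBlock lambda: a block is skipped when it is dominated by
// some successor other than the one we are currently cloning for.
bool skipBlock(const DominatingSuccMap &DominatingSucc, const Block &BB,
               const Block &UnswitchedSucc) {
  auto It = DominatingSucc.find(BB);
  return It != DominatingSucc.end() && It->second != UnswitchedSucc;
}

// Hypothetical helper: list the loop blocks that would be cloned for one
// unswitched successor, in their original order.
std::vector<Block> blocksToClone(const std::vector<Block> &LoopBlocks,
                                 const DominatingSuccMap &DominatingSucc,
                                 const Block &UnswitchedSucc) {
  assert(!LoopBlocks.empty() && "expected at least a header block");
  std::vector<Block> Result;
  for (const Block &BB : LoopBlocks)
    if (!skipBlock(DominatingSucc, BB, UnswitchedSucc))
      Result.push_back(BB);
  return Result;
}
```

Blocks with no entry in the map (for example a shared header or latch) are
never skipped, so every clone keeps them; blocks owned by a different case
are not cloned at all.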

This also surfaced another problem with the previous nontrivial
unswitching: we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

Modified:
    llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
    llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
    llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll

Modified: llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h?rev=335553&r1=335552&r2=335553&view=diff
==============================================================================

--- llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h (original)
+++ llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h Mon Jun 25 16:32:54 2018
@@ -17,9 +17,9 @@
 
 namespace llvm {
 
-/// This pass transforms loops that contain branches on loop-invariant
-/// conditions to have multiple loops. For example, it turns the left into the
-/// right code:
+/// This pass transforms loops that contain branches or switches on loop-
+/// invariant conditions to have multiple loops. For example, it turns the left
+/// into the right code:
 ///
 ///  for (...)                  if (lic)
 ///    A                          for (...)
@@ -35,6 +35,31 @@ namespace llvm {
 /// This pass expects LICM to be run before it to hoist invariant conditions out
 /// of the loop, to make the unswitching opportunity obvious.
 ///
+/// There is a taxonomy of unswitching that we use to classify different forms
+/// of this transformation:
+///
+/// - Trivial unswitching: this is when the condition can be unswitched without
+///   cloning any code from inside the loop. A non-trivial unswitch requires
+///   code duplication.
+///
+/// - Full unswitching: this is when the branch or switch is completely moved
+///   from inside the loop to outside the loop. Partial unswitching removes the
+///   branch from the clone of the loop but must leave a (somewhat simplified)
+///   branch in the original loop. While theoretically partial unswitching can
+///   be done for switches, the requirements are extreme - we need the loop
+///   invariant input to the switch to be sufficient to collapse to a single
+///   successor in each clone.
+///
+/// This pass always does trivial, full unswitching for both branches and
+/// switches. For branches, it also always does trivial, partial unswitching.
+///
+/// If enabled (via the constructor's `NonTrivial` parameter), this pass will
+/// additionally do non-trivial, full unswitching for branches and switches, and
+/// will do non-trivial, partial unswitching for branches.
+///
+/// Because partial unswitching of switches is extremely unlikely to be possible
+/// in practice and significantly complicates the implementation, this pass does
+/// not currently implement that in any mode.
 class SimpleLoopUnswitchPass : public PassInfoMixin<SimpleLoopUnswitchPass> {
   bool NonTrivial;
 

Modified: llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp?rev=335553&r1=335552&r2=335553&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (original)
+++ llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp Mon Jun 25 16:32:54 2018
@@ -715,8 +715,12 @@ static bool unswitchAllTrivialConditions
 ///
 /// This routine handles cloning all of the necessary loop blocks and exit
 /// blocks including rewriting their instructions and the relevant PHI nodes.
-/// It skips loop and exit blocks that are not necessary based on the provided
-/// set. It also correctly creates the unconditional branch in the cloned
+/// Any loop blocks or exit blocks which are dominated by a different successor
+/// than the one for this clone of the loop blocks can be trivially skipped. We
+/// use the `DominatingSucc` map to determine whether a block satisfies that
+/// property with a simple map lookup.
+///
+/// It also correctly creates the unconditional branch in the cloned
 /// unswitched parent block to only point at the unswitched successor.
 ///
 /// This does not handle most of the necessary updates to `LoopInfo`. Only exit
@@ -730,7 +734,7 @@ static BasicBlock *buildClonedLoopBlocks
     Loop &L, BasicBlock *LoopPH, BasicBlock *SplitBB,
     ArrayRef<BasicBlock *> ExitBlocks, BasicBlock *ParentBB,
     BasicBlock *UnswitchedSuccBB, BasicBlock *ContinueSuccBB,
-    const SmallPtrSetImpl<BasicBlock *> &SkippedLoopAndExitBlocks,
+    const SmallDenseMap<BasicBlock *, BasicBlock *, 16> &DominatingSucc,
     ValueToValueMapTy &VMap,
     SmallVectorImpl<DominatorTree::UpdateType> &DTUpdates, AssumptionCache &AC,
     DominatorTree &DT, LoopInfo &LI) {
@@ -751,19 +755,26 @@ static BasicBlock *buildClonedLoopBlocks
     return NewBB;
   };
 
+  // We skip cloning blocks when they have a dominating succ that is not the
+  // succ we are cloning for.
+  auto SkipBlock = [&](BasicBlock *BB) {
+    auto It = DominatingSucc.find(BB);
+    return It != DominatingSucc.end() && It->second != UnswitchedSuccBB;
+  };
+
   // First, clone the preheader.
   auto *ClonedPH = CloneBlock(LoopPH);
 
   // Then clone all the loop blocks, skipping the ones that aren't necessary.
   for (auto *LoopBB : L.blocks())
-    if (!SkippedLoopAndExitBlocks.count(LoopBB))
+    if (!SkipBlock(LoopBB))
       CloneBlock(LoopBB);
 
   // Split all the loop exit edges so that when we clone the exit blocks, if
   // any of the exit blocks are *also* a preheader for some other loop, we
   // don't create multiple predecessors entering the loop header.
   for (auto *ExitBB : ExitBlocks) {
-    if (SkippedLoopAndExitBlocks.count(ExitBB))
+    if (SkipBlock(ExitBB))
       continue;
 
     // When we are going to clone an exit, we don't need to clone all the
@@ -841,7 +852,7 @@ static BasicBlock *buildClonedLoopBlocks
   // Update any PHI nodes in the cloned successors of the skipped blocks to not
   // have spurious incoming values.
   for (auto *LoopBB : L.blocks())
-    if (SkippedLoopAndExitBlocks.count(LoopBB))
+    if (SkipBlock(LoopBB))
       for (auto *SuccBB : successors(LoopBB))
         if (auto *ClonedSuccBB = cast_or_null<BasicBlock>(VMap.lookup(SuccBB)))
           for (PHINode &PN : ClonedSuccBB->phis())
@@ -1175,10 +1186,41 @@ static void buildClonedLoops(Loop &OrigL
 }
 
 static void
+deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
+                       ArrayRef<std::unique_ptr<ValueToValueMapTy>> VMaps,
+                       DominatorTree &DT) {
+  // Find all the dead clones, and remove them from their successors.
+  SmallVector<BasicBlock *, 16> DeadBlocks;
+  for (BasicBlock *BB : llvm::concat<BasicBlock *const>(L.blocks(), ExitBlocks))
+    for (auto &VMap : VMaps)
+      if (BasicBlock *ClonedBB = cast_or_null<BasicBlock>(VMap->lookup(BB)))
+        if (!DT.isReachableFromEntry(ClonedBB)) {
+          for (BasicBlock *SuccBB : successors(ClonedBB))
+            SuccBB->removePredecessor(ClonedBB);
+          DeadBlocks.push_back(ClonedBB);
+        }
+
+  // Drop any remaining references to break cycles.
+  for (BasicBlock *BB : DeadBlocks)
+    BB->dropAllReferences();
+  // Erase them from the IR.
+  for (BasicBlock *BB : DeadBlocks)
+    BB->eraseFromParent();
+}
+
+static void
 deleteDeadBlocksFromLoop(Loop &L,
-                         const SmallVectorImpl<BasicBlock *> &DeadBlocks,
                          SmallVectorImpl<BasicBlock *> &ExitBlocks,
                          DominatorTree &DT, LoopInfo &LI) {
+  // Find all the dead blocks, and remove them from their successors.
+  SmallVector<BasicBlock *, 16> DeadBlocks;
+  for (BasicBlock *BB : llvm::concat<BasicBlock *const>(L.blocks(), ExitBlocks))
+    if (!DT.isReachableFromEntry(BB)) {
+      for (BasicBlock *SuccBB : successors(BB))
+        SuccBB->removePredecessor(BB);
+      DeadBlocks.push_back(BB);
+    }
+
   SmallPtrSet<BasicBlock *, 16> DeadBlockSet(DeadBlocks.begin(),
                                              DeadBlocks.end());
 
@@ -1187,11 +1229,6 @@ deleteDeadBlocksFromLoop(Loop &L,
   llvm::erase_if(ExitBlocks,
                  [&](BasicBlock *BB) { return DeadBlockSet.count(BB); });
 
-  // Remove these blocks from their successors.
-  for (auto *BB : DeadBlocks)
-    for (BasicBlock *SuccBB : successors(BB))
-      SuccBB->removePredecessor(BB, /*DontDeleteUselessPHIs*/ true);
-
   // Walk from this loop up through its parents removing all of the dead blocks.
   for (Loop *ParentL = &L; ParentL; ParentL = ParentL->getParentLoop()) {
     for (auto *BB : DeadBlocks)
@@ -1582,31 +1619,24 @@ void visitDomSubTree(DominatorTree &DT,
   } while (!DomWorklist.empty());
 }
 
-/// Take an invariant branch that has been determined to be safe and worthwhile
-/// to unswitch despite being non-trivial to do so and perform the unswitch.
-///
-/// This directly updates the CFG to hoist the predicate out of the loop, and
-/// clone the necessary parts of the loop to maintain behavior.
-///
-/// It also updates both dominator tree and loopinfo based on the unswitching.
-///
-/// Once unswitching has been performed it runs the provided callback to report
-/// the new loops and no-longer valid loops to the caller.
-static bool unswitchInvariantBranch(
-    Loop &L, BranchInst &BI, ArrayRef<Value *> Invariants, DominatorTree &DT,
-    LoopInfo &LI, AssumptionCache &AC,
+static bool unswitchNontrivialInvariants(
+    Loop &L, TerminatorInst &TI, ArrayRef<Value *> Invariants,
+    DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
     function_ref<void(bool, ArrayRef<Loop *>)> UnswitchCB) {
-  auto *ParentBB = BI.getParent();
-
-  // We can only unswitch conditional branches with an invariant condition or
-  // combining invariant conditions with an instruction.
-  assert(BI.isConditional() && "Can only unswitch a conditional branch!");
-  bool FullUnswitch = BI.getCondition() == Invariants[0];
+  auto *ParentBB = TI.getParent();
+  BranchInst *BI = dyn_cast<BranchInst>(&TI);
+  SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
+
+  // We can only unswitch switches, conditional branches with an invariant
+  // condition, or combining invariant conditions with an instruction.
+  assert((SI || BI->isConditional()) &&
+         "Can only unswitch switches and conditional branches!");
+  bool FullUnswitch = SI || BI->getCondition() == Invariants[0];
   if (FullUnswitch)
     assert(Invariants.size() == 1 &&
            "Cannot have other invariants with full unswitching!");
   else
-    assert(isa<Instruction>(BI.getCondition()) &&
+    assert(isa<Instruction>(BI->getCondition()) &&
            "Partial unswitching requires an instruction as the condition!");
 
   // Constant and BBs tracking the cloned and continuing successor. When we are
@@ -1618,18 +1648,27 @@ static bool unswitchInvariantBranch(
   bool Direction = true;
   int ClonedSucc = 0;
   if (!FullUnswitch) {
-    if (cast<Instruction>(BI.getCondition())->getOpcode() != Instruction::Or) {
-      assert(cast<Instruction>(BI.getCondition())->getOpcode() == Instruction::And &&
-        "Only `or` and `and` instructions can combine invariants being unswitched.");
+    if (cast<Instruction>(BI->getCondition())->getOpcode() != Instruction::Or) {
+      assert(cast<Instruction>(BI->getCondition())->getOpcode() ==
+                 Instruction::And &&
+             "Only `or` and `and` instructions can combine invariants being "
+             "unswitched.");
       Direction = false;
       ClonedSucc = 1;
     }
   }
-  auto *UnswitchedSuccBB = BI.getSuccessor(ClonedSucc);
-  auto *ContinueSuccBB = BI.getSuccessor(1 - ClonedSucc);
 
-  assert(UnswitchedSuccBB != ContinueSuccBB &&
-         "Should not unswitch a branch that always goes to the same place!");
+  BasicBlock *RetainedSuccBB =
+      BI ? BI->getSuccessor(1 - ClonedSucc) : SI->getDefaultDest();
+  SmallSetVector<BasicBlock *, 4> UnswitchedSuccBBs;
+  if (BI)
+    UnswitchedSuccBBs.insert(BI->getSuccessor(ClonedSucc));
+  else
+    for (auto Case : SI->cases())
+      UnswitchedSuccBBs.insert(Case.getCaseSuccessor());
+
+  assert(!UnswitchedSuccBBs.count(RetainedSuccBB) &&
+         "Should not unswitch the same successor we are retaining!");
 
   // The branch should be in this exact loop. Any inner loop's invariant branch
   // should be handled by unswitching that inner loop. The caller of this
@@ -1648,9 +1687,6 @@ static bool unswitchInvariantBranch(
     if (isa<CleanupPadInst>(ExitBB->getFirstNonPHI()))
       return false;
 
-  SmallPtrSet<BasicBlock *, 4> ExitBlockSet(ExitBlocks.begin(),
-                                            ExitBlocks.end());
-
   // Compute the parent loop now before we start hacking on things.
   Loop *ParentL = L.getParentLoop();
 
@@ -1669,30 +1705,22 @@ static bool unswitchInvariantBranch(
       OuterExitL = NewOuterExitL;
   }
 
-  // If the edge we *aren't* cloning in the unswitch (the continuing edge)
-  // dominates its target, we can skip cloning the dominated region of the loop
-  // and its exits. We compute this as a set of nodes to be skipped.
-  SmallPtrSet<BasicBlock *, 4> SkippedLoopAndExitBlocks;
-  if (ContinueSuccBB->getUniquePredecessor() ||
-      llvm::all_of(predecessors(ContinueSuccBB), [&](BasicBlock *PredBB) {
-        return PredBB == ParentBB || DT.dominates(ContinueSuccBB, PredBB);
-      })) {
-    visitDomSubTree(DT, ContinueSuccBB, [&](BasicBlock *BB) {
-      SkippedLoopAndExitBlocks.insert(BB);
-      return true;
-    });
-  }
-  // If we are doing full unswitching, then similarly to the above, the edge we
-  // *are* cloning in the unswitch (the unswitched edge) dominates its target,
-  // we will end up with dead nodes in the original loop and its exits that will
-  // need to be deleted. Here, we just retain that the property holds and will
-  // compute the deleted set later.
-  bool DeleteUnswitchedSucc =
-      FullUnswitch &&
-      (UnswitchedSuccBB->getUniquePredecessor() ||
-       llvm::all_of(predecessors(UnswitchedSuccBB), [&](BasicBlock *PredBB) {
-         return PredBB == ParentBB || DT.dominates(UnswitchedSuccBB, PredBB);
-       }));
+  // If the edge from this terminator to a successor dominates that successor,
+  // store a map from each block in its dominator subtree to it. This lets us
+  // tell when cloning for a particular successor if a block is dominated by
+  // some *other* successor with a single data structure. We use this to
+  // significantly reduce cloning.
+  SmallDenseMap<BasicBlock *, BasicBlock *, 16> DominatingSucc;
+  for (auto *SuccBB : llvm::concat<BasicBlock *const>(
+           makeArrayRef(RetainedSuccBB), UnswitchedSuccBBs))
+    if (SuccBB->getUniquePredecessor() ||
+        llvm::all_of(predecessors(SuccBB), [&](BasicBlock *PredBB) {
+          return PredBB == ParentBB || DT.dominates(SuccBB, PredBB);
+        }))
+      visitDomSubTree(DT, SuccBB, [&](BasicBlock *BB) {
+        DominatingSucc[BB] = SuccBB;
+        return true;
+      });
 
   // Split the preheader, so that we know that there is a safe place to insert
   // the conditional branch. We will change the preheader to have a conditional
@@ -1702,84 +1730,93 @@ static bool unswitchInvariantBranch(
   BasicBlock *SplitBB = L.getLoopPreheader();
   BasicBlock *LoopPH = SplitEdge(SplitBB, L.getHeader(), &DT, &LI);
 
-  // Keep a mapping for the cloned values.
-  ValueToValueMapTy VMap;
-
   // Keep track of the dominator tree updates needed.
   SmallVector<DominatorTree::UpdateType, 4> DTUpdates;
 
-  // Build the cloned blocks from the loop.
-  auto *ClonedPH = buildClonedLoopBlocks(
-      L, LoopPH, SplitBB, ExitBlocks, ParentBB, UnswitchedSuccBB,
-      ContinueSuccBB, SkippedLoopAndExitBlocks, VMap, DTUpdates, AC, DT, LI);
+  // Clone the loop for each unswitched successor.
+  SmallVector<std::unique_ptr<ValueToValueMapTy>, 4> VMaps;
+  VMaps.reserve(UnswitchedSuccBBs.size());
+  SmallDenseMap<BasicBlock *, BasicBlock *, 4> ClonedPHs;
+  for (auto *SuccBB : UnswitchedSuccBBs) {
+    VMaps.emplace_back(new ValueToValueMapTy());
+    ClonedPHs[SuccBB] = buildClonedLoopBlocks(
+        L, LoopPH, SplitBB, ExitBlocks, ParentBB, SuccBB, RetainedSuccBB,
+        DominatingSucc, *VMaps.back(), DTUpdates, AC, DT, LI);
+  }
 
   // The stitching of the branched code back together depends on whether we're
   // doing full unswitching or not with the exception that we always want to
   // nuke the initial terminator placed in the split block.
   SplitBB->getTerminator()->eraseFromParent();
   if (FullUnswitch) {
-    // Remove the parent as a predecessor of the
-    // unswitched successor.
-    UnswitchedSuccBB->removePredecessor(ParentBB,
-                                        /*DontDeleteUselessPHIs*/ true);
-    DTUpdates.push_back({DominatorTree::Delete, ParentBB, UnswitchedSuccBB});
-
-    // Now splice the branch from the original loop and use it to select between
-    // the two loops.
-    SplitBB->getInstList().splice(SplitBB->end(), ParentBB->getInstList(), BI);
-    BI.setSuccessor(ClonedSucc, ClonedPH);
-    BI.setSuccessor(1 - ClonedSucc, LoopPH);
+    for (BasicBlock *SuccBB : UnswitchedSuccBBs) {
+      // Remove the parent as a predecessor of the unswitched successor.
+      SuccBB->removePredecessor(ParentBB,
+                                /*DontDeleteUselessPHIs*/ true);
+      DTUpdates.push_back({DominatorTree::Delete, ParentBB, SuccBB});
+    }
+
+    // Now splice the terminator from the original loop and rewrite its
+    // successors.
+    SplitBB->getInstList().splice(SplitBB->end(), ParentBB->getInstList(), TI);
+    if (BI) {
+      assert(UnswitchedSuccBBs.size() == 1 &&
+             "Only one possible unswitched block for a branch!");
+      BasicBlock *ClonedPH = ClonedPHs.begin()->second;
+      BI->setSuccessor(ClonedSucc, ClonedPH);
+      BI->setSuccessor(1 - ClonedSucc, LoopPH);
+      DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
+    } else {
+      assert(SI && "Must either be a branch or switch!");
+
+      // Walk the cases and directly update their successors.
+      for (auto &Case : SI->cases())
+        Case.setSuccessor(ClonedPHs.find(Case.getCaseSuccessor())->second);
+      // We need to use the set to populate domtree updates as even when there
+      // are multiple cases pointing at the same successor we only want to
+      // insert one edge in the domtree.
+      for (BasicBlock *SuccBB : UnswitchedSuccBBs)
+        DTUpdates.push_back(
+            {DominatorTree::Insert, SplitBB, ClonedPHs.find(SuccBB)->second});
+
+      SI->setDefaultDest(LoopPH);
+    }
 
     // Create a new unconditional branch to the continuing block (as opposed to
     // the one cloned).
-    BranchInst::Create(ContinueSuccBB, ParentBB);
+    BranchInst::Create(RetainedSuccBB, ParentBB);
   } else {
+    assert(BI && "Only branches have partial unswitching.");
+    assert(UnswitchedSuccBBs.size() == 1 &&
+           "Only one possible unswitched block for a branch!");
+    BasicBlock *ClonedPH = ClonedPHs.begin()->second;
     // When doing a partial unswitch, we have to do a bit more work to build up
     // the branch in the split block.
     buildPartialUnswitchConditionalBranch(*SplitBB, Invariants, Direction,
                                           *ClonedPH, *LoopPH);
+    DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
   }
 
-  // Before we update the dominator tree, collect the dead blocks if we're going
-  // to end up deleting the unswitched successor.
-  SmallVector<BasicBlock *, 16> DeadBlocks;
-  if (DeleteUnswitchedSucc) {
-    DeadBlocks.push_back(UnswitchedSuccBB);
-    for (int i = 0; i < (int)DeadBlocks.size(); ++i) {
-      // If we reach an exit block, stop recursing as the unswitched loop will
-      // end up reaching the merge block which we make the successor of the
-      // exit.
-      if (ExitBlockSet.count(DeadBlocks[i]))
-        continue;
-
-      // Insert the children that are within the loop or exit block set. Other
-      // children may reach out of the loop. While we don't expect these to be
-      // dead (as the unswitched clone should reach them) we don't try to prove
-      // that here.
-      for (DomTreeNode *ChildN : *DT[DeadBlocks[i]])
-        if (L.contains(ChildN->getBlock()) ||
-            ExitBlockSet.count(ChildN->getBlock()))
-          DeadBlocks.push_back(ChildN->getBlock());
-    }
-  }
-
-  // Add the remaining edge to our updates and apply them to get an up-to-date
-  // dominator tree. Note that this will cause the dead blocks above to be
-  // unreachable and no longer in the dominator tree.
-  DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
+  // Apply the updates accumulated above to get an up-to-date dominator tree.
   DT.applyUpdates(DTUpdates);
 
+  // Now that we have an accurate dominator tree, first delete the dead cloned
+  // blocks so that we can accurately build any cloned loops. It is important to
+  // not delete the blocks from the original loop yet because we still want to
+  // reference the original loop to understand the cloned loop's structure.
+  deleteDeadClonedBlocks(L, ExitBlocks, VMaps, DT);
+
   // Build the cloned loop structure itself. This may be substantially
   // different from the original structure due to the simplified CFG. This also
   // handles inserting all the cloned blocks into the correct loops.
   SmallVector<Loop *, 4> NonChildClonedLoops;
-  buildClonedLoops(L, ExitBlocks, VMap, LI, NonChildClonedLoops);
-
-  // Delete anything that was made dead in the original loop due to
-  // unswitching.
-  if (!DeadBlocks.empty())
-    deleteDeadBlocksFromLoop(L, DeadBlocks, ExitBlocks, DT, LI);
+  for (std::unique_ptr<ValueToValueMapTy> &VMap : VMaps)
+    buildClonedLoops(L, ExitBlocks, *VMap, LI, NonChildClonedLoops);
 
+  // Now that our cloned loops have been built, we can update the original loop.
+  // First we delete the dead blocks from it and then we rebuild the loop
+  // structure taking these deletions into account.
+  deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI);
   SmallVector<Loop *, 4> HoistedLoops;
   bool IsStillLoop = rebuildLoopAfterUnswitch(L, ExitBlocks, LI, HoistedLoops);
 
@@ -1790,31 +1827,37 @@ static bool unswitchInvariantBranch(
   // verification steps.
   assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 
-  // Now we want to replace all the uses of the invariants within both the
-  // original and cloned blocks. We do this here so that we can use the now
-  // updated dominator tree to identify which side the users are on.
-  ConstantInt *UnswitchedReplacement =
-      Direction ? ConstantInt::getTrue(BI.getContext())
-                : ConstantInt::getFalse(BI.getContext());
-  ConstantInt *ContinueReplacement =
-      Direction ? ConstantInt::getFalse(BI.getContext())
-                : ConstantInt::getTrue(BI.getContext());
-  for (Value *Invariant : Invariants)
-    for (auto UI = Invariant->use_begin(), UE = Invariant->use_end();
-         UI != UE;) {
-      // Grab the use and walk past it so we can clobber it in the use list.
-      Use *U = &*UI++;
-      Instruction *UserI = dyn_cast<Instruction>(U->getUser());
-      if (!UserI)
-        continue;
+  if (BI) {
+    // If we unswitched a branch which collapses the condition to a known
+    // constant we want to replace all the uses of the invariants within both
+    // the original and cloned blocks. We do this here so that we can use the
+    // now updated dominator tree to identify which side the users are on.
+    assert(UnswitchedSuccBBs.size() == 1 &&
+           "Only one possible unswitched block for a branch!");
+    BasicBlock *ClonedPH = ClonedPHs.begin()->second;
+    ConstantInt *UnswitchedReplacement =
+        Direction ? ConstantInt::getTrue(BI->getContext())
+                  : ConstantInt::getFalse(BI->getContext());
+    ConstantInt *ContinueReplacement =
+        Direction ? ConstantInt::getFalse(BI->getContext())
+                  : ConstantInt::getTrue(BI->getContext());
+    for (Value *Invariant : Invariants)
+      for (auto UI = Invariant->use_begin(), UE = Invariant->use_end();
+           UI != UE;) {
+        // Grab the use and walk past it so we can clobber it in the use list.
+        Use *U = &*UI++;
+        Instruction *UserI = dyn_cast<Instruction>(U->getUser());
+        if (!UserI)
+          continue;
 
-      // Replace it with the 'continue' side if in the main loop body, and the
-      // unswitched if in the cloned blocks.
-      if (DT.dominates(LoopPH, UserI->getParent()))
-        U->set(ContinueReplacement);
-      else if (DT.dominates(ClonedPH, UserI->getParent()))
-        U->set(UnswitchedReplacement);
-    }
+        // Replace it with the 'continue' side if in the main loop body, and the
+        // unswitched if in the cloned blocks.
+        if (DT.dominates(LoopPH, UserI->getParent()))
+          U->set(ContinueReplacement);
+        else if (DT.dominates(ClonedPH, UserI->getParent()))
+          U->set(UnswitchedReplacement);
+      }
+  }
 
   // We can change which blocks are exit blocks of all the cloned sibling
   // loops, the current loop, and any parent loops which shared exit blocks
@@ -1937,8 +1980,16 @@ static bool unswitchBestCondition(
     if (LI.getLoopFor(BB) != &L)
       continue;
 
+    if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
+      // We can only consider fully loop-invariant switch conditions as we need
+      // to completely eliminate the switch after unswitching.
+      if (!isa<Constant>(SI->getCondition()) &&
+          L.isLoopInvariant(SI->getCondition()))
+        UnswitchCandidates.push_back({SI, {SI->getCondition()}});
+      continue;
+    }
+
     auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
-    // FIXME: Handle switches here!
     if (!BI || !BI->isConditional() || isa<Constant>(BI->getCondition()) ||
         BI->getSuccessor(0) == BI->getSuccessor(1))
       continue;
@@ -2091,9 +2142,9 @@ static bool unswitchBestCondition(
     TerminatorInst &TI = *TerminatorAndInvariants.first;
     ArrayRef<Value *> Invariants = TerminatorAndInvariants.second;
     BranchInst *BI = dyn_cast<BranchInst>(&TI);
-    int CandidateCost =
-        ComputeUnswitchedCost(TI, /*FullUnswitch*/ Invariants.size() == 1 && BI &&
-                                      Invariants[0] == BI->getCondition());
+    int CandidateCost = ComputeUnswitchedCost(
+        TI, /*FullUnswitch*/ !BI || (Invariants.size() == 1 &&
+                                     Invariants[0] == BI->getCondition()));
     LLVM_DEBUG(dbgs() << "  Computed cost of " << CandidateCost
                       << " for unswitch candidate: " << TI << "\n");
     if (!BestUnswitchTI || CandidateCost < BestUnswitchCost) {
@@ -2109,17 +2160,11 @@ static bool unswitchBestCondition(
     return false;
   }
 
-  auto *UnswitchBI = dyn_cast<BranchInst>(BestUnswitchTI);
-  if (!UnswitchBI) {
-    // FIXME: Add support for unswitching a switch here!
-    LLVM_DEBUG(dbgs() << "Cannot unswitch anything but a branch!\n");
-    return false;
-  }
-
   LLVM_DEBUG(dbgs() << "  Trying to unswitch non-trivial (cost = "
-                    << BestUnswitchCost << ") branch: " << *UnswitchBI << "\n");
-  return unswitchInvariantBranch(L, *UnswitchBI, BestUnswitchInvariants, DT, LI,
-                                 AC, UnswitchCB);
+                    << BestUnswitchCost << ") terminator: " << *BestUnswitchTI
+                    << "\n");
+  return unswitchNontrivialInvariants(
+      L, *BestUnswitchTI, BestUnswitchInvariants, DT, LI, AC, UnswitchCB);
 }
 
 /// Unswitch control flow predicated on loop invariant conditions.

Modified: llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll?rev=335553&r1=335552&r2=335553&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll (original)
+++ llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll Mon Jun 25 16:32:54 2018
@@ -387,7 +387,7 @@ loop_begin:
 loop_b:
   %b = load i32, i32* %b.ptr
   br i1 %v, label %loop_begin, label %loop_exit
-; The 'loop_b' unswitched loop.
+; The original loop, now non-looping due to unswitching.
 ;
 ; CHECK:       entry.split:
 ; CHECK-NEXT:    br label %loop_begin
@@ -398,14 +398,13 @@ loop_b:
 ; CHECK-NEXT:    br label %loop_exit.split
 ;
 ; CHECK:       loop_exit.split:
-; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A]], %loop_begin ]
 ; CHECK-NEXT:    br label %loop_exit
 
 loop_exit:
   %ab.phi = phi i32 [ %b, %loop_b ], [ %a, %loop_begin ]
   ret i32 %ab.phi
 ; CHECK:       loop_exit:
-; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.split ], [ %[[B_LCSSA]], %loop_exit.split.us ]
+; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A]], %loop_exit.split ], [ %[[B_LCSSA]], %loop_exit.split.us ]
 ; CHECK-NEXT:    ret i32 %[[AB_PHI]]
 }
 
@@ -458,8 +457,7 @@ loop_exit1:
   call void @sink1(i32 %a.phi)
   ret void
 ; CHECK:       loop_exit1:
-; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit1.split.us ]
-; CHECK-NEXT:    call void @sink1(i32 %[[A_PHI]])
+; CHECK-NEXT:    call void @sink1(i32 %[[A_LCSSA]])
 ; CHECK-NEXT:    ret void
 
 loop_exit2:
@@ -467,8 +465,8 @@ loop_exit2:
   call void @sink2(i32 %b.phi)
   ret void
 ; CHECK:       loop_exit2:
-; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B]], %loop_b ]
-; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
+; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %loop_b ]
+; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
 ; CHECK-NEXT:    ret void
 }
 
@@ -531,8 +529,7 @@ loop_exit2:
   call void @sink2(i32 %b.phi)
   ret void
 ; CHECK:       loop_exit2:
-; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_LCSSA]], %loop_exit2.split.us ]
-; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
+; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
 ; CHECK-NEXT:    ret void
 }
 
@@ -587,8 +584,7 @@ loop_exit1:
   call void @sink1(i32 %a.phi)
   br label %exit
 ; CHECK:       loop_exit1:
-; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit1.split.us ]
-; CHECK-NEXT:    call void @sink1(i32 %[[A_PHI]])
+; CHECK-NEXT:    call void @sink1(i32 %[[A_LCSSA]])
 ; CHECK-NEXT:    br label %exit
 
 loop_exit2:
@@ -596,8 +592,8 @@ loop_exit2:
   call void @sink2(i32 %b.phi)
   br label %exit
 ; CHECK:       loop_exit2:
-; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B]], %loop_b ]
-; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
+; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %loop_b ]
+; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
 ; CHECK-NEXT:    br label %exit
 
 exit:
@@ -663,7 +659,7 @@ loop_latch:
   %v2 = load i1, i1* %ptr
   br i1 %v2, label %loop_begin, label %loop_exit
 ; CHECK:       loop_latch:
-; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %inner_loop_b ]
+; CHECK-NEXT:    %[[B_INNER_LCSSA:.*]] = phi i32 [ %[[B]], %inner_loop_b ]
 ; CHECK-NEXT:    %[[V2:.*]] = load i1, i1* %ptr
 ; CHECK-NEXT:    br i1 %[[V2]], label %loop_begin, label %loop_exit.loopexit1
 
@@ -671,15 +667,14 @@ loop_exit:
   %ab.phi = phi i32 [ %a, %inner_loop_begin ], [ %b.phi, %loop_latch ]
   ret i32 %ab.phi
 ; CHECK:       loop_exit.loopexit:
-; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.loopexit.split.us ]
 ; CHECK-NEXT:    br label %loop_exit
 ;
 ; CHECK:       loop_exit.loopexit1:
-; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_LCSSA]], %loop_latch ]
+; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %loop_latch ]
 ; CHECK-NEXT:    br label %loop_exit
 ;
 ; CHECK:       loop_exit:
-; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_PHI]], %loop_exit.loopexit ], [ %[[B_PHI]], %loop_exit.loopexit1 ]
+; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.loopexit ], [ %[[B_LCSSA]], %loop_exit.loopexit1 ]
 ; CHECK-NEXT:    ret i32 %[[AB_PHI]]
 }
 
@@ -773,11 +768,10 @@ latch:
 ; CHECK-NEXT:    br label %latch
 ;
 ; CHECK:       latch:
-; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %loop_b_inner_exit ]
 ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit.split
 ;
 ; CHECK:       loop_exit.split:
-; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_PHI]], %latch ]
+; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %latch ]
 ; CHECK-NEXT:    br label %loop_exit
 
 loop_exit:
@@ -1466,7 +1460,6 @@ inner_loop_exit:
   %v = load i1, i1* %ptr
   br i1 %v, label %loop_begin, label %loop_exit
 ; CHECK:       inner_loop_exit:
-; CHECK-NEXT:    %[[A_INNER_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.split.us ]
 ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
 ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit
 
@@ -1474,7 +1467,7 @@ loop_exit:
   %a.lcssa = phi i32 [ %a.inner_lcssa, %inner_loop_exit ]
   ret i32 %a.lcssa
 ; CHECK:       loop_exit:
-; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA]], %inner_loop_exit ]
+; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit ]
 ; CHECK-NEXT:    ret i32 %[[A_LCSSA]]
 }
 
@@ -1555,7 +1548,7 @@ loop_exit:
   ret i32 %a.lcssa
 ; CHECK:       loop_exit:
 ; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.split ], [ %[[A_PHI_US]], %loop_exit.split.us ]
-; CHECK-NEXT:    ret i32 %[[AB_PHI]]
+; CHECK-NEXT:    ret i32 %[[A_PHI]]
 }
 
 ; Test that requires re-forming dedicated exits for the original loop.
@@ -1635,7 +1628,7 @@ loop_exit:
   ret i32 %a.lcssa
 ; CHECK:       loop_exit:
 ; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_PHI_SPLIT]], %loop_exit.split ], [ %[[A_LCSSA_US]], %loop_exit.split.us ]
-; CHECK-NEXT:    ret i32 %[[AB_PHI]]
+; CHECK-NEXT:    ret i32 %[[A_PHI]]
 }
 
 ; Check that if a cloned inner loop after unswitching doesn't loop and directly
@@ -1721,7 +1714,6 @@ loop_exit:
   %a.lcssa = phi i32 [ %a, %inner_loop_begin ], [ %a.inner_lcssa, %inner_loop_exit ]
   ret i32 %a.lcssa
 ; CHECK:       loop_exit.loopexit:
-; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %loop_exit.loopexit.split.us ]
 ; CHECK-NEXT:    br label %loop_exit
 ;
 ; CHECK:       loop_exit.loopexit1:
@@ -1729,7 +1721,7 @@ loop_exit:
 ; CHECK-NEXT:    br label %loop_exit
 ;
 ; CHECK:       loop_exit:
-; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA_US]], %loop_exit.loopexit ], [ %[[A_LCSSA]], %loop_exit.loopexit1 ]
+; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %loop_exit.loopexit ], [ %[[A_LCSSA]], %loop_exit.loopexit1 ]
 ; CHECK-NEXT:    ret i32 %[[A_PHI]]
 }
 
@@ -1802,7 +1794,6 @@ inner_loop_exit:
   %v3 = load i1, i1* %ptr
   br i1 %v3, label %loop_latch, label %loop_exit
 ; CHECK:       inner_loop_exit:
-; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.split.us ]
 ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
 ; CHECK-NEXT:    br i1 %[[V]], label %loop_latch, label %loop_exit.loopexit1
 
@@ -1819,7 +1810,7 @@ loop_exit:
 ; CHECK-NEXT:    br label %loop_exit
 ;
 ; CHECK:       loop_exit.loopexit1:
-; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_PHI]], %inner_loop_exit ]
+; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit ]
 ; CHECK-NEXT:    br label %loop_exit
 ;
 ; CHECK:       loop_exit:
@@ -1916,7 +1907,6 @@ inner_loop_exit:
   %v4 = load i1, i1* %ptr
   br i1 %v4, label %loop_begin, label %loop_exit
 ; CHECK:       inner_loop_exit.loopexit:
-; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_loop_exit.loopexit.split.us ]
 ; CHECK-NEXT:    br label %inner_loop_exit
 ;
 ; CHECK:       inner_loop_exit.loopexit1:
@@ -1924,7 +1914,7 @@ inner_loop_exit:
 ; CHECK-NEXT:    br label %inner_loop_exit
 ;
 ; CHECK:       inner_loop_exit:
-; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.loopexit ], [ %[[A_INNER_LCSSA]], %inner_loop_exit.loopexit1 ]
+; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_loop_exit.loopexit ], [ %[[A_INNER_LCSSA]], %inner_loop_exit.loopexit1 ]
 ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
 ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit
 
@@ -2010,7 +2000,6 @@ inner_inner_loop_exit:
   %v3 = load i1, i1* %ptr
   br i1 %v3, label %inner_loop_latch, label %inner_loop_exit
 ; CHECK:       inner_inner_loop_exit:
-; CHECK-NEXT:    %[[A_INNER_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_inner_loop_exit.split.us ]
 ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
 ; CHECK-NEXT:    br i1 %[[V]], label %inner_loop_latch, label %inner_loop_exit.loopexit1
 
@@ -2028,7 +2017,7 @@ inner_loop_exit:
 ; CHECK-NEXT:    br label %inner_loop_exit
 ;
 ; CHECK:       inner_loop_exit.loopexit1:
-; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_PHI]], %inner_inner_loop_exit ]
+; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_inner_loop_exit ]
 ; CHECK-NEXT:    br label %inner_loop_exit
 ;
 ; CHECK:       inner_loop_exit:
@@ -2296,56 +2285,96 @@ define i32 @test20(i32* %var, i32 %cond1
 entry:
   br label %loop_begin
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label %loop_begin
+; CHECK-NEXT:    switch i32 %cond2, label %[[ENTRY_SPLIT_EXIT:.*]] [
+; CHECK-NEXT:      i32 0, label %[[ENTRY_SPLIT_A:.*]]
+; CHECK-NEXT:      i32 1, label %[[ENTRY_SPLIT_A]]
+; CHECK-NEXT:      i32 13, label %[[ENTRY_SPLIT_B:.*]]
+; CHECK-NEXT:      i32 2, label %[[ENTRY_SPLIT_A]]
+; CHECK-NEXT:      i32 42, label %[[ENTRY_SPLIT_C:.*]]
+; CHECK-NEXT:    ]
 
 loop_begin:
   %var_val = load i32, i32* %var
-  switch i32 %cond2, label %loop_a [
-    i32 0, label %loop_b
-    i32 1, label %loop_b
-    i32 13, label %loop_c
-    i32 2, label %loop_b
-    i32 42, label %loop_exit
+  switch i32 %cond2, label %loop_exit [
+    i32 0, label %loop_a
+    i32 1, label %loop_a
+    i32 13, label %loop_b
+    i32 2, label %loop_a
+    i32 42, label %loop_c
   ]
-; CHECK:       loop_begin:
-; CHECK-NEXT:    %[[V:.*]] = load i32, i32* %var
-; CHECK-NEXT:    switch i32 %cond2, label %loop_a [
-; CHECK-NEXT:      i32 0, label %loop_b
-; CHECK-NEXT:      i32 1, label %loop_b
-; CHECK-NEXT:      i32 13, label %loop_c
-; CHECK-NEXT:      i32 2, label %loop_b
-; CHECK-NEXT:      i32 42, label %loop_exit
-; CHECK-NEXT:    ]
 
 loop_a:
   call void @a()
   br label %loop_latch
-; CHECK:       loop_a:
+; Unswitched 'a' loop.
+;
+; CHECK:       [[ENTRY_SPLIT_A]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_A:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_A]]:
+; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
+; CHECK-NEXT:    br label %[[LOOP_A:.*]]
+;
+; CHECK:       [[LOOP_A]]:
 ; CHECK-NEXT:    call void @a()
-; CHECK-NEXT:    br label %loop_latch
+; CHECK-NEXT:    br label %[[LOOP_LATCH_A:.*]]
+;
+; CHECK:       [[LOOP_LATCH_A]]:
+; CHECK:         br label %[[LOOP_BEGIN_A]]
 
 loop_b:
   call void @b()
   br label %loop_latch
-; CHECK:       loop_b:
+; Unswitched 'b' loop.
+;
+; CHECK:       [[ENTRY_SPLIT_B]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_B:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_B]]:
+; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
+; CHECK-NEXT:    br label %[[LOOP_B:.*]]
+;
+; CHECK:       [[LOOP_B]]:
 ; CHECK-NEXT:    call void @b()
-; CHECK-NEXT:    br label %loop_latch
+; CHECK-NEXT:    br label %[[LOOP_LATCH_B:.*]]
+;
+; CHECK:       [[LOOP_LATCH_B]]:
+; CHECK:         br label %[[LOOP_BEGIN_B]]
 
 loop_c:
   call void @c() noreturn nounwind
   br label %loop_latch
-; CHECK:       loop_c:
+; Unswitched 'c' loop.
+;
+; CHECK:       [[ENTRY_SPLIT_C]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_C:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_C]]:
+; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
+; CHECK-NEXT:    br label %[[LOOP_C:.*]]
+;
+; CHECK:       [[LOOP_C]]:
 ; CHECK-NEXT:    call void @c()
-; CHECK-NEXT:    br label %loop_latch
+; CHECK-NEXT:    br label %[[LOOP_LATCH_C:.*]]
+;
+; CHECK:       [[LOOP_LATCH_C]]:
+; CHECK:         br label %[[LOOP_BEGIN_C]]
 
 loop_latch:
   br label %loop_begin
-; CHECK:       loop_latch:
-; CHECK-NEXT:    br label %loop_begin
 
 loop_exit:
   %lcssa = phi i32 [ %var_val, %loop_begin ]
   ret i32 %lcssa
+; Unswitched exit edge (no longer a loop).
+;
+; CHECK:       [[ENTRY_SPLIT_EXIT]]:
+; CHECK-NEXT:    br label %loop_begin
+;
+; CHECK:       loop_begin:
+; CHECK-NEXT:    %[[V:.*]] = load i32, i32* %var
+; CHECK-NEXT:    br label %loop_exit
+;
 ; CHECK:       loop_exit:
 ; CHECK-NEXT:    %[[LCSSA:.*]] = phi i32 [ %[[V]], %loop_begin ]
 ; CHECK-NEXT:    ret i32 %[[LCSSA]]
@@ -2824,3 +2853,112 @@ loop_exit:
 ; CHECK:       loop_exit:
 ; CHECK-NEXT:    ret
 }
+
+; Non-trivial unswitching of a switch.
+define i32 @test27(i1* %ptr, i32 %cond) {
+; CHECK-LABEL: @test27(
+entry:
+  br label %loop_begin
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    switch i32 %cond, label %[[ENTRY_SPLIT_LATCH:.*]] [
+; CHECK-NEXT:      i32 0, label %[[ENTRY_SPLIT_A:.*]]
+; CHECK-NEXT:      i32 1, label %[[ENTRY_SPLIT_B:.*]]
+; CHECK-NEXT:      i32 2, label %[[ENTRY_SPLIT_C:.*]]
+; CHECK-NEXT:    ]
+
+loop_begin:
+  switch i32 %cond, label %latch [
+    i32 0, label %loop_a
+    i32 1, label %loop_b
+    i32 2, label %loop_c
+  ]
+
+loop_a:
+  call void @a()
+  br label %latch
+; Unswitched 'a' loop.
+;
+; CHECK:       [[ENTRY_SPLIT_A]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_A:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_A]]:
+; CHECK-NEXT:    br label %[[LOOP_A:.*]]
+;
+; CHECK:       [[LOOP_A]]:
+; CHECK-NEXT:    call void @a()
+; CHECK-NEXT:    br label %[[LOOP_LATCH_A:.*]]
+;
+; CHECK:       [[LOOP_LATCH_A]]:
+; CHECK-NEXT:    %[[V_A:.*]] = load i1, i1* %ptr
+; CHECK:         br i1 %[[V_A]], label %[[LOOP_BEGIN_A]], label %[[LOOP_EXIT_A:.*]]
+;
+; CHECK:       [[LOOP_EXIT_A]]:
+; CHECK-NEXT:    br label %loop_exit
+
+loop_b:
+  call void @b()
+  br label %latch
+; Unswitched 'b' loop.
+;
+; CHECK:       [[ENTRY_SPLIT_B]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_B:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_B]]:
+; CHECK-NEXT:    br label %[[LOOP_B:.*]]
+;
+; CHECK:       [[LOOP_B]]:
+; CHECK-NEXT:    call void @b()
+; CHECK-NEXT:    br label %[[LOOP_LATCH_B:.*]]
+;
+; CHECK:       [[LOOP_LATCH_B]]:
+; CHECK-NEXT:    %[[V_B:.*]] = load i1, i1* %ptr
+; CHECK:         br i1 %[[V_B]], label %[[LOOP_BEGIN_B]], label %[[LOOP_EXIT_B:.*]]
+;
+; CHECK:       [[LOOP_EXIT_B]]:
+; CHECK-NEXT:    br label %loop_exit
+
+loop_c:
+  call void @c()
+  br label %latch
+; Unswitched 'c' loop.
+;
+; CHECK:       [[ENTRY_SPLIT_C]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_C:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_C]]:
+; CHECK-NEXT:    br label %[[LOOP_C:.*]]
+;
+; CHECK:       [[LOOP_C]]:
+; CHECK-NEXT:    call void @c()
+; CHECK-NEXT:    br label %[[LOOP_LATCH_C:.*]]
+;
+; CHECK:       [[LOOP_LATCH_C]]:
+; CHECK-NEXT:    %[[V_C:.*]] = load i1, i1* %ptr
+; CHECK:         br i1 %[[V_C]], label %[[LOOP_BEGIN_C]], label %[[LOOP_EXIT_C:.*]]
+;
+; CHECK:       [[LOOP_EXIT_C]]:
+; CHECK-NEXT:    br label %loop_exit
+
+latch:
+  %v = load i1, i1* %ptr
+  br i1 %v, label %loop_begin, label %loop_exit
+; Unswitched 'latch' only loop.
+;
+; CHECK:       [[ENTRY_SPLIT_LATCH]]:
+; CHECK-NEXT:    br label %[[LOOP_BEGIN_LATCH:.*]]
+;
+; CHECK:       [[LOOP_BEGIN_LATCH]]:
+; CHECK-NEXT:    br label %[[LOOP_LATCH_LATCH:.*]]
+;
+; CHECK:       [[LOOP_LATCH_LATCH]]:
+; CHECK-NEXT:    %[[V_LATCH:.*]] = load i1, i1* %ptr
+; CHECK:         br i1 %[[V_LATCH]], label %[[LOOP_BEGIN_LATCH]], label %[[LOOP_EXIT_LATCH:.*]]
+;
+; CHECK:       [[LOOP_EXIT_LATCH]]:
+; CHECK-NEXT:    br label %loop_exit
+
+loop_exit:
+  ret i32 0
+; CHECK:       loop_exit:
+; CHECK-NEXT:    ret i32 0
+}
\ No newline at end of file