[llvm] r335553 - [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial

Mon Jul 9 02:51:37 PDT 2018

Also +Alina Sbirlea <asbirlea at google.com> as she's been hacking on top of
this...

On Mon, Jul 9, 2018 at 2:51 AM Chandler Carruth <chandlerc at google.com>
wrote:

> FYI for folks testing out this functionality, there is a collection of
> serious bugs in this commit. I've got a fix and just need to add some
> testing. Will have it landed tomorrow. Just wanted to send a heads up.
>
> +Mikael Holmén <mikael.holmen at ericsson.com> +Fedor Sergeev
> <fedor.sergeev at azul.com>
>
> On Mon, Jun 25, 2018 at 4:37 PM Chandler Carruth via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Author: chandlerc
>> Date: Mon Jun 25 16:32:54 2018
>> New Revision: 335553
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=335553&view=rev
>> Log:
>> [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
>> unswitching of switches.
>>
>> This works much like trivial unswitching of switches in that it reliably
>> moves the switch out of the loop. Here we potentially clone the entire
>> loop into each successor of the switch and re-point the cases at these
>> clones.
>>
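As a rough source-level picture of what "clone the loop into each successor
of the switch" means, here is a minimal sketch. It is only an analogy: the
pass works on LLVM IR, and the function names and loop body below are made up
for illustration.

```cpp
#include <cassert>
#include <vector>

// Before: the switch on the loop-invariant `mode` executes on every iteration.
// (Hypothetical example function, not taken from the patch.)
int sumOriginal(const std::vector<int> &xs, int mode) {
  int acc = 0;
  for (int x : xs) {
    acc += 1;       // work shared by every path through the loop
    switch (mode) { // `mode` never changes inside the loop
    case 0:
      acc += x;
      break;
    case 1:
      acc -= x;
      break;
    default:
      acc ^= x;
      break;
    }
  }
  return acc;
}

// After: the switch is hoisted out of the loop and every successor gets its
// own copy of the loop, specialized for that successor.
int sumUnswitched(const std::vector<int> &xs, int mode) {
  int acc = 0;
  switch (mode) {
  case 0:
    for (int x : xs) { acc += 1; acc += x; }
    break;
  case 1:
    for (int x : xs) { acc += 1; acc -= x; }
    break;
  default: // plays the role of the retained (default) successor
    for (int x : xs) { acc += 1; acc ^= x; }
    break;
  }
  return acc;
}
```

Both functions should return the same value for any input. The second one
trades code size for a loop body with no switch in it, which is the trade-off
the cost model has to weigh.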
>> Due to the complexity of actually doing nontrivial unswitching, this
>> patch doesn't create a dedicated routine for handling switches -- it
>> would duplicate far too much code. Instead, it generalizes the existing
>> routine to handle both branches and switches as it largely reduces to
>> looping in a few places instead of doing something once. This actually
>> improves the results in some cases with branches due to being much more
>> careful about how dead regions of code are managed. With branches,
>> because exactly one clone is created and there are exactly two edges
>> considered, somewhat sloppy handling of the dead regions of code was
>> sufficient in most cases. But with switches, there are much more
>> complicated patterns of dead code and so I've had to move to a more
>> robust model generally. We still do as much pruning of the dead code
>> early as possible because that allows us to avoid even cloning the code.
>>
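The early pruning described above comes down to one lookup per block, which
the patch implements as the `SkipBlock` lambda in `buildClonedLoopBlocks`. A
toy model of that predicate follows, assuming blocks are plain strings and
using standard containers rather than LLVM's. `Block`, `skipBlock` and
`blocksToClone` are hypothetical names for this sketch, not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a basic block: just its name.
using Block = std::string;

// Maps a block to the successor of the unswitched terminator whose edge
// dominates it. Blocks missing from the map are not dominated by any single
// successor edge (for example the loop header or a shared latch).
using DominatingSuccMap = std::unordered_map<Block, Block>;

// Same test as the patch's SkipBlock lambda: skip a block when it is
// dominated by a successor other than the one we are cloning for.
bool skipBlock(const DominatingSuccMap &dominatingSucc, const Block &bb,
               const Block &unswitchedSucc) {
  auto it = dominatingSucc.find(bb);
  return it != dominatingSucc.end() && it->second != unswitchedSucc;
}

// Collect the loop blocks that need a copy in the clone built for
// `unswitchedSucc`.
std::vector<Block> blocksToClone(const std::vector<Block> &loopBlocks,
                                 const DominatingSuccMap &dominatingSucc,
                                 const Block &unswitchedSucc) {
  std::vector<Block> result;
  for (const Block &bb : loopBlocks)
    if (!skipBlock(dominatingSucc, bb, unswitchedSucc))
      result.push_back(bb);
  return result;
}
```

With a single map, the clone for one case can drop every block that only
another case can reach, without building a separate skip set per successor.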
>> This also surfaced a pre-existing problem with nontrivial unswitching:
>> we weren't as precise in reconstructing loops as we could have been.
>> This seems to have been mostly harmless, but resulted in
>> pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
>> have to get this *right*, and everything benefits from it.
>>
>> While the testing may seem a bit light here because we only have two
>> real cases with actual switches, they do a surprisingly good job of
>> exercising numerous edge cases. Also, because we share the logic with
>> branches, most of the changes in this patch are reasonably well covered
>> by existing tests.
>>
>> The new unswitch now has all of the same fundamental power as the old
>> one with the exception of the single unsound case of *partial* switch
>> unswitching -- that really is just loop specialization and not
>> unswitching at all. It doesn't fit into the canonicalization model in
>> any way. We can add a loop specialization pass that runs late based on
>> profile data if important test cases ever come up here.
>>
>> Differential Revision: https://reviews.llvm.org/D47683
>>
>> Modified:
>>     llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
>>     llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
>>     llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll
>>
>> Modified: llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h?rev=335553&r1=335552&r2=335553&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
>> (original)
>> +++ llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h Mon
>> Jun 25 16:32:54 2018
>> @@ -17,9 +17,9 @@
>>
>>  namespace llvm {
>>
>> -/// This pass transforms loops that contain branches on loop-invariant
>> -/// conditions to have multiple loops. For example, it turns the left into the
>> -/// right code:
>> +/// This pass transforms loops that contain branches or switches on loop-
>> +/// invariant conditions to have multiple loops. For example, it turns the left
>> +/// into the right code:
>>  ///
>>  ///  for (...)                  if (lic)
>>  ///    A                          for (...)
>> @@ -35,6 +35,31 @@ namespace llvm {
>>  /// This pass expects LICM to be run before it to hoist invariant conditions out
>>  /// of the loop, to make the unswitching opportunity obvious.
>>  ///
>> +/// There is a taxonomy of unswitching that we use to classify different forms
>> +/// of this transformation:
>> +///
>> +/// - Trivial unswitching: this is when the condition can be unswitched without
>> +///   cloning any code from inside the loop. A non-trivial unswitch requires
>> +///   code duplication.
>> +///
>> +/// - Full unswitching: this is when the branch or switch is completely moved
>> +///   from inside the loop to outside the loop. Partial unswitching removes the
>> +///   branch from the clone of the loop but must leave a (somewhat simplified)
>> +///   branch in the original loop. While theoretically partial unswitching can
>> +///   be done for switches, the requirements are extreme - we need the loop
>> +///   invariant input to the switch to be sufficient to collapse to a single
>> +///   successor in each clone.
>> +///
>> +/// This pass always does trivial, full unswitching for both branches and
>> +/// switches. For branches, it also always does trivial, partial unswitching.
>> +///
>> +/// If enabled (via the constructor's `NonTrivial` parameter), this pass will
>> +/// additionally do non-trivial, full unswitching for branches and switches, and
>> +/// will do non-trivial, partial unswitching for branches.
>> +///
>> +/// Because partial unswitching of switches is extremely unlikely to be possible
>> +/// in practice and significantly complicates the implementation, this pass does
>> +/// not currently implement that in any mode.
>>  class SimpleLoopUnswitchPass : public
>> PassInfoMixin<SimpleLoopUnswitchPass> {
>>    bool NonTrivial;
>>
>>
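To make the "partial" half of the taxonomy above concrete, here is a small
source-level sketch of non-trivial, partial unswitching of a branch whose
condition combines an invariant with a loop-varying value through `or`. This
is an analogy only: the pass operates on IR, and the function names are
invented for the example.

```cpp
#include <cassert>
#include <vector>

// Before: `inv` is loop invariant, `x > 0` is not, so the branch as a whole
// cannot be hoisted out of the loop.
int countOriginal(const std::vector<int> &xs, bool inv) {
  int n = 0;
  for (int x : xs)
    if (inv || x > 0)
      ++n;
  return n;
}

// After: only the invariant part is tested outside the loop.
int countPartiallyUnswitched(const std::vector<int> &xs, bool inv) {
  int n = 0;
  if (inv) {
    // Cloned loop: `inv || ...` is known to be true, so the branch is gone.
    for (int x : xs) {
      (void)x; // the value no longer matters on this path
      ++n;
    }
  } else {
    // Original loop: a simplified branch on the varying part remains.
    for (int x : xs)
      if (x > 0)
        ++n;
  }
  return n;
}
```

A full unswitch would leave no branch in either loop. Here the original loop
still tests `x > 0`, which is the leftover branch that the "partial" label
refers to.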
>> Modified: llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp?rev=335553&r1=335552&r2=335553&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp Mon Jun 25
>> 16:32:54 2018
>> @@ -715,8 +715,12 @@ static bool unswitchAllTrivialConditions
>>  ///
>>  /// This routine handles cloning all of the necessary loop blocks and
>> exit
>>  /// blocks including rewriting their instructions and the relevant PHI
>> nodes.
>> -/// It skips loop and exit blocks that are not necessary based on the
>> provided
>> -/// set. It also correctly creates the unconditional branch in the cloned
>> +/// Any loop blocks or exit blocks which are dominated by a different
>> successor
>> +/// than the one for this clone of the loop blocks can be trivially
>> skipped. We
>> +/// use the `DominatingSucc` map to determine whether a block satisfies
>> that
>> +/// property with a simple map lookup.
>> +///
>> +/// It also correctly creates the unconditional branch in the cloned
>>  /// unswitched parent block to only point at the unswitched successor.
>>  ///
>>  /// This does not handle most of the necessary updates to `LoopInfo`.
>> Only exit
>> @@ -730,7 +734,7 @@ static BasicBlock *buildClonedLoopBlocks
>>      Loop &L, BasicBlock *LoopPH, BasicBlock *SplitBB,
>>      ArrayRef<BasicBlock *> ExitBlocks, BasicBlock *ParentBB,
>>      BasicBlock *UnswitchedSuccBB, BasicBlock *ContinueSuccBB,
>> -    const SmallPtrSetImpl<BasicBlock *> &SkippedLoopAndExitBlocks,
>> +    const SmallDenseMap<BasicBlock *, BasicBlock *, 16> &DominatingSucc,
>>      ValueToValueMapTy &VMap,
>>      SmallVectorImpl<DominatorTree::UpdateType> &DTUpdates,
>> AssumptionCache &AC,
>>      DominatorTree &DT, LoopInfo &LI) {
>> @@ -751,19 +755,26 @@ static BasicBlock *buildClonedLoopBlocks
>>      return NewBB;
>>    };
>>
>> +  // We skip cloning blocks when they have a dominating succ that is not
>> the
>> +  // succ we are cloning for.
>> +  auto SkipBlock = [&](BasicBlock *BB) {
>> +    auto It = DominatingSucc.find(BB);
>> +    return It != DominatingSucc.end() && It->second != UnswitchedSuccBB;
>> +  };
>> +
>>    // First, clone the preheader.
>>    auto *ClonedPH = CloneBlock(LoopPH);
>>
>>    // Then clone all the loop blocks, skipping the ones that aren't
>> necessary.
>>    for (auto *LoopBB : L.blocks())
>> -    if (!SkippedLoopAndExitBlocks.count(LoopBB))
>> +    if (!SkipBlock(LoopBB))
>>        CloneBlock(LoopBB);
>>
>>    // Split all the loop exit edges so that when we clone the exit
>> blocks, if
>>    // any of the exit blocks are *also* a preheader for some other loop,
>> we
>>    // don't create multiple predecessors entering the loop header.
>>    for (auto *ExitBB : ExitBlocks) {
>> -    if (SkippedLoopAndExitBlocks.count(ExitBB))
>> +    if (SkipBlock(ExitBB))
>>        continue;
>>
>>      // When we are going to clone an exit, we don't need to clone all the
>> @@ -841,7 +852,7 @@ static BasicBlock *buildClonedLoopBlocks
>>    // Update any PHI nodes in the cloned successors of the skipped blocks
>> to not
>>    // have spurious incoming values.
>>    for (auto *LoopBB : L.blocks())
>> -    if (SkippedLoopAndExitBlocks.count(LoopBB))
>> +    if (SkipBlock(LoopBB))
>>        for (auto *SuccBB : successors(LoopBB))
>>          if (auto *ClonedSuccBB =
>> cast_or_null<BasicBlock>(VMap.lookup(SuccBB)))
>>            for (PHINode &PN : ClonedSuccBB->phis())
>> @@ -1175,10 +1186,41 @@ static void buildClonedLoops(Loop &OrigL
>>  }
>>
>>  static void
>> +deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
>> +                       ArrayRef<std::unique_ptr<ValueToValueMapTy>>
>> VMaps,
>> +                       DominatorTree &DT) {
>> +  // Find all the dead clones, and remove them from their successors.
>> +  SmallVector<BasicBlock *, 16> DeadBlocks;
>> +  for (BasicBlock *BB : llvm::concat<BasicBlock *const>(L.blocks(),
>> ExitBlocks))
>> +    for (auto &VMap : VMaps)
>> +      if (BasicBlock *ClonedBB =
>> cast_or_null<BasicBlock>(VMap->lookup(BB)))
>> +        if (!DT.isReachableFromEntry(ClonedBB)) {
>> +          for (BasicBlock *SuccBB : successors(ClonedBB))
>> +            SuccBB->removePredecessor(ClonedBB);
>> +          DeadBlocks.push_back(ClonedBB);
>> +        }
>> +
>> +  // Drop any remaining references to break cycles.
>> +  for (BasicBlock *BB : DeadBlocks)
>> +    BB->dropAllReferences();
>> +  // Erase them from the IR.
>> +  for (BasicBlock *BB : DeadBlocks)
>> +    BB->eraseFromParent();
>> +}
>> +
>> +static void
>>  deleteDeadBlocksFromLoop(Loop &L,
>> -                         const SmallVectorImpl<BasicBlock *> &DeadBlocks,
>>                           SmallVectorImpl<BasicBlock *> &ExitBlocks,
>>                           DominatorTree &DT, LoopInfo &LI) {
>> +  // Find all the dead blocks, and remove them from their successors.
>> +  SmallVector<BasicBlock *, 16> DeadBlocks;
>> +  for (BasicBlock *BB : llvm::concat<BasicBlock *const>(L.blocks(),
>> ExitBlocks))
>> +    if (!DT.isReachableFromEntry(BB)) {
>> +      for (BasicBlock *SuccBB : successors(BB))
>> +        SuccBB->removePredecessor(BB);
>> +      DeadBlocks.push_back(BB);
>> +    }
>> +
>>    SmallPtrSet<BasicBlock *, 16> DeadBlockSet(DeadBlocks.begin(),
>>                                               DeadBlocks.end());
>>
>> @@ -1187,11 +1229,6 @@ deleteDeadBlocksFromLoop(Loop &L,
>>    llvm::erase_if(ExitBlocks,
>>                   [&](BasicBlock *BB) { return DeadBlockSet.count(BB); });
>>
>> -  // Remove these blocks from their successors.
>> -  for (auto *BB : DeadBlocks)
>> -    for (BasicBlock *SuccBB : successors(BB))
>> -      SuccBB->removePredecessor(BB, /*DontDeleteUselessPHIs*/ true);
>> -
>>    // Walk from this loop up through its parents removing all of the dead
>> blocks.
>>    for (Loop *ParentL = &L; ParentL; ParentL = ParentL->getParentLoop()) {
>>      for (auto *BB : DeadBlocks)
>> @@ -1582,31 +1619,24 @@ void visitDomSubTree(DominatorTree &DT,
>>    } while (!DomWorklist.empty());
>>  }
>>
>> -/// Take an invariant branch that has been determined to be safe and
>> worthwhile
>> -/// to unswitch despite being non-trivial to do so and perform the
>> unswitch.
>> -///
>> -/// This directly updates the CFG to hoist the predicate out of the
>> loop, and
>> -/// clone the necessary parts of the loop to maintain behavior.
>> -///
>> -/// It also updates both dominator tree and loopinfo based on the
>> unswitching.
>> -///
>> -/// Once unswitching has been performed it runs the provided callback to
>> report
>> -/// the new loops and no-longer valid loops to the caller.
>> -static bool unswitchInvariantBranch(
>> -    Loop &L, BranchInst &BI, ArrayRef<Value *> Invariants, DominatorTree
>> &DT,
>> -    LoopInfo &LI, AssumptionCache &AC,
>> +static bool unswitchNontrivialInvariants(
>> +    Loop &L, TerminatorInst &TI, ArrayRef<Value *> Invariants,
>> +    DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
>>      function_ref<void(bool, ArrayRef<Loop *>)> UnswitchCB) {
>> -  auto *ParentBB = BI.getParent();
>> -
>> -  // We can only unswitch conditional branches with an invariant
>> condition or
>> -  // combining invariant conditions with an instruction.
>> -  assert(BI.isConditional() && "Can only unswitch a conditional
>> branch!");
>> -  bool FullUnswitch = BI.getCondition() == Invariants[0];
>> +  auto *ParentBB = TI.getParent();
>> +  BranchInst *BI = dyn_cast<BranchInst>(&TI);
>> +  SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
>> +
>> +  // We can only unswitch switches, conditional branches with an
>> invariant
>> +  // condition, or combining invariant conditions with an instruction.
>> +  assert((SI || BI->isConditional()) &&
>> +         "Can only unswitch switches and conditional branch!");
>> +  bool FullUnswitch = SI || BI->getCondition() == Invariants[0];
>>    if (FullUnswitch)
>>      assert(Invariants.size() == 1 &&
>>             "Cannot have other invariants with full unswitching!");
>>    else
>> -    assert(isa<Instruction>(BI.getCondition()) &&
>> +    assert(isa<Instruction>(BI->getCondition()) &&
>>             "Partial unswitching requires an instruction as the
>> condition!");
>>
>>    // Constant and BBs tracking the cloned and continuing successor. When
>> we are
>> @@ -1618,18 +1648,27 @@ static bool unswitchInvariantBranch(
>>    bool Direction = true;
>>    int ClonedSucc = 0;
>>    if (!FullUnswitch) {
>> -    if (cast<Instruction>(BI.getCondition())->getOpcode() !=
>> Instruction::Or) {
>> -      assert(cast<Instruction>(BI.getCondition())->getOpcode() ==
>> Instruction::And &&
>> -        "Only `or` and `and` instructions can combine invariants being
>> unswitched.");
>> +    if (cast<Instruction>(BI->getCondition())->getOpcode() !=
>> Instruction::Or) {
>> +      assert(cast<Instruction>(BI->getCondition())->getOpcode() ==
>> +                 Instruction::And &&
>> +             "Only `or` and `and` instructions can combine invariants
>> being "
>> +             "unswitched.");
>>        Direction = false;
>>        ClonedSucc = 1;
>>      }
>>    }
>> -  auto *UnswitchedSuccBB = BI.getSuccessor(ClonedSucc);
>> -  auto *ContinueSuccBB = BI.getSuccessor(1 - ClonedSucc);
>>
>> -  assert(UnswitchedSuccBB != ContinueSuccBB &&
>> -         "Should not unswitch a branch that always goes to the same
>> place!");
>> +  BasicBlock *RetainedSuccBB =
>> +      BI ? BI->getSuccessor(1 - ClonedSucc) : SI->getDefaultDest();
>> +  SmallSetVector<BasicBlock *, 4> UnswitchedSuccBBs;
>> +  if (BI)
>> +    UnswitchedSuccBBs.insert(BI->getSuccessor(ClonedSucc));
>> +  else
>> +    for (auto Case : SI->cases())
>> +      UnswitchedSuccBBs.insert(Case.getCaseSuccessor());
>> +
>> +  assert(!UnswitchedSuccBBs.count(RetainedSuccBB) &&
>> +         "Should not unswitch the same successor we are retaining!");
>>
>>    // The branch should be in this exact loop. Any inner loop's invariant
>> branch
>>    // should be handled by unswitching that inner loop. The caller of this
>> @@ -1648,9 +1687,6 @@ static bool unswitchInvariantBranch(
>>      if (isa<CleanupPadInst>(ExitBB->getFirstNonPHI()))
>>        return false;
>>
>> -  SmallPtrSet<BasicBlock *, 4> ExitBlockSet(ExitBlocks.begin(),
>> -                                            ExitBlocks.end());
>> -
>>    // Compute the parent loop now before we start hacking on things.
>>    Loop *ParentL = L.getParentLoop();
>>
>> @@ -1669,30 +1705,22 @@ static bool unswitchInvariantBranch(
>>        OuterExitL = NewOuterExitL;
>>    }
>>
>> -  // If the edge we *aren't* cloning in the unswitch (the continuing
>> edge)
>> -  // dominates its target, we can skip cloning the dominated region of
>> the loop
>> -  // and its exits. We compute this as a set of nodes to be skipped.
>> -  SmallPtrSet<BasicBlock *, 4> SkippedLoopAndExitBlocks;
>> -  if (ContinueSuccBB->getUniquePredecessor() ||
>> -      llvm::all_of(predecessors(ContinueSuccBB), [&](BasicBlock *PredBB)
>> {
>> -        return PredBB == ParentBB || DT.dominates(ContinueSuccBB,
>> PredBB);
>> -      })) {
>> -    visitDomSubTree(DT, ContinueSuccBB, [&](BasicBlock *BB) {
>> -      SkippedLoopAndExitBlocks.insert(BB);
>> -      return true;
>> -    });
>> -  }
>> -  // If we are doing full unswitching, then similarly to the above, the
>> edge we
>> -  // *are* cloning in the unswitch (the unswitched edge) dominates its
>> target,
>> -  // we will end up with dead nodes in the original loop and its exits
>> that will
>> -  // need to be deleted. Here, we just retain that the property holds
>> and will
>> -  // compute the deleted set later.
>> -  bool DeleteUnswitchedSucc =
>> -      FullUnswitch &&
>> -      (UnswitchedSuccBB->getUniquePredecessor() ||
>> -       llvm::all_of(predecessors(UnswitchedSuccBB), [&](BasicBlock
>> *PredBB) {
>> -         return PredBB == ParentBB || DT.dominates(UnswitchedSuccBB,
>> PredBB);
>> -       }));
>> +  // If the edge from this terminator to a successor dominates that
>> successor,
>> +  // store a map from each block in its dominator subtree to it. This
>> lets us
>> +  // tell when cloning for a particular successor if a block is
>> dominated by
>> +  // some *other* successor with a single data structure. We use this to
>> +  // significantly reduce cloning.
>> +  SmallDenseMap<BasicBlock *, BasicBlock *, 16> DominatingSucc;
>> +  for (auto *SuccBB : llvm::concat<BasicBlock *const>(
>> +           makeArrayRef(RetainedSuccBB), UnswitchedSuccBBs))
>> +    if (SuccBB->getUniquePredecessor() ||
>> +        llvm::all_of(predecessors(SuccBB), [&](BasicBlock *PredBB) {
>> +          return PredBB == ParentBB || DT.dominates(SuccBB, PredBB);
>> +        }))
>> +      visitDomSubTree(DT, SuccBB, [&](BasicBlock *BB) {
>> +        DominatingSucc[BB] = SuccBB;
>> +        return true;
>> +      });
>>
>>    // Split the preheader, so that we know that there is a safe place to
>> insert
>>    // the conditional branch. We will change the preheader to have a
>> conditional
>> @@ -1702,84 +1730,93 @@ static bool unswitchInvariantBranch(
>>    BasicBlock *SplitBB = L.getLoopPreheader();
>>    BasicBlock *LoopPH = SplitEdge(SplitBB, L.getHeader(), &DT, &LI);
>>
>> -  // Keep a mapping for the cloned values.
>> -  ValueToValueMapTy VMap;
>> -
>>    // Keep track of the dominator tree updates needed.
>>    SmallVector<DominatorTree::UpdateType, 4> DTUpdates;
>>
>> -  // Build the cloned blocks from the loop.
>> -  auto *ClonedPH = buildClonedLoopBlocks(
>> -      L, LoopPH, SplitBB, ExitBlocks, ParentBB, UnswitchedSuccBB,
>> -      ContinueSuccBB, SkippedLoopAndExitBlocks, VMap, DTUpdates, AC, DT,
>> LI);
>> +  // Clone the loop for each unswitched successor.
>> +  SmallVector<std::unique_ptr<ValueToValueMapTy>, 4> VMaps;
>> +  VMaps.reserve(UnswitchedSuccBBs.size());
>> +  SmallDenseMap<BasicBlock *, BasicBlock *, 4> ClonedPHs;
>> +  for (auto *SuccBB : UnswitchedSuccBBs) {
>> +    VMaps.emplace_back(new ValueToValueMapTy());
>> +    ClonedPHs[SuccBB] = buildClonedLoopBlocks(
>> +        L, LoopPH, SplitBB, ExitBlocks, ParentBB, SuccBB, RetainedSuccBB,
>> +        DominatingSucc, *VMaps.back(), DTUpdates, AC, DT, LI);
>> +  }
>>
>>    // The stitching of the branched code back together depends on whether
>> we're
>>    // doing full unswitching or not with the exception that we always
>> want to
>>    // nuke the initial terminator placed in the split block.
>>    SplitBB->getTerminator()->eraseFromParent();
>>    if (FullUnswitch) {
>> -    // Remove the parent as a predecessor of the
>> -    // unswitched successor.
>> -    UnswitchedSuccBB->removePredecessor(ParentBB,
>> -                                        /*DontDeleteUselessPHIs*/ true);
>> -    DTUpdates.push_back({DominatorTree::Delete, ParentBB,
>> UnswitchedSuccBB});
>> -
>> -    // Now splice the branch from the original loop and use it to select
>> between
>> -    // the two loops.
>> -    SplitBB->getInstList().splice(SplitBB->end(),
>> ParentBB->getInstList(), BI);
>> -    BI.setSuccessor(ClonedSucc, ClonedPH);
>> -    BI.setSuccessor(1 - ClonedSucc, LoopPH);
>> +    for (BasicBlock *SuccBB : UnswitchedSuccBBs) {
>> +      // Remove the parent as a predecessor of the unswitched successor.
>> +      SuccBB->removePredecessor(ParentBB,
>> +                                /*DontDeleteUselessPHIs*/ true);
>> +      DTUpdates.push_back({DominatorTree::Delete, ParentBB, SuccBB});
>> +    }
>> +
>> +    // Now splice the terminator from the original loop and rewrite its
>> +    // successors.
>> +    SplitBB->getInstList().splice(SplitBB->end(),
>> ParentBB->getInstList(), TI);
>> +    if (BI) {
>> +      assert(UnswitchedSuccBBs.size() == 1 &&
>> +             "Only one possible unswitched block for a branch!");
>> +      BasicBlock *ClonedPH = ClonedPHs.begin()->second;
>> +      BI->setSuccessor(ClonedSucc, ClonedPH);
>> +      BI->setSuccessor(1 - ClonedSucc, LoopPH);
>> +      DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
>> +    } else {
>> +      assert(SI && "Must either be a branch or switch!");
>> +
>> +      // Walk the cases and directly update their successors.
>> +      for (auto &Case : SI->cases())
>> +        Case.setSuccessor(ClonedPHs.find(Case.getCaseSuccessor())->second);
>> +      // We need to use the set to populate domtree updates as even when
>> there
>> +      // are multiple cases pointing at the same successor we only want
>> to
>> +      // insert one edge in the domtree.
>> +      for (BasicBlock *SuccBB : UnswitchedSuccBBs)
>> +        DTUpdates.push_back(
>> +            {DominatorTree::Insert, SplitBB,
>> ClonedPHs.find(SuccBB)->second});
>> +
>> +      SI->setDefaultDest(LoopPH);
>> +    }
>>
>>      // Create a new unconditional branch to the continuing block (as
>> opposed to
>>      // the one cloned).
>> -    BranchInst::Create(ContinueSuccBB, ParentBB);
>> +    BranchInst::Create(RetainedSuccBB, ParentBB);
>>    } else {
>> +    assert(BI && "Only branches have partial unswitching.");
>> +    assert(UnswitchedSuccBBs.size() == 1 &&
>> +           "Only one possible unswitched block for a branch!");
>> +    BasicBlock *ClonedPH = ClonedPHs.begin()->second;
>>      // When doing a partial unswitch, we have to do a bit more work to
>> build up
>>      // the branch in the split block.
>>      buildPartialUnswitchConditionalBranch(*SplitBB, Invariants,
>> Direction,
>>                                            *ClonedPH, *LoopPH);
>> +    DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
>>    }
>>
>> -  // Before we update the dominator tree, collect the dead blocks if
>> we're going
>> -  // to end up deleting the unswitched successor.
>> -  SmallVector<BasicBlock *, 16> DeadBlocks;
>> -  if (DeleteUnswitchedSucc) {
>> -    DeadBlocks.push_back(UnswitchedSuccBB);
>> -    for (int i = 0; i < (int)DeadBlocks.size(); ++i) {
>> -      // If we reach an exit block, stop recursing as the unswitched
>> loop will
>> -      // end up reaching the merge block which we make the successor of
>> the
>> -      // exit.
>> -      if (ExitBlockSet.count(DeadBlocks[i]))
>> -        continue;
>> -
>> -      // Insert the children that are within the loop or exit block set.
>> Other
>> -      // children may reach out of the loop. While we don't expect these
>> to be
>> -      // dead (as the unswitched clone should reach them) we don't try
>> to prove
>> -      // that here.
>> -      for (DomTreeNode *ChildN : *DT[DeadBlocks[i]])
>> -        if (L.contains(ChildN->getBlock()) ||
>> -            ExitBlockSet.count(ChildN->getBlock()))
>> -          DeadBlocks.push_back(ChildN->getBlock());
>> -    }
>> -  }
>> -
>> -  // Add the remaining edge to our updates and apply them to get an
>> up-to-date
>> -  // dominator tree. Note that this will cause the dead blocks above to
>> be
>> -  // unreachable and no longer in the dominator tree.
>> -  DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
>> +  // Apply the updates accumulated above to get an up-to-date dominator
>> tree.
>>    DT.applyUpdates(DTUpdates);
>>
>> +  // Now that we have an accurate dominator tree, first delete the dead
>> cloned
>> +  // blocks so that we can accurately build any cloned loops. It is
>> important to
>> +  // not delete the blocks from the original loop yet because we still
>> want to
>> +  // reference the original loop to understand the cloned loop's
>> structure.
>> +  deleteDeadClonedBlocks(L, ExitBlocks, VMaps, DT);
>> +
>>    // Build the cloned loop structure itself. This may be substantially
>>    // different from the original structure due to the simplified CFG.
>> This also
>>    // handles inserting all the cloned blocks into the correct loops.
>>    SmallVector<Loop *, 4> NonChildClonedLoops;
>> -  buildClonedLoops(L, ExitBlocks, VMap, LI, NonChildClonedLoops);
>> -
>> -  // Delete anything that was made dead in the original loop due to
>> -  // unswitching.
>> -  if (!DeadBlocks.empty())
>> -    deleteDeadBlocksFromLoop(L, DeadBlocks, ExitBlocks, DT, LI);
>> +  for (std::unique_ptr<ValueToValueMapTy> &VMap : VMaps)
>> +    buildClonedLoops(L, ExitBlocks, *VMap, LI, NonChildClonedLoops);
>>
>> +  // Now that our cloned loops have been built, we can update the
>> original loop.
>> +  // First we delete the dead blocks from it and then we rebuild the loop
>> +  // structure taking these deletions into account.
>> +  deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI);
>>    SmallVector<Loop *, 4> HoistedLoops;
>>    bool IsStillLoop = rebuildLoopAfterUnswitch(L, ExitBlocks, LI,
>> HoistedLoops);
>>
>> @@ -1790,31 +1827,37 @@ static bool unswitchInvariantBranch(
>>    // verification steps.
>>    assert(DT.verify(DominatorTree::VerificationLevel::Fast));
>>
>> -  // Now we want to replace all the uses of the invariants within both
>> the
>> -  // original and cloned blocks. We do this here so that we can use the
>> now
>> -  // updated dominator tree to identify which side the users are on.
>> -  ConstantInt *UnswitchedReplacement =
>> -      Direction ? ConstantInt::getTrue(BI.getContext())
>> -                : ConstantInt::getFalse(BI.getContext());
>> -  ConstantInt *ContinueReplacement =
>> -      Direction ? ConstantInt::getFalse(BI.getContext())
>> -                : ConstantInt::getTrue(BI.getContext());
>> -  for (Value *Invariant : Invariants)
>> -    for (auto UI = Invariant->use_begin(), UE = Invariant->use_end();
>> -         UI != UE;) {
>> -      // Grab the use and walk past it so we can clobber it in the use
>> list.
>> -      Use *U = &*UI++;
>> -      Instruction *UserI = dyn_cast<Instruction>(U->getUser());
>> -      if (!UserI)
>> -        continue;
>> +  if (BI) {
>> +    // If we unswitched a branch which collapses the condition to a known
>> +    // constant we want to replace all the uses of the invariants within
>> both
>> +    // the original and cloned blocks. We do this here so that we can
>> use the
>> +    // now updated dominator tree to identify which side the users are
>> on.
>> +    assert(UnswitchedSuccBBs.size() == 1 &&
>> +           "Only one possible unswitched block for a branch!");
>> +    BasicBlock *ClonedPH = ClonedPHs.begin()->second;
>> +    ConstantInt *UnswitchedReplacement =
>> +        Direction ? ConstantInt::getTrue(BI->getContext())
>> +                  : ConstantInt::getFalse(BI->getContext());
>> +    ConstantInt *ContinueReplacement =
>> +        Direction ? ConstantInt::getFalse(BI->getContext())
>> +                  : ConstantInt::getTrue(BI->getContext());
>> +    for (Value *Invariant : Invariants)
>> +      for (auto UI = Invariant->use_begin(), UE = Invariant->use_end();
>> +           UI != UE;) {
>> +        // Grab the use and walk past it so we can clobber it in the use
>> list.
>> +        Use *U = &*UI++;
>> +        Instruction *UserI = dyn_cast<Instruction>(U->getUser());
>> +        if (!UserI)
>> +          continue;
>>
>> -      // Replace it with the 'continue' side if in the main loop body,
>> and the
>> -      // unswitched if in the cloned blocks.
>> -      if (DT.dominates(LoopPH, UserI->getParent()))
>> -        U->set(ContinueReplacement);
>> -      else if (DT.dominates(ClonedPH, UserI->getParent()))
>> -        U->set(UnswitchedReplacement);
>> -    }
>> +        // Replace it with the 'continue' side if in the main loop body,
>> and the
>> +        // unswitched if in the cloned blocks.
>> +        if (DT.dominates(LoopPH, UserI->getParent()))
>> +          U->set(ContinueReplacement);
>> +        else if (DT.dominates(ClonedPH, UserI->getParent()))
>> +          U->set(UnswitchedReplacement);
>> +      }
>> +  }
>>
>>    // We can change which blocks are exit blocks of all the cloned sibling
>>    // loops, the current loop, and any parent loops which shared exit
>> blocks
>> @@ -1937,8 +1980,16 @@ static bool unswitchBestCondition(
>>      if (LI.getLoopFor(BB) != &L)
>>        continue;
>>
>> +    if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
>> +      // We can only consider fully loop-invariant switch conditions as
>> we need
>> +      // to completely eliminate the switch after unswitching.
>> +      if (!isa<Constant>(SI->getCondition()) &&
>> +          L.isLoopInvariant(SI->getCondition()))
>> +        UnswitchCandidates.push_back({SI, {SI->getCondition()}});
>> +      continue;
>> +    }
>> +
>>      auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
>> -    // FIXME: Handle switches here!
>>      if (!BI || !BI->isConditional() || isa<Constant>(BI->getCondition())
>> ||
>>          BI->getSuccessor(0) == BI->getSuccessor(1))
>>        continue;
>> @@ -2091,9 +2142,9 @@ static bool unswitchBestCondition(
>>      TerminatorInst &TI = *TerminatorAndInvariants.first;
>>      ArrayRef<Value *> Invariants = TerminatorAndInvariants.second;
>>      BranchInst *BI = dyn_cast<BranchInst>(&TI);
>> -    int CandidateCost =
>> -        ComputeUnswitchedCost(TI, /*FullUnswitch*/ Invariants.size() ==
>> 1 && BI &&
>> -                                      Invariants[0] ==
>> BI->getCondition());
>> +    int CandidateCost = ComputeUnswitchedCost(
>> +        TI, /*FullUnswitch*/ !BI || (Invariants.size() == 1 &&
>> +                                     Invariants[0] ==
>> BI->getCondition()));
>>      LLVM_DEBUG(dbgs() << "  Computed cost of " << CandidateCost
>>                        << " for unswitch candidate: " << TI << "\n");
>>      if (!BestUnswitchTI || CandidateCost < BestUnswitchCost) {
>> @@ -2109,17 +2160,11 @@ static bool unswitchBestCondition(
>>      return false;
>>    }
>>
>> -  auto *UnswitchBI = dyn_cast<BranchInst>(BestUnswitchTI);
>> -  if (!UnswitchBI) {
>> -    // FIXME: Add support for unswitching a switch here!
>> -    LLVM_DEBUG(dbgs() << "Cannot unswitch anything but a branch!\n");
>> -    return false;
>> -  }
>> -
>>    LLVM_DEBUG(dbgs() << "  Trying to unswitch non-trivial (cost = "
>> -                    << BestUnswitchCost << ") branch: " << *UnswitchBI << "\n");
>> -  return unswitchInvariantBranch(L, *UnswitchBI, BestUnswitchInvariants, DT, LI,
>> -                                 AC, UnswitchCB);
>> +                    << BestUnswitchCost << ") terminator: " << *BestUnswitchTI
>> +                    << "\n");
>> +  return unswitchNontrivialInvariants(
>> +      L, *BestUnswitchTI, BestUnswitchInvariants, DT, LI, AC, UnswitchCB);
>>  }
>>
>>  /// Unswitch control flow predicated on loop invariant conditions.
>>
>> Modified:
>> llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll?rev=335553&r1=335552&r2=335553&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll (original)
>> +++ llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll Mon Jun 25 16:32:54 2018
>> @@ -387,7 +387,7 @@ loop_begin:
>>  loop_b:
>>    %b = load i32, i32* %b.ptr
>>    br i1 %v, label %loop_begin, label %loop_exit
>> -; The 'loop_b' unswitched loop.
>> +; The original loop, now non-looping due to unswitching..
>>  ;
>>  ; CHECK:       entry.split:
>>  ; CHECK-NEXT:    br label %loop_begin
>> @@ -398,14 +398,13 @@ loop_b:
>>  ; CHECK-NEXT:    br label %loop_exit.split
>>  ;
>>  ; CHECK:       loop_exit.split:
>> -; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A]], %loop_begin ]
>>  ; CHECK-NEXT:    br label %loop_exit
>>
>>  loop_exit:
>>    %ab.phi = phi i32 [ %b, %loop_b ], [ %a, %loop_begin ]
>>    ret i32 %ab.phi
>>  ; CHECK:       loop_exit:
>> -; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.split ], [ %[[B_LCSSA]], %loop_exit.split.us ]
>> +; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A]], %loop_exit.split ], [ %[[B_LCSSA]], %loop_exit.split.us ]
>>  ; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>>  }
>>
>> @@ -458,8 +457,7 @@ loop_exit1:
>>    call void @sink1(i32 %a.phi)
>>    ret void
>>  ; CHECK:       loop_exit1:
>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit1.split.us ]
>> -; CHECK-NEXT:    call void @sink1(i32 %[[A_PHI]])
>> +; CHECK-NEXT:    call void @sink1(i32 %[[A_LCSSA]])
>>  ; CHECK-NEXT:    ret void
>>
>>  loop_exit2:
>> @@ -467,8 +465,8 @@ loop_exit2:
>>    call void @sink2(i32 %b.phi)
>>    ret void
>>  ; CHECK:       loop_exit2:
>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B]], %loop_b ]
>> -; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %loop_b ]
>> +; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
>>  ; CHECK-NEXT:    ret void
>>  }
>>
>> @@ -531,8 +529,7 @@ loop_exit2:
>>    call void @sink2(i32 %b.phi)
>>    ret void
>>  ; CHECK:       loop_exit2:
>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_LCSSA]], %loop_exit2.split.us ]
>> -; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
>> +; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
>>  ; CHECK-NEXT:    ret void
>>  }
>>
>> @@ -587,8 +584,7 @@ loop_exit1:
>>    call void @sink1(i32 %a.phi)
>>    br label %exit
>>  ; CHECK:       loop_exit1:
>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit1.split.us ]
>> -; CHECK-NEXT:    call void @sink1(i32 %[[A_PHI]])
>> +; CHECK-NEXT:    call void @sink1(i32 %[[A_LCSSA]])
>>  ; CHECK-NEXT:    br label %exit
>>
>>  loop_exit2:
>> @@ -596,8 +592,8 @@ loop_exit2:
>>    call void @sink2(i32 %b.phi)
>>    br label %exit
>>  ; CHECK:       loop_exit2:
>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B]], %loop_b ]
>> -; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %loop_b ]
>> +; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
>>  ; CHECK-NEXT:    br label %exit
>>
>>  exit:
>> @@ -663,7 +659,7 @@ loop_latch:
>>    %v2 = load i1, i1* %ptr
>>    br i1 %v2, label %loop_begin, label %loop_exit
>>  ; CHECK:       loop_latch:
>> -; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %inner_loop_b ]
>> +; CHECK-NEXT:    %[[B_INNER_LCSSA:.*]] = phi i32 [ %[[B]], %inner_loop_b ]
>>  ; CHECK-NEXT:    %[[V2:.*]] = load i1, i1* %ptr
>>  ; CHECK-NEXT:    br i1 %[[V2]], label %loop_begin, label %loop_exit.loopexit1
>>
>> @@ -671,15 +667,14 @@ loop_exit:
>>    %ab.phi = phi i32 [ %a, %inner_loop_begin ], [ %b.phi, %loop_latch ]
>>    ret i32 %ab.phi
>>  ; CHECK:       loop_exit.loopexit:
>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.loopexit.split.us ]
>>  ; CHECK-NEXT:    br label %loop_exit
>>  ;
>>  ; CHECK:       loop_exit.loopexit1:
>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_LCSSA]], %loop_latch ]
>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %loop_latch ]
>>  ; CHECK-NEXT:    br label %loop_exit
>>  ;
>>  ; CHECK:       loop_exit:
>> -; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_PHI]], %loop_exit.loopexit ], [ %[[B_PHI]], %loop_exit.loopexit1 ]
>> +; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.loopexit ], [ %[[B_LCSSA]], %loop_exit.loopexit1 ]
>>  ; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>>  }
>>
>> @@ -773,11 +768,10 @@ latch:
>>  ; CHECK-NEXT:    br label %latch
>>  ;
>>  ; CHECK:       latch:
>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %loop_b_inner_exit ]
>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit.split
>>  ;
>>  ; CHECK:       loop_exit.split:
>> -; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_PHI]], %latch ]
>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %latch ]
>>  ; CHECK-NEXT:    br label %loop_exit
>>
>>  loop_exit:
>> @@ -1466,7 +1460,6 @@ inner_loop_exit:
>>    %v = load i1, i1* %ptr
>>    br i1 %v, label %loop_begin, label %loop_exit
>>  ; CHECK:       inner_loop_exit:
>> -; CHECK-NEXT:    %[[A_INNER_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.split.us ]
>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit
>>
>> @@ -1474,7 +1467,7 @@ loop_exit:
>>    %a.lcssa = phi i32 [ %a.inner_lcssa, %inner_loop_exit ]
>>    ret i32 %a.lcssa
>>  ; CHECK:       loop_exit:
>> -; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA]], %inner_loop_exit ]
>> +; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit ]
>>  ; CHECK-NEXT:    ret i32 %[[A_LCSSA]]
>>  }
>>
>> @@ -1555,7 +1548,7 @@ loop_exit:
>>    ret i32 %a.lcssa
>>  ; CHECK:       loop_exit:
>>  ; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.split ], [ %[[A_PHI_US]], %loop_exit.split.us ]
>> -; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>> +; CHECK-NEXT:    ret i32 %[[A_PHI]]
>>  }
>>
>>  ; Test that requires re-forming dedicated exits for the original loop.
>> @@ -1635,7 +1628,7 @@ loop_exit:
>>    ret i32 %a.lcssa
>>  ; CHECK:       loop_exit:
>>  ; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_PHI_SPLIT]], %loop_exit.split ], [ %[[A_LCSSA_US]], %loop_exit.split.us ]
>> -; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>> +; CHECK-NEXT:    ret i32 %[[A_PHI]]
>>  }
>>
>>  ; Check that if a cloned inner loop after unswitching doesn't loop and directly
>> @@ -1721,7 +1714,6 @@ loop_exit:
>>    %a.lcssa = phi i32 [ %a, %inner_loop_begin ], [ %a.inner_lcssa, %inner_loop_exit ]
>>    ret i32 %a.lcssa
>>  ; CHECK:       loop_exit.loopexit:
>> -; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %loop_exit.loopexit.split.us ]
>>  ; CHECK-NEXT:    br label %loop_exit
>>  ;
>>  ; CHECK:       loop_exit.loopexit1:
>> @@ -1729,7 +1721,7 @@ loop_exit:
>>  ; CHECK-NEXT:    br label %loop_exit
>>  ;
>>  ; CHECK:       loop_exit:
>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA_US]], %loop_exit.loopexit ], [ %[[A_LCSSA]], %loop_exit.loopexit1 ]
>> +; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %loop_exit.loopexit ], [ %[[A_LCSSA]], %loop_exit.loopexit1 ]
>>  ; CHECK-NEXT:    ret i32 %[[A_PHI]]
>>  }
>>
>> @@ -1802,7 +1794,6 @@ inner_loop_exit:
>>    %v3 = load i1, i1* %ptr
>>    br i1 %v3, label %loop_latch, label %loop_exit
>>  ; CHECK:       inner_loop_exit:
>> -; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.split.us ]
>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_latch, label %loop_exit.loopexit1
>>
>> @@ -1819,7 +1810,7 @@ loop_exit:
>>  ; CHECK-NEXT:    br label %loop_exit
>>  ;
>>  ; CHECK:       loop_exit.loopexit1:
>> -; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_PHI]], %inner_loop_exit ]
>> +; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit ]
>>  ; CHECK-NEXT:    br label %loop_exit
>>  ;
>>  ; CHECK:       loop_exit:
>> @@ -1916,7 +1907,6 @@ inner_loop_exit:
>>    %v4 = load i1, i1* %ptr
>>    br i1 %v4, label %loop_begin, label %loop_exit
>>  ; CHECK:       inner_loop_exit.loopexit:
>> -; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_loop_exit.loopexit.split.us ]
>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>  ;
>>  ; CHECK:       inner_loop_exit.loopexit1:
>> @@ -1924,7 +1914,7 @@ inner_loop_exit:
>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>  ;
>>  ; CHECK:       inner_loop_exit:
>> -; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.loopexit ], [ %[[A_INNER_LCSSA]], %inner_loop_exit.loopexit1 ]
>> +; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_loop_exit.loopexit ], [ %[[A_INNER_LCSSA]], %inner_loop_exit.loopexit1 ]
>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit
>>
>> @@ -2010,7 +2000,6 @@ inner_inner_loop_exit:
>>    %v3 = load i1, i1* %ptr
>>    br i1 %v3, label %inner_loop_latch, label %inner_loop_exit
>>  ; CHECK:       inner_inner_loop_exit:
>> -; CHECK-NEXT:    %[[A_INNER_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_inner_loop_exit.split.us ]
>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>  ; CHECK-NEXT:    br i1 %[[V]], label %inner_loop_latch, label %inner_loop_exit.loopexit1
>>
>> @@ -2028,7 +2017,7 @@ inner_loop_exit:
>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>  ;
>>  ; CHECK:       inner_loop_exit.loopexit1:
>> -; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_PHI]], %inner_inner_loop_exit ]
>> +; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_inner_loop_exit ]
>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>  ;
>>  ; CHECK:       inner_loop_exit:
>> @@ -2296,56 +2285,96 @@ define i32 @test20(i32* %var, i32 %cond1
>>  entry:
>>    br label %loop_begin
>>  ; CHECK-NEXT:  entry:
>> -; CHECK-NEXT:    br label %loop_begin
>> +; CHECK-NEXT:    switch i32 %cond2, label %[[ENTRY_SPLIT_EXIT:.*]] [
>> +; CHECK-NEXT:      i32 0, label %[[ENTRY_SPLIT_A:.*]]
>> +; CHECK-NEXT:      i32 1, label %[[ENTRY_SPLIT_A]]
>> +; CHECK-NEXT:      i32 13, label %[[ENTRY_SPLIT_B:.*]]
>> +; CHECK-NEXT:      i32 2, label %[[ENTRY_SPLIT_A]]
>> +; CHECK-NEXT:      i32 42, label %[[ENTRY_SPLIT_C:.*]]
>> +; CHECK-NEXT:    ]
>>
>>  loop_begin:
>>    %var_val = load i32, i32* %var
>> -  switch i32 %cond2, label %loop_a [
>> -    i32 0, label %loop_b
>> -    i32 1, label %loop_b
>> -    i32 13, label %loop_c
>> -    i32 2, label %loop_b
>> -    i32 42, label %loop_exit
>> +  switch i32 %cond2, label %loop_exit [
>> +    i32 0, label %loop_a
>> +    i32 1, label %loop_a
>> +    i32 13, label %loop_b
>> +    i32 2, label %loop_a
>> +    i32 42, label %loop_c
>>    ]
>> -; CHECK:       loop_begin:
>> -; CHECK-NEXT:    %[[V:.*]] = load i32, i32* %var
>> -; CHECK-NEXT:    switch i32 %cond2, label %loop_a [
>> -; CHECK-NEXT:      i32 0, label %loop_b
>> -; CHECK-NEXT:      i32 1, label %loop_b
>> -; CHECK-NEXT:      i32 13, label %loop_c
>> -; CHECK-NEXT:      i32 2, label %loop_b
>> -; CHECK-NEXT:      i32 42, label %loop_exit
>> -; CHECK-NEXT:    ]
>>
>>  loop_a:
>>    call void @a()
>>    br label %loop_latch
>> -; CHECK:       loop_a:
>> +; Unswitched 'a' loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_A]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_A]]:
>> +; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
>> +; CHECK-NEXT:    br label %[[LOOP_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_A]]:
>>  ; CHECK-NEXT:    call void @a()
>> -; CHECK-NEXT:    br label %loop_latch
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_A]]:
>> +; CHECK:         br label %[[LOOP_BEGIN_A]]
>>
>>  loop_b:
>>    call void @b()
>>    br label %loop_latch
>> -; CHECK:       loop_b:
>> +; Unswitched 'b' loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_B]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_B]]:
>> +; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
>> +; CHECK-NEXT:    br label %[[LOOP_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_B]]:
>>  ; CHECK-NEXT:    call void @b()
>> -; CHECK-NEXT:    br label %loop_latch
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_B]]:
>> +; CHECK:         br label %[[LOOP_BEGIN_B]]
>>
>>  loop_c:
>>    call void @c() noreturn nounwind
>>    br label %loop_latch
>> -; CHECK:       loop_c:
>> +; Unswitched 'c' loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_C]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_C]]:
>> +; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
>> +; CHECK-NEXT:    br label %[[LOOP_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_C]]:
>>  ; CHECK-NEXT:    call void @c()
>> -; CHECK-NEXT:    br label %loop_latch
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_C]]:
>> +; CHECK:         br label %[[LOOP_BEGIN_C]]
>>
>>  loop_latch:
>>    br label %loop_begin
>> -; CHECK:       loop_latch:
>> -; CHECK-NEXT:    br label %loop_begin
>>
>>  loop_exit:
>>    %lcssa = phi i32 [ %var_val, %loop_begin ]
>>    ret i32 %lcssa
>> +; Unswitched exit edge (no longer a loop).
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_EXIT]]:
>> +; CHECK-NEXT:    br label %loop_begin
>> +;
>> +; CHECK:       loop_begin:
>> +; CHECK-NEXT:    %[[V:.*]] = load i32, i32* %var
>> +; CHECK-NEXT:    br label %loop_exit
>> +;
>>  ; CHECK:       loop_exit:
>>  ; CHECK-NEXT:    %[[LCSSA:.*]] = phi i32 [ %[[V]], %loop_begin ]
>>  ; CHECK-NEXT:    ret i32 %[[LCSSA]]
>> @@ -2824,3 +2853,112 @@ loop_exit:
>>  ; CHECK:       loop_exit:
>>  ; CHECK-NEXT:    ret
>>  }
>> +
>> +; Non-trivial unswitching of a switch.
>> +define i32 @test27(i1* %ptr, i32 %cond) {
>> +; CHECK-LABEL: @test27(
>> +entry:
>> +  br label %loop_begin
>> +; CHECK-NEXT:  entry:
>> +; CHECK-NEXT:    switch i32 %cond, label %[[ENTRY_SPLIT_LATCH:.*]] [
>> +; CHECK-NEXT:      i32 0, label %[[ENTRY_SPLIT_A:.*]]
>> +; CHECK-NEXT:      i32 1, label %[[ENTRY_SPLIT_B:.*]]
>> +; CHECK-NEXT:      i32 2, label %[[ENTRY_SPLIT_C:.*]]
>> +; CHECK-NEXT:    ]
>> +
>> +loop_begin:
>> +  switch i32 %cond, label %latch [
>> +    i32 0, label %loop_a
>> +    i32 1, label %loop_b
>> +    i32 2, label %loop_c
>> +  ]
>> +
>> +loop_a:
>> +  call void @a()
>> +  br label %latch
>> +; Unswitched 'a' loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_A]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_A]]:
>> +; CHECK-NEXT:    br label %[[LOOP_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_A]]:
>> +; CHECK-NEXT:    call void @a()
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_A]]:
>> +; CHECK-NEXT:    %[[V_A:.*]] = load i1, i1* %ptr
>> +; CHECK:         br i1 %[[V_A]], label %[[LOOP_BEGIN_A]], label %[[LOOP_EXIT_A:.*]]
>> +;
>> +; CHECK:       [[LOOP_EXIT_A]]:
>> +; CHECK-NEXT:    br label %loop_exit
>> +
>> +loop_b:
>> +  call void @b()
>> +  br label %latch
>> +; Unswitched 'b' loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_B]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_B]]:
>> +; CHECK-NEXT:    br label %[[LOOP_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_B]]:
>> +; CHECK-NEXT:    call void @b()
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_B]]:
>> +; CHECK-NEXT:    %[[V_B:.*]] = load i1, i1* %ptr
>> +; CHECK:         br i1 %[[V_B]], label %[[LOOP_BEGIN_B]], label %[[LOOP_EXIT_B:.*]]
>> +;
>> +; CHECK:       [[LOOP_EXIT_B]]:
>> +; CHECK-NEXT:    br label %loop_exit
>> +
>> +loop_c:
>> +  call void @c()
>> +  br label %latch
>> +; Unswitched 'c' loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_C]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_C]]:
>> +; CHECK-NEXT:    br label %[[LOOP_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_C]]:
>> +; CHECK-NEXT:    call void @c()
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_C]]:
>> +; CHECK-NEXT:    %[[V_C:.*]] = load i1, i1* %ptr
>> +; CHECK:         br i1 %[[V_C]], label %[[LOOP_BEGIN_C]], label %[[LOOP_EXIT_C:.*]]
>> +;
>> +; CHECK:       [[LOOP_EXIT_C]]:
>> +; CHECK-NEXT:    br label %loop_exit
>> +
>> +latch:
>> +  %v = load i1, i1* %ptr
>> +  br i1 %v, label %loop_begin, label %loop_exit
>> +; Unswitched the 'latch' only loop.
>> +;
>> +; CHECK:       [[ENTRY_SPLIT_LATCH]]:
>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_LATCH:.*]]
>> +;
>> +; CHECK:       [[LOOP_BEGIN_LATCH]]:
>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_LATCH:.*]]
>> +;
>> +; CHECK:       [[LOOP_LATCH_LATCH]]:
>> +; CHECK-NEXT:    %[[V_LATCH:.*]] = load i1, i1* %ptr
>> +; CHECK:         br i1 %[[V_LATCH]], label %[[LOOP_BEGIN_LATCH]], label %[[LOOP_EXIT_LATCH:.*]]
>> +;
>> +; CHECK:       [[LOOP_EXIT_LATCH]]:
>> +; CHECK-NEXT:    br label %loop_exit
>> +
>> +loop_exit:
>> +  ret i32 0
>> +; CHECK:       loop_exit:
>> +; CHECK-NEXT:    ret i32 0
>> +}
>> \ No newline at end of file
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>