[llvm] r335553 - [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 9 03:36:44 PDT 2018


I've pushed r336536, which should fix most if not all of the issues here, and
I'll continue testing to see whether there are more lying in wait.

On Mon, Jul 9, 2018 at 2:51 AM Chandler Carruth <chandlerc at gmail.com> wrote:

> Also +Alina Sbirlea <asbirlea at google.com> as she's been hacking on top of
> this...
>
> On Mon, Jul 9, 2018 at 2:51 AM Chandler Carruth <chandlerc at google.com>
> wrote:
>
>> FYI for folks testing out this functionality, there is a collection of
>> serious bugs in this commit. I've got a fix and just need to add some
>> testing. Will have it landed tomorrow. Just wanted to send a heads up.
>>
>> +Mikael Holmén <mikael.holmen at ericsson.com> +Fedor Sergeev
>> <fedor.sergeev at azul.com>
>>
>> On Mon, Jun 25, 2018 at 4:37 PM Chandler Carruth via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: chandlerc
>>> Date: Mon Jun 25 16:32:54 2018
>>> New Revision: 335553
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=335553&view=rev
>>> Log:
>>> [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
>>> unswitching of switches.
>>>
>>> This works much like trivial unswitching of switches in that it reliably
>>> moves the switch out of the loop. Here we potentially clone the entire
>>> loop into each successor of the switch and re-point the cases at these
>>> clones.
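
As a source-level illustration of the paragraph above, here is a minimal sketch of what moving a loop-invariant switch out of a loop amounts to. The function names and the arithmetic in each case are hypothetical and chosen only for illustration; the pass itself operates on LLVM IR, not on C++ source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: `mode` never changes inside the loop, so the switch
// on it is loop-invariant and is a candidate for unswitching.
long sumOriginal(const std::vector<int> &v, int mode) {
  long acc = 0;
  for (int x : v) {
    switch (mode) {
    case 0:
      acc += x;
      break;
    case 1:
      acc += 2L * x;
      break;
    default:
      acc -= x;
      break;
    }
  }
  return acc;
}

// Hand-written equivalent of the unswitched form: the switch runs once,
// outside the loop, and each successor gets its own copy of the loop.
long sumUnswitched(const std::vector<int> &v, int mode) {
  long acc = 0;
  switch (mode) {
  case 0:
    for (int x : v)
      acc += x;
    break;
  case 1:
    for (int x : v)
      acc += 2L * x;
    break;
  default:
    for (int x : v)
      acc -= x;
    break;
  }
  return acc;
}
```

Both functions compute the same result for every `mode`; the second trades code size (one loop copy per successor) for a loop body with no switch in it.
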
>>>
>>> Due to the complexity of actually doing nontrivial unswitching, this
>>> patch doesn't create a dedicated routine for handling switches -- it
>>> would duplicate far too much code. Instead, it generalizes the existing
>>> routine to handle both branches and switches as it largely reduces to
>>> looping in a few places instead of doing something once. This actually
>>> improves the results in some cases with branches due to being much more
>>> careful about how dead regions of code are managed. With branches,
>>> because exactly one clone is created and there are exactly two edges
>>> considered, somewhat sloppy handling of the dead regions of code was
>>> sufficient in most cases. But with switches, there are much more
>>> complicated patterns of dead code and so I've had to move to a more
>>> robust model generally. We still do as much pruning of the dead code
>>> early as possible because that allows us to avoid even cloning the code.
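
The early pruning mentioned above relies on the `DominatingSucc` map and the `SkipBlock` lambda that appear later in this patch. A standalone sketch of that lookup follows, assuming plain strings as stand-ins for `BasicBlock *` and `std::unordered_map` in place of `SmallDenseMap`; the free function `skipBlock` is hypothetical and only mirrors the lambda's logic.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for BasicBlock*; an assumption made to keep the sketch standalone.
using Block = std::string;

// Returns true when `bb` is recorded as dominated by a successor other than
// the one we are currently cloning for, so cloning it would only produce
// dead code. Blocks absent from the map are never skipped.
bool skipBlock(const std::unordered_map<Block, Block> &dominatingSucc,
               const Block &bb, const Block &unswitchedSucc) {
  auto it = dominatingSucc.find(bb);
  return it != dominatingSucc.end() && it->second != unswitchedSucc;
}
```

A block that is missing from the map is reachable from more than one successor, so it has to be cloned for every successor.
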
>>>
>>> This also surfaced another problem with nontrivial unswitching before
>>> which is that we weren't as precise in reconstructing loops as we could
>>> have been. This seems to have been mostly harmless, but resulted in
>>> pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
>>> have to get this *right*, and everything benefits from it.
>>>
>>> While the testing may seem a bit light here because we only have two
>>> real cases with actual switches, they do a surprisingly good job of
>>> exercising numerous edge cases. Also, because we share the logic with
>>> branches, most of the changes in this patch are reasonably well covered
>>> by existing tests.
>>>
>>> The new unswitch now has all of the same fundamental power as the old
>>> one with the exception of the single unsound case of *partial* switch
>>> unswitching -- that really is just loop specialization and not
>>> unswitching at all. It doesn't fit into the canonicalization model in
>>> any way. We can add a loop specialization pass that runs late based on
>>> profile data if important test cases ever come up here.
>>>
>>> Differential Revision: https://reviews.llvm.org/D47683
>>>
>>> Modified:
>>>     llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
>>>     llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
>>>     llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll
>>>
>>> Modified: llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h?rev=335553&r1=335552&r2=335553&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h
>>> (original)
>>> +++ llvm/trunk/include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h Mon
>>> Jun 25 16:32:54 2018
>>> @@ -17,9 +17,9 @@
>>>
>>>  namespace llvm {
>>>
>>> -/// This pass transforms loops that contain branches on loop-invariant
>>> -/// conditions to have multiple loops. For example, it turns the left into the
>>> -/// right code:
>>> +/// This pass transforms loops that contain branches or switches on loop-
>>> +/// invariant conditions to have multiple loops. For example, it turns the left
>>> +/// into the right code:
>>>  ///
>>>  ///  for (...)                  if (lic)
>>>  ///    A                          for (...)
>>> @@ -35,6 +35,31 @@ namespace llvm {
>>>  /// This pass expects LICM to be run before it to hoist invariant
>>> conditions out
>>>  /// of the loop, to make the unswitching opportunity obvious.
>>>  ///
>>> +/// There is a taxonomy of unswitching that we use to classify different forms
>>> +/// of this transformation:
>>> +///
>>> +/// - Trivial unswitching: this is when the condition can be unswitched without
>>> +///   cloning any code from inside the loop. A non-trivial unswitch requires
>>> +///   code duplication.
>>> +///
>>> +/// - Full unswitching: this is when the branch or switch is completely moved
>>> +///   from inside the loop to outside the loop. Partial unswitching removes the
>>> +///   branch from the clone of the loop but must leave a (somewhat simplified)
>>> +///   branch in the original loop. While theoretically partial unswitching can
>>> +///   be done for switches, the requirements are extreme - we need the loop
>>> +///   invariant input to the switch to be sufficient to collapse to a single
>>> +///   successor in each clone.
>>> +///
>>> +/// This pass always does trivial, full unswitching for both branches and
>>> +/// switches. For branches, it also always does trivial, partial unswitching.
>>> +///
>>> +/// If enabled (via the constructor's `NonTrivial` parameter), this pass will
>>> +/// additionally do non-trivial, full unswitching for branches and switches, and
>>> +/// will do non-trivial, partial unswitching for branches.
>>> +///
>>> +/// Because partial unswitching of switches is extremely unlikely to be possible
>>> +/// in practice and significantly complicates the implementation, this pass does
>>> +/// not currently implement that in any mode.
>>>  class SimpleLoopUnswitchPass : public
>>> PassInfoMixin<SimpleLoopUnswitchPass> {
>>>    bool NonTrivial;
>>>
>>>
>>> Modified: llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp?rev=335553&r1=335552&r2=335553&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp Mon Jun 25
>>> 16:32:54 2018
>>> @@ -715,8 +715,12 @@ static bool unswitchAllTrivialConditions
>>>  ///
>>>  /// This routine handles cloning all of the necessary loop blocks and
>>> exit
>>>  /// blocks including rewriting their instructions and the relevant PHI
>>> nodes.
>>> -/// It skips loop and exit blocks that are not necessary based on the
>>> provided
>>> -/// set. It also correctly creates the unconditional branch in the
>>> cloned
>>> +/// Any loop blocks or exit blocks which are dominated by a different
>>> successor
>>> +/// than the one for this clone of the loop blocks can be trivially
>>> skipped. We
>>> +/// use the `DominatingSucc` map to determine whether a block satisfies
>>> that
>>> +/// property with a simple map lookup.
>>> +///
>>> +/// It also correctly creates the unconditional branch in the cloned
>>>  /// unswitched parent block to only point at the unswitched successor.
>>>  ///
>>>  /// This does not handle most of the necessary updates to `LoopInfo`.
>>> Only exit
>>> @@ -730,7 +734,7 @@ static BasicBlock *buildClonedLoopBlocks
>>>      Loop &L, BasicBlock *LoopPH, BasicBlock *SplitBB,
>>>      ArrayRef<BasicBlock *> ExitBlocks, BasicBlock *ParentBB,
>>>      BasicBlock *UnswitchedSuccBB, BasicBlock *ContinueSuccBB,
>>> -    const SmallPtrSetImpl<BasicBlock *> &SkippedLoopAndExitBlocks,
>>> +    const SmallDenseMap<BasicBlock *, BasicBlock *, 16> &DominatingSucc,
>>>      ValueToValueMapTy &VMap,
>>>      SmallVectorImpl<DominatorTree::UpdateType> &DTUpdates,
>>> AssumptionCache &AC,
>>>      DominatorTree &DT, LoopInfo &LI) {
>>> @@ -751,19 +755,26 @@ static BasicBlock *buildClonedLoopBlocks
>>>      return NewBB;
>>>    };
>>>
>>> +  // We skip cloning blocks when they have a dominating succ that is
>>> not the
>>> +  // succ we are cloning for.
>>> +  auto SkipBlock = [&](BasicBlock *BB) {
>>> +    auto It = DominatingSucc.find(BB);
>>> +    return It != DominatingSucc.end() && It->second != UnswitchedSuccBB;
>>> +  };
>>> +
>>>    // First, clone the preheader.
>>>    auto *ClonedPH = CloneBlock(LoopPH);
>>>
>>>    // Then clone all the loop blocks, skipping the ones that aren't
>>> necessary.
>>>    for (auto *LoopBB : L.blocks())
>>> -    if (!SkippedLoopAndExitBlocks.count(LoopBB))
>>> +    if (!SkipBlock(LoopBB))
>>>        CloneBlock(LoopBB);
>>>
>>>    // Split all the loop exit edges so that when we clone the exit
>>> blocks, if
>>>    // any of the exit blocks are *also* a preheader for some other loop,
>>> we
>>>    // don't create multiple predecessors entering the loop header.
>>>    for (auto *ExitBB : ExitBlocks) {
>>> -    if (SkippedLoopAndExitBlocks.count(ExitBB))
>>> +    if (SkipBlock(ExitBB))
>>>        continue;
>>>
>>>      // When we are going to clone an exit, we don't need to clone all
>>> the
>>> @@ -841,7 +852,7 @@ static BasicBlock *buildClonedLoopBlocks
>>>    // Update any PHI nodes in the cloned successors of the skipped
>>> blocks to not
>>>    // have spurious incoming values.
>>>    for (auto *LoopBB : L.blocks())
>>> -    if (SkippedLoopAndExitBlocks.count(LoopBB))
>>> +    if (SkipBlock(LoopBB))
>>>        for (auto *SuccBB : successors(LoopBB))
>>>          if (auto *ClonedSuccBB =
>>> cast_or_null<BasicBlock>(VMap.lookup(SuccBB)))
>>>            for (PHINode &PN : ClonedSuccBB->phis())
>>> @@ -1175,10 +1186,41 @@ static void buildClonedLoops(Loop &OrigL
>>>  }
>>>
>>>  static void
>>> +deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
>>> +                       ArrayRef<std::unique_ptr<ValueToValueMapTy>>
>>> VMaps,
>>> +                       DominatorTree &DT) {
>>> +  // Find all the dead clones, and remove them from their successors.
>>> +  SmallVector<BasicBlock *, 16> DeadBlocks;
>>> +  for (BasicBlock *BB : llvm::concat<BasicBlock *const>(L.blocks(),
>>> ExitBlocks))
>>> +    for (auto &VMap : VMaps)
>>> +      if (BasicBlock *ClonedBB =
>>> cast_or_null<BasicBlock>(VMap->lookup(BB)))
>>> +        if (!DT.isReachableFromEntry(ClonedBB)) {
>>> +          for (BasicBlock *SuccBB : successors(ClonedBB))
>>> +            SuccBB->removePredecessor(ClonedBB);
>>> +          DeadBlocks.push_back(ClonedBB);
>>> +        }
>>> +
>>> +  // Drop any remaining references to break cycles.
>>> +  for (BasicBlock *BB : DeadBlocks)
>>> +    BB->dropAllReferences();
>>> +  // Erase them from the IR.
>>> +  for (BasicBlock *BB : DeadBlocks)
>>> +    BB->eraseFromParent();
>>> +}
>>> +
>>> +static void
>>>  deleteDeadBlocksFromLoop(Loop &L,
>>> -                         const SmallVectorImpl<BasicBlock *>
>>> &DeadBlocks,
>>>                           SmallVectorImpl<BasicBlock *> &ExitBlocks,
>>>                           DominatorTree &DT, LoopInfo &LI) {
>>> +  // Find all the dead blocks, and remove them from their successors.
>>> +  SmallVector<BasicBlock *, 16> DeadBlocks;
>>> +  for (BasicBlock *BB : llvm::concat<BasicBlock *const>(L.blocks(),
>>> ExitBlocks))
>>> +    if (!DT.isReachableFromEntry(BB)) {
>>> +      for (BasicBlock *SuccBB : successors(BB))
>>> +        SuccBB->removePredecessor(BB);
>>> +      DeadBlocks.push_back(BB);
>>> +    }
>>> +
>>>    SmallPtrSet<BasicBlock *, 16> DeadBlockSet(DeadBlocks.begin(),
>>>                                               DeadBlocks.end());
>>>
>>> @@ -1187,11 +1229,6 @@ deleteDeadBlocksFromLoop(Loop &L,
>>>    llvm::erase_if(ExitBlocks,
>>>                   [&](BasicBlock *BB) { return DeadBlockSet.count(BB);
>>> });
>>>
>>> -  // Remove these blocks from their successors.
>>> -  for (auto *BB : DeadBlocks)
>>> -    for (BasicBlock *SuccBB : successors(BB))
>>> -      SuccBB->removePredecessor(BB, /*DontDeleteUselessPHIs*/ true);
>>> -
>>>    // Walk from this loop up through its parents removing all of the
>>> dead blocks.
>>>    for (Loop *ParentL = &L; ParentL; ParentL = ParentL->getParentLoop())
>>> {
>>>      for (auto *BB : DeadBlocks)
>>> @@ -1582,31 +1619,24 @@ void visitDomSubTree(DominatorTree &DT,
>>>    } while (!DomWorklist.empty());
>>>  }
>>>
>>> -/// Take an invariant branch that has been determined to be safe and
>>> worthwhile
>>> -/// to unswitch despite being non-trivial to do so and perform the
>>> unswitch.
>>> -///
>>> -/// This directly updates the CFG to hoist the predicate out of the
>>> loop, and
>>> -/// clone the necessary parts of the loop to maintain behavior.
>>> -///
>>> -/// It also updates both dominator tree and loopinfo based on the
>>> unswitching.
>>> -///
>>> -/// Once unswitching has been performed it runs the provided callback
>>> to report
>>> -/// the new loops and no-longer valid loops to the caller.
>>> -static bool unswitchInvariantBranch(
>>> -    Loop &L, BranchInst &BI, ArrayRef<Value *> Invariants,
>>> DominatorTree &DT,
>>> -    LoopInfo &LI, AssumptionCache &AC,
>>> +static bool unswitchNontrivialInvariants(
>>> +    Loop &L, TerminatorInst &TI, ArrayRef<Value *> Invariants,
>>> +    DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
>>>      function_ref<void(bool, ArrayRef<Loop *>)> UnswitchCB) {
>>> -  auto *ParentBB = BI.getParent();
>>> -
>>> -  // We can only unswitch conditional branches with an invariant
>>> condition or
>>> -  // combining invariant conditions with an instruction.
>>> -  assert(BI.isConditional() && "Can only unswitch a conditional
>>> branch!");
>>> -  bool FullUnswitch = BI.getCondition() == Invariants[0];
>>> +  auto *ParentBB = TI.getParent();
>>> +  BranchInst *BI = dyn_cast<BranchInst>(&TI);
>>> +  SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
>>> +
>>> +  // We can only unswitch switches, conditional branches with an
>>> invariant
>>> +  // condition, or combining invariant conditions with an instruction.
>>> +  assert((SI || BI->isConditional()) &&
>>> +         "Can only unswitch switches and conditional branch!");
>>> +  bool FullUnswitch = SI || BI->getCondition() == Invariants[0];
>>>    if (FullUnswitch)
>>>      assert(Invariants.size() == 1 &&
>>>             "Cannot have other invariants with full unswitching!");
>>>    else
>>> -    assert(isa<Instruction>(BI.getCondition()) &&
>>> +    assert(isa<Instruction>(BI->getCondition()) &&
>>>             "Partial unswitching requires an instruction as the
>>> condition!");
>>>
>>>    // Constant and BBs tracking the cloned and continuing successor.
>>> When we are
>>> @@ -1618,18 +1648,27 @@ static bool unswitchInvariantBranch(
>>>    bool Direction = true;
>>>    int ClonedSucc = 0;
>>>    if (!FullUnswitch) {
>>> -    if (cast<Instruction>(BI.getCondition())->getOpcode() !=
>>> Instruction::Or) {
>>> -      assert(cast<Instruction>(BI.getCondition())->getOpcode() ==
>>> Instruction::And &&
>>> -        "Only `or` and `and` instructions can combine invariants being
>>> unswitched.");
>>> +    if (cast<Instruction>(BI->getCondition())->getOpcode() !=
>>> Instruction::Or) {
>>> +      assert(cast<Instruction>(BI->getCondition())->getOpcode() ==
>>> +                 Instruction::And &&
>>> +             "Only `or` and `and` instructions can combine invariants
>>> being "
>>> +             "unswitched.");
>>>        Direction = false;
>>>        ClonedSucc = 1;
>>>      }
>>>    }
>>> -  auto *UnswitchedSuccBB = BI.getSuccessor(ClonedSucc);
>>> -  auto *ContinueSuccBB = BI.getSuccessor(1 - ClonedSucc);
>>>
>>> -  assert(UnswitchedSuccBB != ContinueSuccBB &&
>>> -         "Should not unswitch a branch that always goes to the same
>>> place!");
>>> +  BasicBlock *RetainedSuccBB =
>>> +      BI ? BI->getSuccessor(1 - ClonedSucc) : SI->getDefaultDest();
>>> +  SmallSetVector<BasicBlock *, 4> UnswitchedSuccBBs;
>>> +  if (BI)
>>> +    UnswitchedSuccBBs.insert(BI->getSuccessor(ClonedSucc));
>>> +  else
>>> +    for (auto Case : SI->cases())
>>> +      UnswitchedSuccBBs.insert(Case.getCaseSuccessor());
>>> +
>>> +  assert(!UnswitchedSuccBBs.count(RetainedSuccBB) &&
>>> +         "Should not unswitch the same successor we are retaining!");
>>>
>>>    // The branch should be in this exact loop. Any inner loop's
>>> invariant branch
>>>    // should be handled by unswitching that inner loop. The caller of
>>> this
>>> @@ -1648,9 +1687,6 @@ static bool unswitchInvariantBranch(
>>>      if (isa<CleanupPadInst>(ExitBB->getFirstNonPHI()))
>>>        return false;
>>>
>>> -  SmallPtrSet<BasicBlock *, 4> ExitBlockSet(ExitBlocks.begin(),
>>> -                                            ExitBlocks.end());
>>> -
>>>    // Compute the parent loop now before we start hacking on things.
>>>    Loop *ParentL = L.getParentLoop();
>>>
>>> @@ -1669,30 +1705,22 @@ static bool unswitchInvariantBranch(
>>>        OuterExitL = NewOuterExitL;
>>>    }
>>>
>>> -  // If the edge we *aren't* cloning in the unswitch (the continuing
>>> edge)
>>> -  // dominates its target, we can skip cloning the dominated region of
>>> the loop
>>> -  // and its exits. We compute this as a set of nodes to be skipped.
>>> -  SmallPtrSet<BasicBlock *, 4> SkippedLoopAndExitBlocks;
>>> -  if (ContinueSuccBB->getUniquePredecessor() ||
>>> -      llvm::all_of(predecessors(ContinueSuccBB), [&](BasicBlock
>>> *PredBB) {
>>> -        return PredBB == ParentBB || DT.dominates(ContinueSuccBB,
>>> PredBB);
>>> -      })) {
>>> -    visitDomSubTree(DT, ContinueSuccBB, [&](BasicBlock *BB) {
>>> -      SkippedLoopAndExitBlocks.insert(BB);
>>> -      return true;
>>> -    });
>>> -  }
>>> -  // If we are doing full unswitching, then similarly to the above, the
>>> edge we
>>> -  // *are* cloning in the unswitch (the unswitched edge) dominates its
>>> target,
>>> -  // we will end up with dead nodes in the original loop and its exits
>>> that will
>>> -  // need to be deleted. Here, we just retain that the property holds
>>> and will
>>> -  // compute the deleted set later.
>>> -  bool DeleteUnswitchedSucc =
>>> -      FullUnswitch &&
>>> -      (UnswitchedSuccBB->getUniquePredecessor() ||
>>> -       llvm::all_of(predecessors(UnswitchedSuccBB), [&](BasicBlock
>>> *PredBB) {
>>> -         return PredBB == ParentBB || DT.dominates(UnswitchedSuccBB,
>>> PredBB);
>>> -       }));
>>> +  // If the edge from this terminator to a successor dominates that
>>> successor,
>>> +  // store a map from each block in its dominator subtree to it. This
>>> lets us
>>> +  // tell when cloning for a particular successor if a block is
>>> dominated by
>>> +  // some *other* successor with a single data structure. We use this to
>>> +  // significantly reduce cloning.
>>> +  SmallDenseMap<BasicBlock *, BasicBlock *, 16> DominatingSucc;
>>> +  for (auto *SuccBB : llvm::concat<BasicBlock *const>(
>>> +           makeArrayRef(RetainedSuccBB), UnswitchedSuccBBs))
>>> +    if (SuccBB->getUniquePredecessor() ||
>>> +        llvm::all_of(predecessors(SuccBB), [&](BasicBlock *PredBB) {
>>> +          return PredBB == ParentBB || DT.dominates(SuccBB, PredBB);
>>> +        }))
>>> +      visitDomSubTree(DT, SuccBB, [&](BasicBlock *BB) {
>>> +        DominatingSucc[BB] = SuccBB;
>>> +        return true;
>>> +      });
>>>
>>>    // Split the preheader, so that we know that there is a safe place to
>>> insert
>>>    // the conditional branch. We will change the preheader to have a
>>> conditional
>>> @@ -1702,84 +1730,93 @@ static bool unswitchInvariantBranch(
>>>    BasicBlock *SplitBB = L.getLoopPreheader();
>>>    BasicBlock *LoopPH = SplitEdge(SplitBB, L.getHeader(), &DT, &LI);
>>>
>>> -  // Keep a mapping for the cloned values.
>>> -  ValueToValueMapTy VMap;
>>> -
>>>    // Keep track of the dominator tree updates needed.
>>>    SmallVector<DominatorTree::UpdateType, 4> DTUpdates;
>>>
>>> -  // Build the cloned blocks from the loop.
>>> -  auto *ClonedPH = buildClonedLoopBlocks(
>>> -      L, LoopPH, SplitBB, ExitBlocks, ParentBB, UnswitchedSuccBB,
>>> -      ContinueSuccBB, SkippedLoopAndExitBlocks, VMap, DTUpdates, AC,
>>> DT, LI);
>>> +  // Clone the loop for each unswitched successor.
>>> +  SmallVector<std::unique_ptr<ValueToValueMapTy>, 4> VMaps;
>>> +  VMaps.reserve(UnswitchedSuccBBs.size());
>>> +  SmallDenseMap<BasicBlock *, BasicBlock *, 4> ClonedPHs;
>>> +  for (auto *SuccBB : UnswitchedSuccBBs) {
>>> +    VMaps.emplace_back(new ValueToValueMapTy());
>>> +    ClonedPHs[SuccBB] = buildClonedLoopBlocks(
>>> +        L, LoopPH, SplitBB, ExitBlocks, ParentBB, SuccBB,
>>> RetainedSuccBB,
>>> +        DominatingSucc, *VMaps.back(), DTUpdates, AC, DT, LI);
>>> +  }
>>>
>>>    // The stitching of the branched code back together depends on
>>> whether we're
>>>    // doing full unswitching or not with the exception that we always
>>> want to
>>>    // nuke the initial terminator placed in the split block.
>>>    SplitBB->getTerminator()->eraseFromParent();
>>>    if (FullUnswitch) {
>>> -    // Remove the parent as a predecessor of the
>>> -    // unswitched successor.
>>> -    UnswitchedSuccBB->removePredecessor(ParentBB,
>>> -                                        /*DontDeleteUselessPHIs*/ true);
>>> -    DTUpdates.push_back({DominatorTree::Delete, ParentBB,
>>> UnswitchedSuccBB});
>>> -
>>> -    // Now splice the branch from the original loop and use it to
>>> select between
>>> -    // the two loops.
>>> -    SplitBB->getInstList().splice(SplitBB->end(),
>>> ParentBB->getInstList(), BI);
>>> -    BI.setSuccessor(ClonedSucc, ClonedPH);
>>> -    BI.setSuccessor(1 - ClonedSucc, LoopPH);
>>> +    for (BasicBlock *SuccBB : UnswitchedSuccBBs) {
>>> +      // Remove the parent as a predecessor of the unswitched successor.
>>> +      SuccBB->removePredecessor(ParentBB,
>>> +                                /*DontDeleteUselessPHIs*/ true);
>>> +      DTUpdates.push_back({DominatorTree::Delete, ParentBB, SuccBB});
>>> +    }
>>> +
>>> +    // Now splice the terminator from the original loop and rewrite its
>>> +    // successors.
>>> +    SplitBB->getInstList().splice(SplitBB->end(),
>>> ParentBB->getInstList(), TI);
>>> +    if (BI) {
>>> +      assert(UnswitchedSuccBBs.size() == 1 &&
>>> +             "Only one possible unswitched block for a branch!");
>>> +      BasicBlock *ClonedPH = ClonedPHs.begin()->second;
>>> +      BI->setSuccessor(ClonedSucc, ClonedPH);
>>> +      BI->setSuccessor(1 - ClonedSucc, LoopPH);
>>> +      DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
>>> +    } else {
>>> +      assert(SI && "Must either be a branch or switch!");
>>> +
>>> +      // Walk the cases and directly update their successors.
>>> +      for (auto &Case : SI->cases())
>>> +        Case.setSuccessor(ClonedPHs.find(Case.getCaseSuccessor())->second);
>>> +      // We need to use the set to populate domtree updates as even
>>> when there
>>> +      // are multiple cases pointing at the same successor we only want
>>> to
>>> +      // insert one edge in the domtree.
>>> +      for (BasicBlock *SuccBB : UnswitchedSuccBBs)
>>> +        DTUpdates.push_back(
>>> +            {DominatorTree::Insert, SplitBB,
>>> ClonedPHs.find(SuccBB)->second});
>>> +
>>> +      SI->setDefaultDest(LoopPH);
>>> +    }
>>>
>>>      // Create a new unconditional branch to the continuing block (as
>>> opposed to
>>>      // the one cloned).
>>> -    BranchInst::Create(ContinueSuccBB, ParentBB);
>>> +    BranchInst::Create(RetainedSuccBB, ParentBB);
>>>    } else {
>>> +    assert(BI && "Only branches have partial unswitching.");
>>> +    assert(UnswitchedSuccBBs.size() == 1 &&
>>> +           "Only one possible unswitched block for a branch!");
>>> +    BasicBlock *ClonedPH = ClonedPHs.begin()->second;
>>>      // When doing a partial unswitch, we have to do a bit more work to
>>> build up
>>>      // the branch in the split block.
>>>      buildPartialUnswitchConditionalBranch(*SplitBB, Invariants,
>>> Direction,
>>>                                            *ClonedPH, *LoopPH);
>>> +    DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
>>>    }
>>>
>>> -  // Before we update the dominator tree, collect the dead blocks if
>>> we're going
>>> -  // to end up deleting the unswitched successor.
>>> -  SmallVector<BasicBlock *, 16> DeadBlocks;
>>> -  if (DeleteUnswitchedSucc) {
>>> -    DeadBlocks.push_back(UnswitchedSuccBB);
>>> -    for (int i = 0; i < (int)DeadBlocks.size(); ++i) {
>>> -      // If we reach an exit block, stop recursing as the unswitched
>>> loop will
>>> -      // end up reaching the merge block which we make the successor of
>>> the
>>> -      // exit.
>>> -      if (ExitBlockSet.count(DeadBlocks[i]))
>>> -        continue;
>>> -
>>> -      // Insert the children that are within the loop or exit block
>>> set. Other
>>> -      // children may reach out of the loop. While we don't expect
>>> these to be
>>> -      // dead (as the unswitched clone should reach them) we don't try
>>> to prove
>>> -      // that here.
>>> -      for (DomTreeNode *ChildN : *DT[DeadBlocks[i]])
>>> -        if (L.contains(ChildN->getBlock()) ||
>>> -            ExitBlockSet.count(ChildN->getBlock()))
>>> -          DeadBlocks.push_back(ChildN->getBlock());
>>> -    }
>>> -  }
>>> -
>>> -  // Add the remaining edge to our updates and apply them to get an
>>> up-to-date
>>> -  // dominator tree. Note that this will cause the dead blocks above to
>>> be
>>> -  // unreachable and no longer in the dominator tree.
>>> -  DTUpdates.push_back({DominatorTree::Insert, SplitBB, ClonedPH});
>>> +  // Apply the updates accumulated above to get an up-to-date dominator
>>> tree.
>>>    DT.applyUpdates(DTUpdates);
>>>
>>> +  // Now that we have an accurate dominator tree, first delete the dead
>>> cloned
>>> +  // blocks so that we can accurately build any cloned loops. It is
>>> important to
>>> +  // not delete the blocks from the original loop yet because we still
>>> want to
>>> +  // reference the original loop to understand the cloned loop's
>>> structure.
>>> +  deleteDeadClonedBlocks(L, ExitBlocks, VMaps, DT);
>>> +
>>>    // Build the cloned loop structure itself. This may be substantially
>>>    // different from the original structure due to the simplified CFG.
>>> This also
>>>    // handles inserting all the cloned blocks into the correct loops.
>>>    SmallVector<Loop *, 4> NonChildClonedLoops;
>>> -  buildClonedLoops(L, ExitBlocks, VMap, LI, NonChildClonedLoops);
>>> -
>>> -  // Delete anything that was made dead in the original loop due to
>>> -  // unswitching.
>>> -  if (!DeadBlocks.empty())
>>> -    deleteDeadBlocksFromLoop(L, DeadBlocks, ExitBlocks, DT, LI);
>>> +  for (std::unique_ptr<ValueToValueMapTy> &VMap : VMaps)
>>> +    buildClonedLoops(L, ExitBlocks, *VMap, LI, NonChildClonedLoops);
>>>
>>> +  // Now that our cloned loops have been built, we can update the
>>> original loop.
>>> +  // First we delete the dead blocks from it and then we rebuild the
>>> loop
>>> +  // structure taking these deletions into account.
>>> +  deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI);
>>>    SmallVector<Loop *, 4> HoistedLoops;
>>>    bool IsStillLoop = rebuildLoopAfterUnswitch(L, ExitBlocks, LI,
>>> HoistedLoops);
>>>
>>> @@ -1790,31 +1827,37 @@ static bool unswitchInvariantBranch(
>>>    // verification steps.
>>>    assert(DT.verify(DominatorTree::VerificationLevel::Fast));
>>>
>>> -  // Now we want to replace all the uses of the invariants within both
>>> the
>>> -  // original and cloned blocks. We do this here so that we can use the
>>> now
>>> -  // updated dominator tree to identify which side the users are on.
>>> -  ConstantInt *UnswitchedReplacement =
>>> -      Direction ? ConstantInt::getTrue(BI.getContext())
>>> -                : ConstantInt::getFalse(BI.getContext());
>>> -  ConstantInt *ContinueReplacement =
>>> -      Direction ? ConstantInt::getFalse(BI.getContext())
>>> -                : ConstantInt::getTrue(BI.getContext());
>>> -  for (Value *Invariant : Invariants)
>>> -    for (auto UI = Invariant->use_begin(), UE = Invariant->use_end();
>>> -         UI != UE;) {
>>> -      // Grab the use and walk past it so we can clobber it in the use
>>> list.
>>> -      Use *U = &*UI++;
>>> -      Instruction *UserI = dyn_cast<Instruction>(U->getUser());
>>> -      if (!UserI)
>>> -        continue;
>>> +  if (BI) {
>>> +    // If we unswitched a branch which collapses the condition to a
>>> known
>>> +    // constant we want to replace all the uses of the invariants
>>> within both
>>> +    // the original and cloned blocks. We do this here so that we can
>>> use the
>>> +    // now updated dominator tree to identify which side the users are
>>> on.
>>> +    assert(UnswitchedSuccBBs.size() == 1 &&
>>> +           "Only one possible unswitched block for a branch!");
>>> +    BasicBlock *ClonedPH = ClonedPHs.begin()->second;
>>> +    ConstantInt *UnswitchedReplacement =
>>> +        Direction ? ConstantInt::getTrue(BI->getContext())
>>> +                  : ConstantInt::getFalse(BI->getContext());
>>> +    ConstantInt *ContinueReplacement =
>>> +        Direction ? ConstantInt::getFalse(BI->getContext())
>>> +                  : ConstantInt::getTrue(BI->getContext());
>>> +    for (Value *Invariant : Invariants)
>>> +      for (auto UI = Invariant->use_begin(), UE = Invariant->use_end();
>>> +           UI != UE;) {
>>> +        // Grab the use and walk past it so we can clobber it in the
>>> use list.
>>> +        Use *U = &*UI++;
>>> +        Instruction *UserI = dyn_cast<Instruction>(U->getUser());
>>> +        if (!UserI)
>>> +          continue;
>>>
>>> -      // Replace it with the 'continue' side if in the main loop body,
>>> and the
>>> -      // unswitched if in the cloned blocks.
>>> -      if (DT.dominates(LoopPH, UserI->getParent()))
>>> -        U->set(ContinueReplacement);
>>> -      else if (DT.dominates(ClonedPH, UserI->getParent()))
>>> -        U->set(UnswitchedReplacement);
>>> -    }
>>> +        // Replace it with the 'continue' side if in the main loop
>>> body, and the
>>> +        // unswitched if in the cloned blocks.
>>> +        if (DT.dominates(LoopPH, UserI->getParent()))
>>> +          U->set(ContinueReplacement);
>>> +        else if (DT.dominates(ClonedPH, UserI->getParent()))
>>> +          U->set(UnswitchedReplacement);
>>> +      }
>>> +  }
>>>
>>>    // We can change which blocks are exit blocks of all the cloned
>>> sibling
>>>    // loops, the current loop, and any parent loops which shared exit
>>> blocks
>>> @@ -1937,8 +1980,16 @@ static bool unswitchBestCondition(
>>>      if (LI.getLoopFor(BB) != &L)
>>>        continue;
>>>
>>> +    if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
>>> +      // We can only consider fully loop-invariant switch conditions as
>>> we need
>>> +      // to completely eliminate the switch after unswitching.
>>> +      if (!isa<Constant>(SI->getCondition()) &&
>>> +          L.isLoopInvariant(SI->getCondition()))
>>> +        UnswitchCandidates.push_back({SI, {SI->getCondition()}});
>>> +      continue;
>>> +    }
>>> +
>>>      auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
>>> -    // FIXME: Handle switches here!
>>>      if (!BI || !BI->isConditional() || isa<Constant>(BI->getCondition()) ||
>>>          BI->getSuccessor(0) == BI->getSuccessor(1))
>>>        continue;
>>> @@ -2091,9 +2142,9 @@ static bool unswitchBestCondition(
>>>      TerminatorInst &TI = *TerminatorAndInvariants.first;
>>>      ArrayRef<Value *> Invariants = TerminatorAndInvariants.second;
>>>      BranchInst *BI = dyn_cast<BranchInst>(&TI);
>>> -    int CandidateCost =
>>> -        ComputeUnswitchedCost(TI, /*FullUnswitch*/ Invariants.size() == 1 && BI &&
>>> -                                      Invariants[0] == BI->getCondition());
>>> +    int CandidateCost = ComputeUnswitchedCost(
>>> +        TI, /*FullUnswitch*/ !BI || (Invariants.size() == 1 &&
>>> +                                     Invariants[0] == BI->getCondition()));
>>>      LLVM_DEBUG(dbgs() << "  Computed cost of " << CandidateCost
>>>                        << " for unswitch candidate: " << TI << "\n");
>>>      if (!BestUnswitchTI || CandidateCost < BestUnswitchCost) {
>>> @@ -2109,17 +2160,11 @@ static bool unswitchBestCondition(
>>>      return false;
>>>    }
>>>
>>> -  auto *UnswitchBI = dyn_cast<BranchInst>(BestUnswitchTI);
>>> -  if (!UnswitchBI) {
>>> -    // FIXME: Add support for unswitching a switch here!
>>> -    LLVM_DEBUG(dbgs() << "Cannot unswitch anything but a branch!\n");
>>> -    return false;
>>> -  }
>>> -
>>>    LLVM_DEBUG(dbgs() << "  Trying to unswitch non-trivial (cost = "
>>> -                    << BestUnswitchCost << ") branch: " << *UnswitchBI << "\n");
>>> -  return unswitchInvariantBranch(L, *UnswitchBI, BestUnswitchInvariants, DT, LI,
>>> -                                 AC, UnswitchCB);
>>> +                    << BestUnswitchCost << ") terminator: " << *BestUnswitchTI
>>> +                    << "\n");
>>> +  return unswitchNontrivialInvariants(
>>> +      L, *BestUnswitchTI, BestUnswitchInvariants, DT, LI, AC, UnswitchCB);
>>>  }
>>>
>>>  /// Unswitch control flow predicated on loop invariant conditions.
>>>
>>> Modified:
>>> llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll?rev=335553&r1=335552&r2=335553&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll (original)
>>> +++ llvm/trunk/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch.ll Mon Jun 25 16:32:54 2018
>>> @@ -387,7 +387,7 @@ loop_begin:
>>>  loop_b:
>>>    %b = load i32, i32* %b.ptr
>>>    br i1 %v, label %loop_begin, label %loop_exit
>>> -; The 'loop_b' unswitched loop.
>>> +; The original loop, now non-looping due to unswitching.
>>>  ;
>>>  ; CHECK:       entry.split:
>>>  ; CHECK-NEXT:    br label %loop_begin
>>> @@ -398,14 +398,13 @@ loop_b:
>>>  ; CHECK-NEXT:    br label %loop_exit.split
>>>  ;
>>>  ; CHECK:       loop_exit.split:
>>> -; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A]], %loop_begin ]
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>
>>>  loop_exit:
>>>    %ab.phi = phi i32 [ %b, %loop_b ], [ %a, %loop_begin ]
>>>    ret i32 %ab.phi
>>>  ; CHECK:       loop_exit:
>>> -; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.split ], [ %[[B_LCSSA]], %loop_exit.split.us ]
>>> +; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A]], %loop_exit.split ], [ %[[B_LCSSA]], %loop_exit.split.us ]
>>>  ; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>>>  }
>>>
>>> @@ -458,8 +457,7 @@ loop_exit1:
>>>    call void @sink1(i32 %a.phi)
>>>    ret void
>>>  ; CHECK:       loop_exit1:
>>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit1.split.us ]
>>> -; CHECK-NEXT:    call void @sink1(i32 %[[A_PHI]])
>>> +; CHECK-NEXT:    call void @sink1(i32 %[[A_LCSSA]])
>>>  ; CHECK-NEXT:    ret void
>>>
>>>  loop_exit2:
>>> @@ -467,8 +465,8 @@ loop_exit2:
>>>    call void @sink2(i32 %b.phi)
>>>    ret void
>>>  ; CHECK:       loop_exit2:
>>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B]], %loop_b ]
>>> -; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
>>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %loop_b ]
>>> +; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
>>>  ; CHECK-NEXT:    ret void
>>>  }
>>>
>>> @@ -531,8 +529,7 @@ loop_exit2:
>>>    call void @sink2(i32 %b.phi)
>>>    ret void
>>>  ; CHECK:       loop_exit2:
>>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_LCSSA]], %loop_exit2.split.us ]
>>> -; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
>>> +; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
>>>  ; CHECK-NEXT:    ret void
>>>  }
>>>
>>> @@ -587,8 +584,7 @@ loop_exit1:
>>>    call void @sink1(i32 %a.phi)
>>>    br label %exit
>>>  ; CHECK:       loop_exit1:
>>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit1.split.us ]
>>> -; CHECK-NEXT:    call void @sink1(i32 %[[A_PHI]])
>>> +; CHECK-NEXT:    call void @sink1(i32 %[[A_LCSSA]])
>>>  ; CHECK-NEXT:    br label %exit
>>>
>>>  loop_exit2:
>>> @@ -596,8 +592,8 @@ loop_exit2:
>>>    call void @sink2(i32 %b.phi)
>>>    br label %exit
>>>  ; CHECK:       loop_exit2:
>>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B]], %loop_b ]
>>> -; CHECK-NEXT:    call void @sink2(i32 %[[B_PHI]])
>>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %loop_b ]
>>> +; CHECK-NEXT:    call void @sink2(i32 %[[B_LCSSA]])
>>>  ; CHECK-NEXT:    br label %exit
>>>
>>>  exit:
>>> @@ -663,7 +659,7 @@ loop_latch:
>>>    %v2 = load i1, i1* %ptr
>>>    br i1 %v2, label %loop_begin, label %loop_exit
>>>  ; CHECK:       loop_latch:
>>> -; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B]], %inner_loop_b ]
>>> +; CHECK-NEXT:    %[[B_INNER_LCSSA:.*]] = phi i32 [ %[[B]], %inner_loop_b ]
>>>  ; CHECK-NEXT:    %[[V2:.*]] = load i1, i1* %ptr
>>>  ; CHECK-NEXT:    br i1 %[[V2]], label %loop_begin, label %loop_exit.loopexit1
>>>
>>> @@ -671,15 +667,14 @@ loop_exit:
>>>    %ab.phi = phi i32 [ %a, %inner_loop_begin ], [ %b.phi, %loop_latch ]
>>>    ret i32 %ab.phi
>>>  ; CHECK:       loop_exit.loopexit:
>>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.loopexit.split.us ]
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>  ;
>>>  ; CHECK:       loop_exit.loopexit1:
>>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_LCSSA]], %loop_latch ]
>>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %loop_latch ]
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>  ;
>>>  ; CHECK:       loop_exit:
>>> -; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_PHI]], %loop_exit.loopexit ], [ %[[B_PHI]], %loop_exit.loopexit1 ]
>>> +; CHECK-NEXT:    %[[AB_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.loopexit ], [ %[[B_LCSSA]], %loop_exit.loopexit1 ]
>>>  ; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>>>  }
>>>
>>> @@ -773,11 +768,10 @@ latch:
>>>  ; CHECK-NEXT:    br label %latch
>>>  ;
>>>  ; CHECK:       latch:
>>> -; CHECK-NEXT:    %[[B_PHI:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %loop_b_inner_exit ]
>>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit.split
>>>  ;
>>>  ; CHECK:       loop_exit.split:
>>> -; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_PHI]], %latch ]
>>> +; CHECK-NEXT:    %[[B_LCSSA:.*]] = phi i32 [ %[[B_INNER_LCSSA]], %latch ]
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>
>>>  loop_exit:
>>> @@ -1466,7 +1460,6 @@ inner_loop_exit:
>>>    %v = load i1, i1* %ptr
>>>    br i1 %v, label %loop_begin, label %loop_exit
>>>  ; CHECK:       inner_loop_exit:
>>> -; CHECK-NEXT:    %[[A_INNER_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.split.us ]
>>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit
>>>
>>> @@ -1474,7 +1467,7 @@ loop_exit:
>>>    %a.lcssa = phi i32 [ %a.inner_lcssa, %inner_loop_exit ]
>>>    ret i32 %a.lcssa
>>>  ; CHECK:       loop_exit:
>>> -; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA]], %inner_loop_exit ]
>>> +; CHECK-NEXT:    %[[A_LCSSA:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit ]
>>>  ; CHECK-NEXT:    ret i32 %[[A_LCSSA]]
>>>  }
>>>
>>> @@ -1555,7 +1548,7 @@ loop_exit:
>>>    ret i32 %a.lcssa
>>>  ; CHECK:       loop_exit:
>>>  ; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA]], %loop_exit.split ], [ %[[A_PHI_US]], %loop_exit.split.us ]
>>> -; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>>> +; CHECK-NEXT:    ret i32 %[[A_PHI]]
>>>  }
>>>
>>>  ; Test that requires re-forming dedicated exits for the original loop.
>>> @@ -1635,7 +1628,7 @@ loop_exit:
>>>    ret i32 %a.lcssa
>>>  ; CHECK:       loop_exit:
>>>  ; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_PHI_SPLIT]], %loop_exit.split ], [ %[[A_LCSSA_US]], %loop_exit.split.us ]
>>> -; CHECK-NEXT:    ret i32 %[[AB_PHI]]
>>> +; CHECK-NEXT:    ret i32 %[[A_PHI]]
>>>  }
>>>
>>>  ; Check that if a cloned inner loop after unswitching doesn't loop and directly
>>> @@ -1721,7 +1714,6 @@ loop_exit:
>>>    %a.lcssa = phi i32 [ %a, %inner_loop_begin ], [ %a.inner_lcssa, %inner_loop_exit ]
>>>    ret i32 %a.lcssa
>>>  ; CHECK:       loop_exit.loopexit:
>>> -; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %loop_exit.loopexit.split.us ]
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>  ;
>>>  ; CHECK:       loop_exit.loopexit1:
>>> @@ -1729,7 +1721,7 @@ loop_exit:
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>  ;
>>>  ; CHECK:       loop_exit:
>>> -; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_LCSSA_US]], %loop_exit.loopexit ], [ %[[A_LCSSA]], %loop_exit.loopexit1 ]
>>> +; CHECK-NEXT:    %[[A_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %loop_exit.loopexit ], [ %[[A_LCSSA]], %loop_exit.loopexit1 ]
>>>  ; CHECK-NEXT:    ret i32 %[[A_PHI]]
>>>  }
>>>
>>> @@ -1802,7 +1794,6 @@ inner_loop_exit:
>>>    %v3 = load i1, i1* %ptr
>>>    br i1 %v3, label %loop_latch, label %loop_exit
>>>  ; CHECK:       inner_loop_exit:
>>> -; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.split.us ]
>>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_latch, label %loop_exit.loopexit1
>>>
>>> @@ -1819,7 +1810,7 @@ loop_exit:
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>  ;
>>>  ; CHECK:       loop_exit.loopexit1:
>>> -; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_PHI]], %inner_loop_exit ]
>>> +; CHECK-NEXT:    %[[A_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit ]
>>>  ; CHECK-NEXT:    br label %loop_exit
>>>  ;
>>>  ; CHECK:       loop_exit:
>>> @@ -1916,7 +1907,6 @@ inner_loop_exit:
>>>    %v4 = load i1, i1* %ptr
>>>    br i1 %v4, label %loop_begin, label %loop_exit
>>>  ; CHECK:       inner_loop_exit.loopexit:
>>> -; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_loop_exit.loopexit.split.us ]
>>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>>  ;
>>>  ; CHECK:       inner_loop_exit.loopexit1:
>>> @@ -1924,7 +1914,7 @@ inner_loop_exit:
>>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>>  ;
>>>  ; CHECK:       inner_loop_exit:
>>> -; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_LCSSA_US]], %inner_loop_exit.loopexit ], [ %[[A_INNER_LCSSA]], %inner_loop_exit.loopexit1 ]
>>> +; CHECK-NEXT:    %[[A_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_loop_exit.loopexit ], [ %[[A_INNER_LCSSA]], %inner_loop_exit.loopexit1 ]
>>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>>  ; CHECK-NEXT:    br i1 %[[V]], label %loop_begin, label %loop_exit
>>>
>>> @@ -2010,7 +2000,6 @@ inner_inner_loop_exit:
>>>    %v3 = load i1, i1* %ptr
>>>    br i1 %v3, label %inner_loop_latch, label %inner_loop_exit
>>>  ; CHECK:       inner_inner_loop_exit:
>>> -; CHECK-NEXT:    %[[A_INNER_INNER_PHI:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_inner_loop_exit.split.us ]
>>>  ; CHECK-NEXT:    %[[V:.*]] = load i1, i1* %ptr
>>>  ; CHECK-NEXT:    br i1 %[[V]], label %inner_loop_latch, label %inner_loop_exit.loopexit1
>>>
>>> @@ -2028,7 +2017,7 @@ inner_loop_exit:
>>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>>  ;
>>>  ; CHECK:       inner_loop_exit.loopexit1:
>>> -; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_PHI]], %inner_inner_loop_exit ]
>>> +; CHECK-NEXT:    %[[A_INNER_LCSSA_US:.*]] = phi i32 [ %[[A_INNER_INNER_LCSSA_US]], %inner_inner_loop_exit ]
>>>  ; CHECK-NEXT:    br label %inner_loop_exit
>>>  ;
>>>  ; CHECK:       inner_loop_exit:
>>> @@ -2296,56 +2285,96 @@ define i32 @test20(i32* %var, i32 %cond1
>>>  entry:
>>>    br label %loop_begin
>>>  ; CHECK-NEXT:  entry:
>>> -; CHECK-NEXT:    br label %loop_begin
>>> +; CHECK-NEXT:    switch i32 %cond2, label %[[ENTRY_SPLIT_EXIT:.*]] [
>>> +; CHECK-NEXT:      i32 0, label %[[ENTRY_SPLIT_A:.*]]
>>> +; CHECK-NEXT:      i32 1, label %[[ENTRY_SPLIT_A]]
>>> +; CHECK-NEXT:      i32 13, label %[[ENTRY_SPLIT_B:.*]]
>>> +; CHECK-NEXT:      i32 2, label %[[ENTRY_SPLIT_A]]
>>> +; CHECK-NEXT:      i32 42, label %[[ENTRY_SPLIT_C:.*]]
>>> +; CHECK-NEXT:    ]
>>>
>>>  loop_begin:
>>>    %var_val = load i32, i32* %var
>>> -  switch i32 %cond2, label %loop_a [
>>> -    i32 0, label %loop_b
>>> -    i32 1, label %loop_b
>>> -    i32 13, label %loop_c
>>> -    i32 2, label %loop_b
>>> -    i32 42, label %loop_exit
>>> +  switch i32 %cond2, label %loop_exit [
>>> +    i32 0, label %loop_a
>>> +    i32 1, label %loop_a
>>> +    i32 13, label %loop_b
>>> +    i32 2, label %loop_a
>>> +    i32 42, label %loop_c
>>>    ]
>>> -; CHECK:       loop_begin:
>>> -; CHECK-NEXT:    %[[V:.*]] = load i32, i32* %var
>>> -; CHECK-NEXT:    switch i32 %cond2, label %loop_a [
>>> -; CHECK-NEXT:      i32 0, label %loop_b
>>> -; CHECK-NEXT:      i32 1, label %loop_b
>>> -; CHECK-NEXT:      i32 13, label %loop_c
>>> -; CHECK-NEXT:      i32 2, label %loop_b
>>> -; CHECK-NEXT:      i32 42, label %loop_exit
>>> -; CHECK-NEXT:    ]
>>>
>>>  loop_a:
>>>    call void @a()
>>>    br label %loop_latch
>>> -; CHECK:       loop_a:
>>> +; Unswitched 'a' loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_A]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_A]]:
>>> +; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
>>> +; CHECK-NEXT:    br label %[[LOOP_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_A]]:
>>>  ; CHECK-NEXT:    call void @a()
>>> -; CHECK-NEXT:    br label %loop_latch
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_A]]:
>>> +; CHECK:         br label %[[LOOP_BEGIN_A]]
>>>
>>>  loop_b:
>>>    call void @b()
>>>    br label %loop_latch
>>> -; CHECK:       loop_b:
>>> +; Unswitched 'b' loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_B]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_B]]:
>>> +; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
>>> +; CHECK-NEXT:    br label %[[LOOP_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_B]]:
>>>  ; CHECK-NEXT:    call void @b()
>>> -; CHECK-NEXT:    br label %loop_latch
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_B]]:
>>> +; CHECK:         br label %[[LOOP_BEGIN_B]]
>>>
>>>  loop_c:
>>>    call void @c() noreturn nounwind
>>>    br label %loop_latch
>>> -; CHECK:       loop_c:
>>> +; Unswitched 'c' loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_C]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_C]]:
>>> +; CHECK-NEXT:    %{{.*}} = load i32, i32* %var
>>> +; CHECK-NEXT:    br label %[[LOOP_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_C]]:
>>>  ; CHECK-NEXT:    call void @c()
>>> -; CHECK-NEXT:    br label %loop_latch
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_C]]:
>>> +; CHECK:         br label %[[LOOP_BEGIN_C]]
>>>
>>>  loop_latch:
>>>    br label %loop_begin
>>> -; CHECK:       loop_latch:
>>> -; CHECK-NEXT:    br label %loop_begin
>>>
>>>  loop_exit:
>>>    %lcssa = phi i32 [ %var_val, %loop_begin ]
>>>    ret i32 %lcssa
>>> +; Unswitched exit edge (no longer a loop).
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_EXIT]]:
>>> +; CHECK-NEXT:    br label %loop_begin
>>> +;
>>> +; CHECK:       loop_begin:
>>> +; CHECK-NEXT:    %[[V:.*]] = load i32, i32* %var
>>> +; CHECK-NEXT:    br label %loop_exit
>>> +;
>>>  ; CHECK:       loop_exit:
>>>  ; CHECK-NEXT:    %[[LCSSA:.*]] = phi i32 [ %[[V]], %loop_begin ]
>>>  ; CHECK-NEXT:    ret i32 %[[LCSSA]]
>>> @@ -2824,3 +2853,112 @@ loop_exit:
>>>  ; CHECK:       loop_exit:
>>>  ; CHECK-NEXT:    ret
>>>  }
>>> +
>>> +; Non-trivial unswitching of a switch.
>>> +define i32 @test27(i1* %ptr, i32 %cond) {
>>> +; CHECK-LABEL: @test27(
>>> +entry:
>>> +  br label %loop_begin
>>> +; CHECK-NEXT:  entry:
>>> +; CHECK-NEXT:    switch i32 %cond, label %[[ENTRY_SPLIT_LATCH:.*]] [
>>> +; CHECK-NEXT:      i32 0, label %[[ENTRY_SPLIT_A:.*]]
>>> +; CHECK-NEXT:      i32 1, label %[[ENTRY_SPLIT_B:.*]]
>>> +; CHECK-NEXT:      i32 2, label %[[ENTRY_SPLIT_C:.*]]
>>> +; CHECK-NEXT:    ]
>>> +
>>> +loop_begin:
>>> +  switch i32 %cond, label %latch [
>>> +    i32 0, label %loop_a
>>> +    i32 1, label %loop_b
>>> +    i32 2, label %loop_c
>>> +  ]
>>> +
>>> +loop_a:
>>> +  call void @a()
>>> +  br label %latch
>>> +; Unswitched 'a' loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_A]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_A]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_A]]:
>>> +; CHECK-NEXT:    call void @a()
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_A]]:
>>> +; CHECK-NEXT:    %[[V_A:.*]] = load i1, i1* %ptr
>>> +; CHECK:         br i1 %[[V_A]], label %[[LOOP_BEGIN_A]], label %[[LOOP_EXIT_A:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_EXIT_A]]:
>>> +; CHECK-NEXT:    br label %loop_exit
>>> +
>>> +loop_b:
>>> +  call void @b()
>>> +  br label %latch
>>> +; Unswitched 'b' loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_B]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_B]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_B]]:
>>> +; CHECK-NEXT:    call void @b()
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_B]]:
>>> +; CHECK-NEXT:    %[[V_B:.*]] = load i1, i1* %ptr
>>> +; CHECK:         br i1 %[[V_B]], label %[[LOOP_BEGIN_B]], label %[[LOOP_EXIT_B:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_EXIT_B]]:
>>> +; CHECK-NEXT:    br label %loop_exit
>>> +
>>> +loop_c:
>>> +  call void @c()
>>> +  br label %latch
>>> +; Unswitched 'c' loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_C]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_C]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_C]]:
>>> +; CHECK-NEXT:    call void @c()
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_C]]:
>>> +; CHECK-NEXT:    %[[V_C:.*]] = load i1, i1* %ptr
>>> +; CHECK:         br i1 %[[V_C]], label %[[LOOP_BEGIN_C]], label %[[LOOP_EXIT_C:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_EXIT_C]]:
>>> +; CHECK-NEXT:    br label %loop_exit
>>> +
>>> +latch:
>>> +  %v = load i1, i1* %ptr
>>> +  br i1 %v, label %loop_begin, label %loop_exit
>>> +; Unswitched the 'latch' only loop.
>>> +;
>>> +; CHECK:       [[ENTRY_SPLIT_LATCH]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_BEGIN_LATCH:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_BEGIN_LATCH]]:
>>> +; CHECK-NEXT:    br label %[[LOOP_LATCH_LATCH:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_LATCH_LATCH]]:
>>> +; CHECK-NEXT:    %[[V_LATCH:.*]] = load i1, i1* %ptr
>>> +; CHECK:         br i1 %[[V_LATCH]], label %[[LOOP_BEGIN_LATCH]], label %[[LOOP_EXIT_LATCH:.*]]
>>> +;
>>> +; CHECK:       [[LOOP_EXIT_LATCH]]:
>>> +; CHECK-NEXT:    br label %loop_exit
>>> +
>>> +loop_exit:
>>> +  ret i32 0
>>> +; CHECK:       loop_exit:
>>> +; CHECK-NEXT:    ret i32 0
>>> +}
>>> \ No newline at end of file
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
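
For anyone reading the test27 case in the quoted diff, here is a rough source-level picture of what nontrivial unswitching of a switch does. This is a minimal C++ sketch, not LLVM code and not part of the commit. The names run_original, run_unswitched, latch and keep_going are hypothetical: the trace letters stand in for the calls to @a, @b and @c, and keep_going models the i1 values loaded through %ptr in the latch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the latch: reads the next "loaded" flag and reports
// whether the loop should take its backedge. Running out of flags exits.
static bool latch(const std::vector<bool> &keep_going, std::size_t &i) {
  return i < keep_going.size() && keep_going[i++];
}

// Shape of the loop before unswitching: the switch on the loop-invariant
// `cond` sits inside the loop and is re-evaluated on every iteration.
std::string run_original(int cond, const std::vector<bool> &keep_going) {
  std::string trace;
  std::size_t i = 0;
  while (true) {
    switch (cond) {
    case 0: trace += 'a'; break;  // loop_a
    case 1: trace += 'b'; break;  // loop_b
    case 2: trace += 'c'; break;  // loop_c
    default: break;               // straight to the latch
    }
    if (!latch(keep_going, i))
      break;
  }
  return trace;
}

// Shape after unswitching: the switch is evaluated once, outside the loop,
// and each successor gets its own specialized copy of the loop.
std::string run_unswitched(int cond, const std::vector<bool> &keep_going) {
  std::string trace;
  std::size_t i = 0;
  switch (cond) {
  case 0: do { trace += 'a'; } while (latch(keep_going, i)); break;
  case 1: do { trace += 'b'; } while (latch(keep_going, i)); break;
  case 2: do { trace += 'c'; } while (latch(keep_going, i)); break;
  default: do { } while (latch(keep_going, i)); break;  // latch-only loop
  }
  return trace;
}
```

The four loops in run_unswitched line up with the four entry.split successors that the test27 CHECK lines expect (a, b, c and the latch-only default). The cost is code size, since the loop body is duplicated once per distinct successor.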