[llvm] [LoopRotate] Remove canRotateDeoptimizingLatchExit and multi-rotate (PR #162482)

Wed Oct 8 06:35:17 PDT 2025

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Marek Sedláček (mark-sed)

<details>
<summary>Changes</summary>

After further investigation of improvements to runtime unrolling and to it connected loop rotation (with my attempts in https://github.com/llvm/llvm-project/pull/146540 and https://github.com/llvm/llvm-project/pull/148243) I have looked more into loop rotation and the existing profitability checks.

The `canRotateDeoptimizingLatchExit` check does not seem to work as intended. It was introduced in https://gitlab.azulsystems.com/dev/orca/-/commit/2f6987ba61cc31c16c64f511e5cbc76b52dc67b3 for the loop-rotate-multi-flag, where this flag was disabled and there was an intent to enable it by default later (https://reviews.llvm.org/D73058) and this has not happen even 5 years later.

I have done multiple experiments in our downstream with multi-rotate and with this check. We suggest removal of this heuristic and multi-rotate as well.

Note that the diff is big, but it's just removal of while loop and indentation change.

After this patch I would like to continue here and propose adding a computability check for exit count, but that will be in a separate PR.

Requests for review: @annamthomas @fhahn @davemgreen 

---

Patch is 61.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162482.diff


3 Files Affected:

- (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+437-504) 
- (removed) llvm/test/Transforms/LoopRotate/multiple-deopt-exits.ll (-164) 
- (removed) llvm/test/Transforms/LoopRotate/multiple-exits.ll (-236) 


``````````diff

diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 7cc9ff8b11139..f0f7bbb64da32 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -45,12 +45,6 @@ STATISTIC(NumInstrsHoisted,
           "Number of instructions hoisted into loop preheader");
 STATISTIC(NumInstrsDuplicated,
           "Number of instructions cloned into loop preheader");
-STATISTIC(NumRotated, "Number of loops rotated");
-
-static cl::opt<bool>
-    MultiRotate("loop-rotate-multi", cl::init(false), cl::Hidden,
-                cl::desc("Allow loop rotation multiple times in order to reach "
-                         "a better latch exit"));
 
 // Probability that a rotated loop has zero trip count / is never entered.
 static constexpr uint32_t ZeroTripCountWeights[] = {1, 127};
@@ -206,50 +200,6 @@ static bool profitableToRotateLoopExitingLatch(Loop *L) {
   return false;
 }
 
-// Check that latch exit is deoptimizing (which means - very unlikely to happen)
-// and there is another exit from the loop which is non-deoptimizing.
-// If we rotate latch to that exit our loop has a better chance of being fully
-// canonical.
-//
-// It can give false positives in some rare cases.
-static bool canRotateDeoptimizingLatchExit(Loop *L) {
-  BasicBlock *Latch = L->getLoopLatch();
-  assert(Latch && "need latch");
-  BranchInst *BI = dyn_cast<BranchInst>(Latch->getTerminator());
-  // Need normal exiting latch.
-  if (!BI || !BI->isConditional())
-    return false;
-
-  BasicBlock *Exit = BI->getSuccessor(1);
-  if (L->contains(Exit))
-    Exit = BI->getSuccessor(0);
-
-  // Latch exit is non-deoptimizing, no need to rotate.
-  if (!Exit->getPostdominatingDeoptimizeCall())
-    return false;
-
-  SmallVector<BasicBlock *, 4> Exits;
-  L->getUniqueExitBlocks(Exits);
-  if (!Exits.empty()) {
-    // There is at least one non-deoptimizing exit.
-    //
-    // Note, that BasicBlock::getPostdominatingDeoptimizeCall is not exact,
-    // as it can conservatively return false for deoptimizing exits with
-    // complex enough control flow down to deoptimize call.
-    //
-    // That means here we can report success for a case where
-    // all exits are deoptimizing but one of them has complex enough
-    // control flow (e.g. with loops).
-    //
-    // That should be a very rare case and false positives for this function
-    // have compile-time effect only.
-    return any_of(Exits, [](const BasicBlock *BB) {
-      return !BB->getPostdominatingDeoptimizeCall();
-    });
-  }
-  return false;
-}
-
 static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
                                 bool HasConditionalPreHeader,
                                 bool SuccsSwapped) {
@@ -387,506 +337,489 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
 /// rotation. LoopRotate should be repeatable and converge to a canonical
 /// form. This property is satisfied because simplifying the loop latch can only
 /// happen once across multiple invocations of the LoopRotate pass.
-///
-/// If -loop-rotate-multi is enabled we can do multiple rotations in one go
-/// so to reach a suitable (non-deoptimizing) exit.
 bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
   // If the loop has only one block then there is not much to rotate.
   if (L->getBlocks().size() == 1)
     return false;
 
   bool Rotated = false;
-  do {
-    BasicBlock *OrigHeader = L->getHeader();
-    BasicBlock *OrigLatch = L->getLoopLatch();
-
-    BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
-    if (!BI || BI->isUnconditional())
+  BasicBlock *OrigHeader = L->getHeader();
+  BasicBlock *OrigLatch = L->getLoopLatch();
+
+  BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+  if (!BI || BI->isUnconditional())
+    return Rotated;
+
+  // If the loop header is not one of the loop exiting blocks then
+  // either this loop is already rotated or it is not
+  // suitable for loop rotation transformations.
+  if (!L->isLoopExiting(OrigHeader))
+    return Rotated;
+
+  // If the loop latch already contains a branch that leaves the loop then the
+  // loop is already rotated.
+  if (!OrigLatch)
+    return Rotated;
+
+  // Rotate if the loop latch was just simplified. Or if it makes the loop exit
+  // count computable. Or if we think it will be profitable.
+  if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
+      !profitableToRotateLoopExitingLatch(L))
+    return Rotated;
+
+  // Check size of original header and reject loop if it is very big or we can't
+  // duplicate blocks inside it.
+  {
+    SmallPtrSet<const Value *, 32> EphValues;
+    CodeMetrics::collectEphemeralValues(L, AC, EphValues);
+
+    CodeMetrics Metrics;
+    Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
+    if (Metrics.notDuplicatable) {
+      LLVM_DEBUG(
+                  dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
+                  << " instructions: ";
+                  L->dump());
       return Rotated;
-
-    // If the loop header is not one of the loop exiting blocks then
-    // either this loop is already rotated or it is not
-    // suitable for loop rotation transformations.
-    if (!L->isLoopExiting(OrigHeader))
-      return Rotated;
-
-    // If the loop latch already contains a branch that leaves the loop then the
-    // loop is already rotated.
-    if (!OrigLatch)
+    }
+    if (Metrics.Convergence != ConvergenceKind::None) {
+      LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
+                  "instructions: ";
+                  L->dump());
       return Rotated;
-
-    // Rotate if either the loop latch does *not* exit the loop, or if the loop
-    // latch was just simplified. Or if we think it will be profitable.
-    if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
-        !profitableToRotateLoopExitingLatch(L) &&
-        !canRotateDeoptimizingLatchExit(L))
+    }
+    if (!Metrics.NumInsts.isValid()) {
+      LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains instructions"
+                  " with invalid cost: ";
+                  L->dump());
       return Rotated;
-
-    // Check size of original header and reject loop if it is very big or we can't
-    // duplicate blocks inside it.
-    {
-      SmallPtrSet<const Value *, 32> EphValues;
-      CodeMetrics::collectEphemeralValues(L, AC, EphValues);
-
-      CodeMetrics Metrics;
-      Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
-      if (Metrics.notDuplicatable) {
-        LLVM_DEBUG(
-                   dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
-                   << " instructions: ";
-                   L->dump());
-        return Rotated;
-      }
-      if (Metrics.Convergence != ConvergenceKind::None) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
-                   "instructions: ";
-                   L->dump());
-        return Rotated;
-      }
-      if (!Metrics.NumInsts.isValid()) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains instructions"
-                   " with invalid cost: ";
-                   L->dump());
-        return Rotated;
-      }
-      if (Metrics.NumInsts > MaxHeaderSize) {
-        LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
-                          << Metrics.NumInsts
-                          << " instructions, which is more than the threshold ("
-                          << MaxHeaderSize << " instructions): ";
-                   L->dump());
-        ++NumNotRotatedDueToHeaderSize;
-        return Rotated;
-      }
-
-      // When preparing for LTO, avoid rotating loops with calls that could be
-      // inlined during the LTO stage.
-      if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
-        return Rotated;
     }
-
-    // Now, this loop is suitable for rotation.
-    BasicBlock *OrigPreheader = L->getLoopPreheader();
-
-    // If the loop could not be converted to canonical form, it must have an
-    // indirectbr in it, just give up.
-    if (!OrigPreheader || !L->hasDedicatedExits())
+    if (Metrics.NumInsts > MaxHeaderSize) {
+      LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
+                        << Metrics.NumInsts
+                        << " instructions, which is more than the threshold ("
+                        << MaxHeaderSize << " instructions): ";
+                  L->dump());
+      ++NumNotRotatedDueToHeaderSize;
       return Rotated;
-
-    // Anything ScalarEvolution may know about this loop or the PHI nodes
-    // in its header will soon be invalidated. We should also invalidate
-    // all outer loops because insertion and deletion of blocks that happens
-    // during the rotation may violate invariants related to backedge taken
-    // infos in them.
-    if (SE) {
-      SE->forgetTopmostLoop(L);
-      // We may hoist some instructions out of loop. In case if they were cached
-      // as "loop variant" or "loop computable", these caches must be dropped.
-      // We also may fold basic blocks, so cached block dispositions also need
-      // to be dropped.
-      SE->forgetBlockAndLoopDispositions();
     }
 
-    LLVM_DEBUG(dbgs() << "LoopRotation: rotating "; L->dump());
-    if (MSSAU && VerifyMemorySSA)
-      MSSAU->getMemorySSA()->verifyMemorySSA();
-
-    // Find new Loop header. NewHeader is a Header's one and only successor
-    // that is inside loop.  Header's other successor is outside the
-    // loop.  Otherwise loop is not suitable for rotation.
-    BasicBlock *Exit = BI->getSuccessor(0);
-    BasicBlock *NewHeader = BI->getSuccessor(1);
-    bool BISuccsSwapped = L->contains(Exit);
-    if (BISuccsSwapped)
-      std::swap(Exit, NewHeader);
-    assert(NewHeader && "Unable to determine new loop header");
-    assert(L->contains(NewHeader) && !L->contains(Exit) &&
-           "Unable to determine loop header and exit blocks");
-
-    // This code assumes that the new header has exactly one predecessor.
-    // Remove any single-entry PHI nodes in it.
-    assert(NewHeader->getSinglePredecessor() &&
-           "New header doesn't have one pred!");
-    FoldSingleEntryPHINodes(NewHeader);
-
-    // Begin by walking OrigHeader and populating ValueMap with an entry for
-    // each Instruction.
-    BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();
-    ValueToValueMapTy ValueMap, ValueMapMSSA;
-
-    // For PHI nodes, the value available in OldPreHeader is just the
-    // incoming value from OldPreHeader.
-    for (; PHINode *PN = dyn_cast<PHINode>(I); ++I)
-      InsertNewValueIntoMap(ValueMap, PN,
-                            PN->getIncomingValueForBlock(OrigPreheader));
-
-    // For the rest of the instructions, either hoist to the OrigPreheader if
-    // possible or create a clone in the OldPreHeader if not.
-    Instruction *LoopEntryBranch = OrigPreheader->getTerminator();
-
-    // Record all debug records preceding LoopEntryBranch to avoid
-    // duplication.
-    using DbgHash =
-        std::pair<std::pair<hash_code, DILocalVariable *>, DIExpression *>;
-    auto makeHash = [](const DbgVariableRecord *D) -> DbgHash {
-      auto VarLocOps = D->location_ops();
-      return {{hash_combine_range(VarLocOps), D->getVariable()},
-              D->getExpression()};
-    };
-
-    SmallDenseSet<DbgHash, 8> DbgRecords;
-    // Build DbgVariableRecord hashes for DbgVariableRecords attached to the
-    // terminator.
-    for (const DbgVariableRecord &DVR :
-         filterDbgVars(OrigPreheader->getTerminator()->getDbgRecordRange()))
-      DbgRecords.insert(makeHash(&DVR));
-
-    // Remember the local noalias scope declarations in the header. After the
-    // rotation, they must be duplicated and the scope must be cloned. This
-    // avoids unwanted interaction across iterations.
-    SmallVector<NoAliasScopeDeclInst *, 6> NoAliasDeclInstructions;
-    for (Instruction &I : *OrigHeader)
-      if (auto *Decl = dyn_cast<NoAliasScopeDeclInst>(&I))
-        NoAliasDeclInstructions.push_back(Decl);
-
-    Module *M = OrigHeader->getModule();
-
-    // Track the next DbgRecord to clone. If we have a sequence where an
-    // instruction is hoisted instead of being cloned:
-    //    DbgRecord blah
-    //    %foo = add i32 0, 0
-    //    DbgRecord xyzzy
-    //    %bar = call i32 @foobar()
-    // where %foo is hoisted, then the DbgRecord "blah" will be seen twice, once
-    // attached to %foo, then when %foo his hoisted it will "fall down" onto the
-    // function call:
-    //    DbgRecord blah
-    //    DbgRecord xyzzy
-    //    %bar = call i32 @foobar()
-    // causing it to appear attached to the call too.
-    //
-    // To avoid this, cloneDebugInfoFrom takes an optional "start cloning from
-    // here" position to account for this behaviour. We point it at any
-    // DbgRecords on the next instruction, here labelled xyzzy, before we hoist
-    // %foo. Later, we only only clone DbgRecords from that position (xyzzy)
-    // onwards, which avoids cloning DbgRecord "blah" multiple times. (Stored as
-    // a range because it gives us a natural way of testing whether
-    //  there were DbgRecords on the next instruction before we hoisted things).
-    iterator_range<DbgRecord::self_iterator> NextDbgInsts =
-        (I != E) ? I->getDbgRecordRange() : DbgMarker::getEmptyDbgRecordRange();
-
-    while (I != E) {
-      Instruction *Inst = &*I++;
-
-      // If the instruction's operands are invariant and it doesn't read or write
-      // memory, then it is safe to hoist.  Doing this doesn't change the order of
-      // execution in the preheader, but does prevent the instruction from
-      // executing in each iteration of the loop.  This means it is safe to hoist
-      // something that might trap, but isn't safe to hoist something that reads
-      // memory (without proving that the loop doesn't write).
-      if (L->hasLoopInvariantOperands(Inst) && !Inst->mayReadFromMemory() &&
-          !Inst->mayWriteToMemory() && !Inst->isTerminator() &&
-          !isa<AllocaInst>(Inst) &&
-          // It is not safe to hoist the value of these instructions in
-          // coroutines, as the addresses of otherwise eligible variables (e.g.
-          // thread-local variables and errno) may change if the coroutine is
-          // resumed in a different thread.Therefore, we disable this
-          // optimization for correctness. However, this may block other correct
-          // optimizations.
-          // FIXME: This should be reverted once we have a better model for
-          // memory access in coroutines.
-          !Inst->getFunction()->isPresplitCoroutine()) {
-
-        if (!NextDbgInsts.empty()) {
-          auto DbgValueRange =
-              LoopEntryBranch->cloneDebugInfoFrom(Inst, NextDbgInsts.begin());
-          RemapDbgRecordRange(M, DbgValueRange, ValueMap,
-                              RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
-          // Erase anything we've seen before.
-          for (DbgVariableRecord &DVR :
-               make_early_inc_range(filterDbgVars(DbgValueRange)))
-            if (DbgRecords.count(makeHash(&DVR)))
-              DVR.eraseFromParent();
-        }
-
-        NextDbgInsts = I->getDbgRecordRange();
-
-        Inst->moveBefore(LoopEntryBranch->getIterator());
+    // When preparing for LTO, avoid rotating loops with calls that could be
+    // inlined during the LTO stage.
+    if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
+      return Rotated;
+  }
 
-        ++NumInstrsHoisted;
-        continue;
-      }
+  // Now, this loop is suitable for rotation.
+  BasicBlock *OrigPreheader = L->getLoopPreheader();
+
+  // If the loop could not be converted to canonical form, it must have an
+  // indirectbr in it, just give up.
+  if (!OrigPreheader || !L->hasDedicatedExits())
+    return Rotated;
+
+  // Anything ScalarEvolution may know about this loop or the PHI nodes
+  // in its header will soon be invalidated. We should also invalidate
+  // all outer loops because insertion and deletion of blocks that happens
+  // during the rotation may violate invariants related to backedge taken
+  // infos in them.
+  if (SE) {
+    SE->forgetTopmostLoop(L);
+    // We may hoist some instructions out of loop. In case if they were cached
+    // as "loop variant" or "loop computable", these caches must be dropped.
+    // We also may fold basic blocks, so cached block dispositions also need
+    // to be dropped.
+    SE->forgetBlockAndLoopDispositions();
+  }
 
-      // Otherwise, create a duplicate of the instruction.
-      Instruction *C = Inst->clone();
-      if (const DebugLoc &DL = C->getDebugLoc())
-        mapAtomInstance(DL, ValueMap);
+  LLVM_DEBUG(dbgs() << "LoopRotation: rotating "; L->dump());
+  if (MSSAU && VerifyMemorySSA)
+    MSSAU->getMemorySSA()->verifyMemorySSA();
 
-      C->insertBefore(LoopEntryBranch->getIterator());
+  // Find new Loop header. NewHeader is a Header's one and only successor
+  // that is inside loop.  Header's other successor is outside the
+  // loop.  Otherwise loop is not suitable for rotation.
+  BasicBlock *Exit = BI->getSuccessor(0);
+  BasicBlock *NewHeader = BI->getSuccessor(1);
+  bool BISuccsSwapped = L->contains(Exit);
+  if (BISuccsSwapped)
+    std::swap(Exit, NewHeader);
+  assert(NewHeader && "Unable to determine new loop header");
+  assert(L->contains(NewHeader) && !L->contains(Exit) &&
+          "Unable to determine loop header and exit blocks");
+
+  // This code assumes that the new header has exactly one predecessor.
+  // Remove any single-entry PHI nodes in it.
+  assert(NewHeader->getSinglePredecessor() &&
+          "New header doesn't have one pred!");
+  FoldSingleEntryPHINodes(NewHeader);
+
+  // Begin by walking OrigHeader and populating ValueMap with an entry for
+  // each Instruction.
+  BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();
+  ValueToValueMapTy ValueMap, ValueMapMSSA;
+
+  // For PHI nodes, the value available in OldPreHeader is just the
+  // incoming value from OldPreHeader.
+  for (; PHINode *PN = dyn_cast<PHINode>(I); ++I)
+    InsertNewValueIntoMap(ValueMap, PN,
+                          PN->getIncomingValueForBlock(OrigPreheader));
+
+  // For the rest of the instructions, either hoist to the OrigPreheader if
+  // possible or create a clone in the OldPreHeader if not.
+  Instruction *LoopEntryBranch = OrigPreheader->getTerminator();
+
+  // Record all debug records preceding LoopEntryBranch to avoid
+  // duplication.
+  using DbgHash =
+      std::pair<std::pair<hash_code, DILocalVariable *>, DIExpression *>;
+  auto makeHash = [](const DbgVariableRecord *D) -> DbgHash {
+    auto VarLocOps = D->location_ops();
+    return {{hash_combine_range(VarLocOps), D->getVariable()},
+            D->getExpression()};
+  };
 
-      ++NumInstrsDuplicated;
+  SmallDenseSet<DbgHash, 8> DbgRecords;
+  // Build DbgVariableRecord hashes for DbgVariableRecords attached to the
+  // terminator.
+  for (const DbgVariableRecord &DVR :
+        filterDbgVars(OrigPreheader->getTerminator()->getDbgRecordRange()))
+    DbgRecords.insert(makeHash(&DVR));
+
+  // Remember the local noalias scope declarations in the header. After the
+  // rotation, they must be duplicated and the scope must be cloned. This
+  // avoids unwanted interaction across iterations.
+  SmallVector<NoAliasScopeDeclInst *, 6> NoAliasDeclInstructions;
+  for (Instruction &I : *OrigHeader)
+    if (auto *Decl = dyn_cast<NoAliasScopeDeclInst>(&I))
+      NoAliasDeclInstructions.push_back(Decl);
+
+  Module *M = OrigHeader->getModule();
+
+  // Track the next DbgRecord to clone. If we have a sequence where an
+  // instruction is hoisted instead of being cloned:
+  //    DbgRecord blah
+  //    %foo = add i32 0, 0
+  //    DbgRecord xyzzy
+  //    %bar = call i32 @foobar()
+  // where %foo is hoisted, then the DbgRecord "blah" will be seen twice, once
+  // attached to %foo, then when %foo his hoisted it will "fall down" onto the
+  // function call:
+  //    DbgRecord blah
+  //    DbgRecord xyzzy
+  //    %bar = call i32 @foobar()
+  // causing it to appear...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/162482