[llvm] [LoopRotate] Remove canRotateDeoptimizingLatchExit and multi-rotate (PR #162482)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 8 06:35:17 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-transforms
Author: Marek Sedláček (mark-sed)
<details>
<summary>Changes</summary>
After further investigation of improvements to runtime unrolling and to it connected loop rotation (with my attempts in https://github.com/llvm/llvm-project/pull/146540 and https://github.com/llvm/llvm-project/pull/148243) I have looked more into loop rotation and the existing profitability checks.
The `canRotateDeoptimizingLatchExit` check does not seem to work as intended. It was introduced in https://gitlab.azulsystems.com/dev/orca/-/commit/2f6987ba61cc31c16c64f511e5cbc76b52dc67b3 for the loop-rotate-multi-flag, where this flag was disabled and there was an intent to enable it by default later (https://reviews.llvm.org/D73058) and this has not happen even 5 years later.
I have done multiple experiments in our downstream with multi-rotate and with this check. We suggest removal of this heuristic and multi-rotate as well.
Note that the diff is big, but it's just removal of while loop and indentation change.
After this patch I would like to continue here and propose adding a computability check for exit count, but that will be in a separate PR.
Requests for review: @<!-- -->annamthomas @<!-- -->fhahn @<!-- -->davemgreen
---
Patch is 61.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162482.diff
3 Files Affected:
- (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+437-504)
- (removed) llvm/test/Transforms/LoopRotate/multiple-deopt-exits.ll (-164)
- (removed) llvm/test/Transforms/LoopRotate/multiple-exits.ll (-236)
``````````diff
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 7cc9ff8b11139..f0f7bbb64da32 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -45,12 +45,6 @@ STATISTIC(NumInstrsHoisted,
"Number of instructions hoisted into loop preheader");
STATISTIC(NumInstrsDuplicated,
"Number of instructions cloned into loop preheader");
-STATISTIC(NumRotated, "Number of loops rotated");
-
-static cl::opt<bool>
- MultiRotate("loop-rotate-multi", cl::init(false), cl::Hidden,
- cl::desc("Allow loop rotation multiple times in order to reach "
- "a better latch exit"));
// Probability that a rotated loop has zero trip count / is never entered.
static constexpr uint32_t ZeroTripCountWeights[] = {1, 127};
@@ -206,50 +200,6 @@ static bool profitableToRotateLoopExitingLatch(Loop *L) {
return false;
}
-// Check that latch exit is deoptimizing (which means - very unlikely to happen)
-// and there is another exit from the loop which is non-deoptimizing.
-// If we rotate latch to that exit our loop has a better chance of being fully
-// canonical.
-//
-// It can give false positives in some rare cases.
-static bool canRotateDeoptimizingLatchExit(Loop *L) {
- BasicBlock *Latch = L->getLoopLatch();
- assert(Latch && "need latch");
- BranchInst *BI = dyn_cast<BranchInst>(Latch->getTerminator());
- // Need normal exiting latch.
- if (!BI || !BI->isConditional())
- return false;
-
- BasicBlock *Exit = BI->getSuccessor(1);
- if (L->contains(Exit))
- Exit = BI->getSuccessor(0);
-
- // Latch exit is non-deoptimizing, no need to rotate.
- if (!Exit->getPostdominatingDeoptimizeCall())
- return false;
-
- SmallVector<BasicBlock *, 4> Exits;
- L->getUniqueExitBlocks(Exits);
- if (!Exits.empty()) {
- // There is at least one non-deoptimizing exit.
- //
- // Note, that BasicBlock::getPostdominatingDeoptimizeCall is not exact,
- // as it can conservatively return false for deoptimizing exits with
- // complex enough control flow down to deoptimize call.
- //
- // That means here we can report success for a case where
- // all exits are deoptimizing but one of them has complex enough
- // control flow (e.g. with loops).
- //
- // That should be a very rare case and false positives for this function
- // have compile-time effect only.
- return any_of(Exits, [](const BasicBlock *BB) {
- return !BB->getPostdominatingDeoptimizeCall();
- });
- }
- return false;
-}
-
static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
bool HasConditionalPreHeader,
bool SuccsSwapped) {
@@ -387,506 +337,489 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
/// rotation. LoopRotate should be repeatable and converge to a canonical
/// form. This property is satisfied because simplifying the loop latch can only
/// happen once across multiple invocations of the LoopRotate pass.
-///
-/// If -loop-rotate-multi is enabled we can do multiple rotations in one go
-/// so to reach a suitable (non-deoptimizing) exit.
bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
// If the loop has only one block then there is not much to rotate.
if (L->getBlocks().size() == 1)
return false;
bool Rotated = false;
- do {
- BasicBlock *OrigHeader = L->getHeader();
- BasicBlock *OrigLatch = L->getLoopLatch();
-
- BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
- if (!BI || BI->isUnconditional())
+ BasicBlock *OrigHeader = L->getHeader();
+ BasicBlock *OrigLatch = L->getLoopLatch();
+
+ BranchInst *BI = dyn_cast<BranchInst>(OrigHeader->getTerminator());
+ if (!BI || BI->isUnconditional())
+ return Rotated;
+
+ // If the loop header is not one of the loop exiting blocks then
+ // either this loop is already rotated or it is not
+ // suitable for loop rotation transformations.
+ if (!L->isLoopExiting(OrigHeader))
+ return Rotated;
+
+ // If the loop latch already contains a branch that leaves the loop then the
+ // loop is already rotated.
+ if (!OrigLatch)
+ return Rotated;
+
+ // Rotate if the loop latch was just simplified. Or if it makes the loop exit
+ // count computable. Or if we think it will be profitable.
+ if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
+ !profitableToRotateLoopExitingLatch(L))
+ return Rotated;
+
+ // Check size of original header and reject loop if it is very big or we can't
+ // duplicate blocks inside it.
+ {
+ SmallPtrSet<const Value *, 32> EphValues;
+ CodeMetrics::collectEphemeralValues(L, AC, EphValues);
+
+ CodeMetrics Metrics;
+ Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
+ if (Metrics.notDuplicatable) {
+ LLVM_DEBUG(
+ dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
+ << " instructions: ";
+ L->dump());
return Rotated;
-
- // If the loop header is not one of the loop exiting blocks then
- // either this loop is already rotated or it is not
- // suitable for loop rotation transformations.
- if (!L->isLoopExiting(OrigHeader))
- return Rotated;
-
- // If the loop latch already contains a branch that leaves the loop then the
- // loop is already rotated.
- if (!OrigLatch)
+ }
+ if (Metrics.Convergence != ConvergenceKind::None) {
+ LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
+ "instructions: ";
+ L->dump());
return Rotated;
-
- // Rotate if either the loop latch does *not* exit the loop, or if the loop
- // latch was just simplified. Or if we think it will be profitable.
- if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
- !profitableToRotateLoopExitingLatch(L) &&
- !canRotateDeoptimizingLatchExit(L))
+ }
+ if (!Metrics.NumInsts.isValid()) {
+ LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains instructions"
+ " with invalid cost: ";
+ L->dump());
return Rotated;
-
- // Check size of original header and reject loop if it is very big or we can't
- // duplicate blocks inside it.
- {
- SmallPtrSet<const Value *, 32> EphValues;
- CodeMetrics::collectEphemeralValues(L, AC, EphValues);
-
- CodeMetrics Metrics;
- Metrics.analyzeBasicBlock(OrigHeader, *TTI, EphValues, PrepareForLTO);
- if (Metrics.notDuplicatable) {
- LLVM_DEBUG(
- dbgs() << "LoopRotation: NOT rotating - contains non-duplicatable"
- << " instructions: ";
- L->dump());
- return Rotated;
- }
- if (Metrics.Convergence != ConvergenceKind::None) {
- LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains convergent "
- "instructions: ";
- L->dump());
- return Rotated;
- }
- if (!Metrics.NumInsts.isValid()) {
- LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains instructions"
- " with invalid cost: ";
- L->dump());
- return Rotated;
- }
- if (Metrics.NumInsts > MaxHeaderSize) {
- LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
- << Metrics.NumInsts
- << " instructions, which is more than the threshold ("
- << MaxHeaderSize << " instructions): ";
- L->dump());
- ++NumNotRotatedDueToHeaderSize;
- return Rotated;
- }
-
- // When preparing for LTO, avoid rotating loops with calls that could be
- // inlined during the LTO stage.
- if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
- return Rotated;
}
-
- // Now, this loop is suitable for rotation.
- BasicBlock *OrigPreheader = L->getLoopPreheader();
-
- // If the loop could not be converted to canonical form, it must have an
- // indirectbr in it, just give up.
- if (!OrigPreheader || !L->hasDedicatedExits())
+ if (Metrics.NumInsts > MaxHeaderSize) {
+ LLVM_DEBUG(dbgs() << "LoopRotation: NOT rotating - contains "
+ << Metrics.NumInsts
+ << " instructions, which is more than the threshold ("
+ << MaxHeaderSize << " instructions): ";
+ L->dump());
+ ++NumNotRotatedDueToHeaderSize;
return Rotated;
-
- // Anything ScalarEvolution may know about this loop or the PHI nodes
- // in its header will soon be invalidated. We should also invalidate
- // all outer loops because insertion and deletion of blocks that happens
- // during the rotation may violate invariants related to backedge taken
- // infos in them.
- if (SE) {
- SE->forgetTopmostLoop(L);
- // We may hoist some instructions out of loop. In case if they were cached
- // as "loop variant" or "loop computable", these caches must be dropped.
- // We also may fold basic blocks, so cached block dispositions also need
- // to be dropped.
- SE->forgetBlockAndLoopDispositions();
}
- LLVM_DEBUG(dbgs() << "LoopRotation: rotating "; L->dump());
- if (MSSAU && VerifyMemorySSA)
- MSSAU->getMemorySSA()->verifyMemorySSA();
-
- // Find new Loop header. NewHeader is a Header's one and only successor
- // that is inside loop. Header's other successor is outside the
- // loop. Otherwise loop is not suitable for rotation.
- BasicBlock *Exit = BI->getSuccessor(0);
- BasicBlock *NewHeader = BI->getSuccessor(1);
- bool BISuccsSwapped = L->contains(Exit);
- if (BISuccsSwapped)
- std::swap(Exit, NewHeader);
- assert(NewHeader && "Unable to determine new loop header");
- assert(L->contains(NewHeader) && !L->contains(Exit) &&
- "Unable to determine loop header and exit blocks");
-
- // This code assumes that the new header has exactly one predecessor.
- // Remove any single-entry PHI nodes in it.
- assert(NewHeader->getSinglePredecessor() &&
- "New header doesn't have one pred!");
- FoldSingleEntryPHINodes(NewHeader);
-
- // Begin by walking OrigHeader and populating ValueMap with an entry for
- // each Instruction.
- BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();
- ValueToValueMapTy ValueMap, ValueMapMSSA;
-
- // For PHI nodes, the value available in OldPreHeader is just the
- // incoming value from OldPreHeader.
- for (; PHINode *PN = dyn_cast<PHINode>(I); ++I)
- InsertNewValueIntoMap(ValueMap, PN,
- PN->getIncomingValueForBlock(OrigPreheader));
-
- // For the rest of the instructions, either hoist to the OrigPreheader if
- // possible or create a clone in the OldPreHeader if not.
- Instruction *LoopEntryBranch = OrigPreheader->getTerminator();
-
- // Record all debug records preceding LoopEntryBranch to avoid
- // duplication.
- using DbgHash =
- std::pair<std::pair<hash_code, DILocalVariable *>, DIExpression *>;
- auto makeHash = [](const DbgVariableRecord *D) -> DbgHash {
- auto VarLocOps = D->location_ops();
- return {{hash_combine_range(VarLocOps), D->getVariable()},
- D->getExpression()};
- };
-
- SmallDenseSet<DbgHash, 8> DbgRecords;
- // Build DbgVariableRecord hashes for DbgVariableRecords attached to the
- // terminator.
- for (const DbgVariableRecord &DVR :
- filterDbgVars(OrigPreheader->getTerminator()->getDbgRecordRange()))
- DbgRecords.insert(makeHash(&DVR));
-
- // Remember the local noalias scope declarations in the header. After the
- // rotation, they must be duplicated and the scope must be cloned. This
- // avoids unwanted interaction across iterations.
- SmallVector<NoAliasScopeDeclInst *, 6> NoAliasDeclInstructions;
- for (Instruction &I : *OrigHeader)
- if (auto *Decl = dyn_cast<NoAliasScopeDeclInst>(&I))
- NoAliasDeclInstructions.push_back(Decl);
-
- Module *M = OrigHeader->getModule();
-
- // Track the next DbgRecord to clone. If we have a sequence where an
- // instruction is hoisted instead of being cloned:
- // DbgRecord blah
- // %foo = add i32 0, 0
- // DbgRecord xyzzy
- // %bar = call i32 @foobar()
- // where %foo is hoisted, then the DbgRecord "blah" will be seen twice, once
- // attached to %foo, then when %foo his hoisted it will "fall down" onto the
- // function call:
- // DbgRecord blah
- // DbgRecord xyzzy
- // %bar = call i32 @foobar()
- // causing it to appear attached to the call too.
- //
- // To avoid this, cloneDebugInfoFrom takes an optional "start cloning from
- // here" position to account for this behaviour. We point it at any
- // DbgRecords on the next instruction, here labelled xyzzy, before we hoist
- // %foo. Later, we only only clone DbgRecords from that position (xyzzy)
- // onwards, which avoids cloning DbgRecord "blah" multiple times. (Stored as
- // a range because it gives us a natural way of testing whether
- // there were DbgRecords on the next instruction before we hoisted things).
- iterator_range<DbgRecord::self_iterator> NextDbgInsts =
- (I != E) ? I->getDbgRecordRange() : DbgMarker::getEmptyDbgRecordRange();
-
- while (I != E) {
- Instruction *Inst = &*I++;
-
- // If the instruction's operands are invariant and it doesn't read or write
- // memory, then it is safe to hoist. Doing this doesn't change the order of
- // execution in the preheader, but does prevent the instruction from
- // executing in each iteration of the loop. This means it is safe to hoist
- // something that might trap, but isn't safe to hoist something that reads
- // memory (without proving that the loop doesn't write).
- if (L->hasLoopInvariantOperands(Inst) && !Inst->mayReadFromMemory() &&
- !Inst->mayWriteToMemory() && !Inst->isTerminator() &&
- !isa<AllocaInst>(Inst) &&
- // It is not safe to hoist the value of these instructions in
- // coroutines, as the addresses of otherwise eligible variables (e.g.
- // thread-local variables and errno) may change if the coroutine is
- // resumed in a different thread.Therefore, we disable this
- // optimization for correctness. However, this may block other correct
- // optimizations.
- // FIXME: This should be reverted once we have a better model for
- // memory access in coroutines.
- !Inst->getFunction()->isPresplitCoroutine()) {
-
- if (!NextDbgInsts.empty()) {
- auto DbgValueRange =
- LoopEntryBranch->cloneDebugInfoFrom(Inst, NextDbgInsts.begin());
- RemapDbgRecordRange(M, DbgValueRange, ValueMap,
- RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
- // Erase anything we've seen before.
- for (DbgVariableRecord &DVR :
- make_early_inc_range(filterDbgVars(DbgValueRange)))
- if (DbgRecords.count(makeHash(&DVR)))
- DVR.eraseFromParent();
- }
-
- NextDbgInsts = I->getDbgRecordRange();
-
- Inst->moveBefore(LoopEntryBranch->getIterator());
+ // When preparing for LTO, avoid rotating loops with calls that could be
+ // inlined during the LTO stage.
+ if (PrepareForLTO && Metrics.NumInlineCandidates > 0)
+ return Rotated;
+ }
- ++NumInstrsHoisted;
- continue;
- }
+ // Now, this loop is suitable for rotation.
+ BasicBlock *OrigPreheader = L->getLoopPreheader();
+
+ // If the loop could not be converted to canonical form, it must have an
+ // indirectbr in it, just give up.
+ if (!OrigPreheader || !L->hasDedicatedExits())
+ return Rotated;
+
+ // Anything ScalarEvolution may know about this loop or the PHI nodes
+ // in its header will soon be invalidated. We should also invalidate
+ // all outer loops because insertion and deletion of blocks that happens
+ // during the rotation may violate invariants related to backedge taken
+ // infos in them.
+ if (SE) {
+ SE->forgetTopmostLoop(L);
+ // We may hoist some instructions out of loop. In case if they were cached
+ // as "loop variant" or "loop computable", these caches must be dropped.
+ // We also may fold basic blocks, so cached block dispositions also need
+ // to be dropped.
+ SE->forgetBlockAndLoopDispositions();
+ }
- // Otherwise, create a duplicate of the instruction.
- Instruction *C = Inst->clone();
- if (const DebugLoc &DL = C->getDebugLoc())
- mapAtomInstance(DL, ValueMap);
+ LLVM_DEBUG(dbgs() << "LoopRotation: rotating "; L->dump());
+ if (MSSAU && VerifyMemorySSA)
+ MSSAU->getMemorySSA()->verifyMemorySSA();
- C->insertBefore(LoopEntryBranch->getIterator());
+ // Find new Loop header. NewHeader is a Header's one and only successor
+ // that is inside loop. Header's other successor is outside the
+ // loop. Otherwise loop is not suitable for rotation.
+ BasicBlock *Exit = BI->getSuccessor(0);
+ BasicBlock *NewHeader = BI->getSuccessor(1);
+ bool BISuccsSwapped = L->contains(Exit);
+ if (BISuccsSwapped)
+ std::swap(Exit, NewHeader);
+ assert(NewHeader && "Unable to determine new loop header");
+ assert(L->contains(NewHeader) && !L->contains(Exit) &&
+ "Unable to determine loop header and exit blocks");
+
+ // This code assumes that the new header has exactly one predecessor.
+ // Remove any single-entry PHI nodes in it.
+ assert(NewHeader->getSinglePredecessor() &&
+ "New header doesn't have one pred!");
+ FoldSingleEntryPHINodes(NewHeader);
+
+ // Begin by walking OrigHeader and populating ValueMap with an entry for
+ // each Instruction.
+ BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();
+ ValueToValueMapTy ValueMap, ValueMapMSSA;
+
+ // For PHI nodes, the value available in OldPreHeader is just the
+ // incoming value from OldPreHeader.
+ for (; PHINode *PN = dyn_cast<PHINode>(I); ++I)
+ InsertNewValueIntoMap(ValueMap, PN,
+ PN->getIncomingValueForBlock(OrigPreheader));
+
+ // For the rest of the instructions, either hoist to the OrigPreheader if
+ // possible or create a clone in the OldPreHeader if not.
+ Instruction *LoopEntryBranch = OrigPreheader->getTerminator();
+
+ // Record all debug records preceding LoopEntryBranch to avoid
+ // duplication.
+ using DbgHash =
+ std::pair<std::pair<hash_code, DILocalVariable *>, DIExpression *>;
+ auto makeHash = [](const DbgVariableRecord *D) -> DbgHash {
+ auto VarLocOps = D->location_ops();
+ return {{hash_combine_range(VarLocOps), D->getVariable()},
+ D->getExpression()};
+ };
- ++NumInstrsDuplicated;
+ SmallDenseSet<DbgHash, 8> DbgRecords;
+ // Build DbgVariableRecord hashes for DbgVariableRecords attached to the
+ // terminator.
+ for (const DbgVariableRecord &DVR :
+ filterDbgVars(OrigPreheader->getTerminator()->getDbgRecordRange()))
+ DbgRecords.insert(makeHash(&DVR));
+
+ // Remember the local noalias scope declarations in the header. After the
+ // rotation, they must be duplicated and the scope must be cloned. This
+ // avoids unwanted interaction across iterations.
+ SmallVector<NoAliasScopeDeclInst *, 6> NoAliasDeclInstructions;
+ for (Instruction &I : *OrigHeader)
+ if (auto *Decl = dyn_cast<NoAliasScopeDeclInst>(&I))
+ NoAliasDeclInstructions.push_back(Decl);
+
+ Module *M = OrigHeader->getModule();
+
+ // Track the next DbgRecord to clone. If we have a sequence where an
+ // instruction is hoisted instead of being cloned:
+ // DbgRecord blah
+ // %foo = add i32 0, 0
+ // DbgRecord xyzzy
+ // %bar = call i32 @foobar()
+ // where %foo is hoisted, then the DbgRecord "blah" will be seen twice, once
+ // attached to %foo, then when %foo his hoisted it will "fall down" onto the
+ // function call:
+ // DbgRecord blah
+ // DbgRecord xyzzy
+ // %bar = call i32 @foobar()
+ // causing it to appear...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/162482
More information about the llvm-commits
mailing list