[Mlir-commits] [mlir] [MLIR] Fix computeTiedSuccessorInputs for single-iteration scf.for (PR #188986)
Mehdi Amini
llvmlistbot at llvm.org
Fri Mar 27 05:54:46 PDT 2026
https://github.com/joker-eph created https://github.com/llvm/llvm-project/pull/188986
For scf.for with a statically known trip count of 1, ForOp::getSuccessorRegions omits the back-edge from the body to itself as an optimisation (enabling InlineRegionBranchOp to inline the single iteration). This caused computeTiedSuccessorInputs to place iter_arg block arguments and the corresponding op results in separate equivalence classes:
- With tripCount > 1, a single scf.yield operand maps to both the body block arg (back-edge) and the result (exit), tying them.
- With tripCount == 1 the yield operand maps only to the result, so the body block arg and result ended up in distinct classes.
RemoveDeadRegionBranchOpSuccessorInputs would then remove a body block arg and its init operand independently from the corresponding result, producing a structurally invalid scf.for (mismatched iter_arg / result counts).
Fix: after building the initial equivalence classes from operandToInputs, look for regions whose terminator exits only to the parent (indicating a single-iteration optimization removed the back-edge), while the parent can still enter the region. When found, and when the region's successor inputs and the parent's successor inputs have matching counts, tie them positionally.
This correctly handles scf.for (back-edge removed by tripCount==1) while leaving scf.while unaffected: scf.while's before-region terminator (scf.condition) always exits to both the after-region and the parent, so the "exits only to parent" check never fires.
Fixes mlir/test/Dialect/SCF/canonicalize.mlir and for-loop-peeling.mlir under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.
Assisted-by: Claude Code
>From 295b48c345871f0a3794b745fc4c91b8c0aa61a3 Mon Sep 17 00:00:00 2001
From: Mehdi Amini <joker.eph at gmail.com>
Date: Thu, 26 Mar 2026 15:57:19 -0700
Subject: [PATCH] [MLIR] Fix computeTiedSuccessorInputs for single-iteration
scf.for
For scf.for with a statically known trip count of 1, ForOp::getSuccessorRegions
omits the back-edge from the body to itself as an optimisation (enabling
InlineRegionBranchOp to inline the single iteration). This caused
computeTiedSuccessorInputs to place iter_arg block arguments and the
corresponding op results in separate equivalence classes:
- With tripCount > 1, a single scf.yield operand maps to both the
body block arg (back-edge) and the result (exit), tying them.
- With tripCount == 1 the yield operand maps only to the result, so the
body block arg and result ended up in distinct classes.
RemoveDeadRegionBranchOpSuccessorInputs would then remove a body block
arg and its init operand independently from the corresponding result,
producing a structurally invalid scf.for (mismatched iter_arg / result
counts).
Fix: after building the initial equivalence classes from operandToInputs,
look for regions whose terminator exits only to the parent (indicating
a single-iteration optimization removed the back-edge), while the parent
can still enter the region. When found, and when the region's successor
inputs and the parent's successor inputs have matching counts, tie them
positionally.
This correctly handles scf.for (back-edge removed by tripCount==1) while
leaving scf.while unaffected: scf.while's before-region terminator
(scf.condition) always exits to both the after-region and the parent, so
the "exits only to parent" check never fires.
Fixes mlir/test/Dialect/SCF/canonicalize.mlir and for-loop-peeling.mlir
under MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.
Assisted-by: Claude Code
Fix a failure present with MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON.
---
mlir/lib/Interfaces/ControlFlowInterfaces.cpp | 76 ++++++++++++++++++-
1 file changed, 73 insertions(+), 3 deletions(-)
diff --git a/mlir/lib/Interfaces/ControlFlowInterfaces.cpp b/mlir/lib/Interfaces/ControlFlowInterfaces.cpp
index c3fb73acf5ef0..eeedbfb6c5c6f 100644
--- a/mlir/lib/Interfaces/ControlFlowInterfaces.cpp
+++ b/mlir/lib/Interfaces/ControlFlowInterfaces.cpp
@@ -783,8 +783,9 @@ static BitVector &lookupOrCreateBitVector(MappingTy &mapping, KeyTy key,
/// ...
/// }
/// There are two sets: {{%r0, %arg0}, {%r1, %arg1}}.
-static llvm::EquivalenceClasses<Value> computeTiedSuccessorInputs(
- const RegionBranchSuccessorMapping &operandToInputs) {
+static llvm::EquivalenceClasses<Value>
+computeTiedSuccessorInputs(const RegionBranchSuccessorMapping &operandToInputs,
+ RegionBranchOpInterface branchOp) {
llvm::EquivalenceClasses<Value> tiedSuccessorInputs;
for (const auto &[operand, inputs] : operandToInputs) {
assert(!inputs.empty() && "expected non-empty inputs");
@@ -796,6 +797,75 @@ static llvm::EquivalenceClasses<Value> computeTiedSuccessorInputs(
tiedSuccessorInputs.unionSets(firstInput, nextInput);
}
}
+
+ // Also tie region entry block args with op results at the same slot index
+ // when the back-edge has been optimized away for a single-iteration loop.
+ //
+ // In the general case (e.g., scf.for with tripCount > 1), a single yield
+ // operand maps to BOTH the body block arg (back-edge) AND the result (exit),
+ // so they end up in the same equivalence class via the loop above. When a
+ // trip-count optimization removes the back-edge (e.g., scf.for with
+ // tripCount == 1), the yield only maps to the result, and the body block arg
+ // is only mapped from the init operand — leaving them in separate classes.
+ // Without the explicit tying below, RemoveDeadRegionBranchOpSuccessorInputs
+ // could remove a body block arg and its init operand independently from the
+ // corresponding result, leaving the op structurally invalid.
+ //
+ // Detection: if a region's terminator exits ONLY to parent (single successor,
+ // indicating a trip-count optimization removed the back-edge), but the parent
+ // CAN enter the region (so the region is reachable), and the region inputs
+ // and parent inputs have matching counts, then they are positionally tied.
+ //
+ // Crucially, this does NOT apply to scf.while's before-region, whose
+ // terminator (scf.condition) always exits to both the after-region AND parent
+ // simultaneously. That case has two successors and is correctly excluded.
+ ValueRange parentInputs =
+ branchOp.getSuccessorInputs(RegionSuccessor::parent());
+ if (!parentInputs.empty()) {
+ // Collect regions that parent can enter.
+ SmallVector<RegionSuccessor> parentSuccessors;
+ branchOp.getSuccessorRegions(RegionBranchPoint::parent(), parentSuccessors);
+
+ for (Region ®ion : branchOp.getOperation()->getRegions()) {
+ // Check that the region is reachable from the parent.
+ bool parentCanEnter =
+ llvm::any_of(parentSuccessors, [&](const RegionSuccessor &s) {
+ return !s.isParent() && s.getSuccessor() == ®ion;
+ });
+ if (!parentCanEnter)
+ continue;
+
+ // Get the entry block's terminator, if it implements
+ // RegionBranchTerminatorOpInterface (otherwise we can't query
+ // successors).
+ if (region.empty() || region.front().empty())
+ continue;
+ auto terminator =
+ dyn_cast<RegionBranchTerminatorOpInterface>(region.front().back());
+ if (!terminator)
+ continue;
+
+ // Check whether the terminator exits ONLY to parent (i.e., the back-edge
+ // has been removed by a single-iteration optimization).
+ SmallVector<RegionSuccessor> terminatorSuccessors;
+ branchOp.getSuccessorRegions(RegionBranchPoint(terminator),
+ terminatorSuccessors);
+ if (terminatorSuccessors.size() != 1 ||
+ !terminatorSuccessors.front().isParent())
+ continue;
+
+ // The terminator exits only to parent, while the region is still
+ // reachable from parent. Tie the region inputs with the parent inputs.
+ ValueRange regionInputs =
+ branchOp.getSuccessorInputs(RegionSuccessor(®ion));
+ if (regionInputs.size() != parentInputs.size())
+ continue;
+
+ for (auto [regionInput, parentInput] :
+ llvm::zip(regionInputs, parentInputs))
+ tiedSuccessorInputs.unionSets(regionInput, parentInput);
+ }
+ }
return tiedSuccessorInputs;
}
@@ -855,7 +925,7 @@ struct RemoveDeadRegionBranchOpSuccessorInputs : public RewritePattern {
RegionBranchSuccessorMapping operandToInputs;
regionBranchOp.getSuccessorOperandInputMapping(operandToInputs);
llvm::EquivalenceClasses<Value> tiedSuccessorInputs =
- computeTiedSuccessorInputs(operandToInputs);
+ computeTiedSuccessorInputs(operandToInputs, regionBranchOp);
// Determine which values to remove and group them by block and operation.
SmallVector<Value> valuesToRemove;
More information about the Mlir-commits
mailing list