[Mlir-commits] [mlir] [mlir][affine] Add visited set to hasDependencePath (PR #199011)
Ryan Kim
llvmlistbot at llvm.org
Fri May 22 06:15:34 PDT 2026
https://github.com/chokobole updated https://github.com/llvm/llvm-project/pull/199011
>From 4f7613cbda4af98c4c6a68442fc44f1036e1c31d Mon Sep 17 00:00:00 2001
From: Ryan Kim <chokobole33 at gmail.com>
Date: Thu, 21 May 2026 18:55:03 +0900
Subject: [PATCH] [mlir][affine] Add visited set to hasDependencePath
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
I'm building a compiler-based zkVM. When it lowers a fully-unrolled
Poseidon2 permutation — a hash function widely used in zero-knowledge
proof systems — to MLIR, the output is ~4 kLOC of IR with many
independent loads and stores on a single state buffer between
consecutive `affine.for` loops (one per permutation round). Running
`affine-loop-fusion` on that IR hangs at 99% CPU forever.
The hang is inside `MemRefDependenceGraph::hasDependencePath` in
`mlir/lib/Dialect/Affine/Analysis/Utils.cpp`. The function performs a
DFS with an explicit worklist but no visited set, so every diamond
merge in the dependence graph re-enumerates each path through it —
worst case `O(2^V)` in node count. Gate each per-edge push on
`visited.insert(edge.id).second`, backed by a new `DenseSet<unsigned>`
recording every node already pushed onto the worklist. Worst case
becomes `O(V + E)`.
The query's answer is unchanged. Every reachable node is still visited
at least once — the visited set only suppresses re-pushes of nodes
already on the worklist — so the search still returns `true` exactly
when at least one path from `srcId` to `dstId` exists. The MDG is a
DAG (edges respect program order), so a node visited via one path
cannot later be re-reached via a different valid path that would
change the answer.
The regression is covered by
`mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp`, a new
C++ unit test with three cases. The key one,
`DiamondMergesNoExponentialBlowup`, builds a 30-layer diamond chain
(`2^30` distinct src-to-merge paths) and queries a deliberately
unreachable destination — the worst case for the DFS, which must walk
every path before concluding `false`. Without the visited set the call
does not return within an observed 30 s timeout; with it, it returns
in under a millisecond. A C++ unit test rather than `lit + FileCheck`
because the smallest deterministic MLIR reproducer I could find was
the actual ~4 kLOC fully-unrolled Poseidon2 IR — smaller synthetic
`.mlir` inputs did not trigger the blowup, since beyond memref-dep
edges the MDG only adds non-memref SSA edges (via
`gatherDefiningNodes`) and those do not form the diamond shape from
small inputs.
---
mlir/lib/Dialect/Affine/Analysis/Utils.cpp | 11 +-
mlir/unittests/Dialect/Affine/CMakeLists.txt | 11 ++
.../Affine/MemRefDependenceGraphTest.cpp | 155 ++++++++++++++++++
mlir/unittests/Dialect/CMakeLists.txt | 1 +
4 files changed, 177 insertions(+), 1 deletion(-)
create mode 100644 mlir/unittests/Dialect/Affine/CMakeLists.txt
create mode 100644 mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
diff --git a/mlir/lib/Dialect/Affine/Analysis/Utils.cpp b/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
index ebe932a14694a..b747044e9c288 100644
--- a/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
+++ b/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
@@ -574,6 +574,14 @@ bool MemRefDependenceGraph::hasDependencePath(unsigned srcId,
SmallVector<std::pair<unsigned, unsigned>, 4> worklist;
worklist.push_back({srcId, 0});
Operation *dstOp = getNode(dstId)->op;
+ // Track nodes already pushed onto the worklist. The MDG is a DAG (edges
+ // respect program order), but without a visited set this DFS enumerates
+ // every path from 'srcId' to every reachable node — exponential in the
+ // number of diamond merges. With many independent memref accesses on the
+ // same buffer (e.g. fully-unrolled SIMD-like bodies that load and store
+ // from one buffer between several affine.for loops), this is fatal.
+ DenseSet<unsigned> visited;
+ visited.insert(srcId);
// Run DFS traversal to see if 'dstId' is reachable from 'srcId'.
while (!worklist.empty()) {
auto &idAndIndex = worklist.back();
@@ -595,7 +603,8 @@ bool MemRefDependenceGraph::hasDependencePath(unsigned srcId,
// nodes that are "after" dstId in the containing block; one can't have a
// path to `dstId` from any of those nodes.
bool afterDst = dstOp->isBeforeInBlock(getNode(edge.id)->op);
- if (!afterDst && edge.id != idAndIndex.first)
+ if (!afterDst && edge.id != idAndIndex.first &&
+ visited.insert(edge.id).second)
worklist.push_back({edge.id, 0});
}
return false;
diff --git a/mlir/unittests/Dialect/Affine/CMakeLists.txt b/mlir/unittests/Dialect/Affine/CMakeLists.txt
new file mode 100644
index 0000000000000..ccf5b25170ea1
--- /dev/null
+++ b/mlir/unittests/Dialect/Affine/CMakeLists.txt
@@ -0,0 +1,11 @@
+add_mlir_unittest(MLIRAffineTests
+ MemRefDependenceGraphTest.cpp
+)
+mlir_target_link_libraries(MLIRAffineTests
+ PRIVATE
+ MLIRIR
+ MLIRAffineAnalysis
+ MLIRAffineDialect
+ MLIRArithDialect
+ MLIRFuncDialect
+)
diff --git a/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp b/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
new file mode 100644
index 0000000000000..69a5189959b82
--- /dev/null
+++ b/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
@@ -0,0 +1,155 @@
+//===- MemRefDependenceGraphTest.cpp - MDG hasDependencePath tests --------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/Affine/Analysis/Utils.h"
+#include "mlir/Dialect/Affine/IR/AffineOps.h"
+#include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/IR/Builders.h"
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/MLIRContext.h"
+#include "mlir/IR/OwningOpRef.h"
+
+#include "gtest/gtest.h"
+
+using namespace mlir;
+using namespace mlir::affine;
+
+namespace {
+
+/// Test fixture that builds a tiny function with `numOps` top-level
+/// `arith.constant` ops in a single block. We use this scaffolding to obtain
+/// real `Operation *` pointers in a real `Block` so that the MDG's DFS, which
+/// uses `Operation::isBeforeInBlock`, has well-defined ordering.
+class MemRefDependenceGraphTest : public ::testing::Test {
+protected:
+ MemRefDependenceGraphTest() : builder(&context) {
+ context
+ .loadDialect<AffineDialect, arith::ArithDialect, func::FuncDialect>();
+ }
+
+ /// Builds a function with `numOps` `arith.constant` ops at the top of its
+ /// entry block and returns a vector of pointers to them in program order.
+ /// The returned ops outlive the test by virtue of being owned by `module`.
+ SmallVector<Operation *> buildLinearOps(unsigned numOps) {
+ Location loc = UnknownLoc::get(&context);
+ module = ModuleOp::create(loc);
+ builder.setInsertionPointToStart(module->getBody());
+ auto funcOp = func::FuncOp::create(builder, loc, "test",
+ builder.getFunctionType({}, {}));
+ Block *entry = funcOp.addEntryBlock();
+ builder.setInsertionPointToStart(entry);
+ SmallVector<Operation *> ops;
+ ops.reserve(numOps);
+ for (unsigned i = 0; i < numOps; ++i)
+ ops.push_back(arith::ConstantOp::create(builder, loc,
+ builder.getI32IntegerAttr(0)));
+ func::ReturnOp::create(builder, loc);
+ block = entry;
+ return ops;
+ }
+
+ MLIRContext context;
+ OpBuilder builder;
+ OwningOpRef<ModuleOp> module;
+ Block *block = nullptr;
+};
+
+/// Smoke test: `hasDependencePath` returns true for a direct edge and false
+/// when the destination is not reachable from the source.
+TEST_F(MemRefDependenceGraphTest, DirectAndUnreachable) {
+ SmallVector<Operation *> ops = buildLinearOps(/*numOps=*/3);
+ MemRefDependenceGraph mdg(*block);
+ unsigned a = mdg.addNode(ops[0]);
+ unsigned b = mdg.addNode(ops[1]);
+ unsigned c = mdg.addNode(ops[2]);
+ Value v = ops[0]->getResult(0);
+ mdg.addEdge(a, b, v);
+
+ EXPECT_TRUE(mdg.hasDependencePath(a, a));
+ EXPECT_TRUE(mdg.hasDependencePath(a, b));
+ EXPECT_FALSE(mdg.hasDependencePath(a, c));
+ EXPECT_FALSE(mdg.hasDependencePath(b, a));
+}
+
+/// `hasDependencePath` correctly traverses a chain of edges.
+TEST_F(MemRefDependenceGraphTest, TransitiveChain) {
+ SmallVector<Operation *> ops = buildLinearOps(/*numOps=*/4);
+ MemRefDependenceGraph mdg(*block);
+ SmallVector<unsigned, 4> ids;
+ for (Operation *op : ops)
+ ids.push_back(mdg.addNode(op));
+ Value v = ops[0]->getResult(0);
+ for (unsigned i = 0; i + 1 < ids.size(); ++i)
+ mdg.addEdge(ids[i], ids[i + 1], v);
+
+ EXPECT_TRUE(mdg.hasDependencePath(ids[0], ids[3]));
+ EXPECT_FALSE(mdg.hasDependencePath(ids[3], ids[0]));
+}
+
+/// Regression test for an exponential-time DFS in `hasDependencePath`. The
+/// MDG is built as a chain of `kLayers` diamond merges:
+///
+/// layer 0 layer 1 layer kLayers-1
+/// src → A0,B0 → mid0 → A1,B1 → mid1 → … → midN-1 unreachable
+///
+/// The chain has `2^kLayers` distinct paths leading from `src` to the final
+/// merge, but the query node `unreachable` has no incoming edge from the
+/// chain. Before the visited-set fix, the DFS pushed each chain node onto
+/// the worklist once per path that reached it — i.e. exponentially in
+/// `kLayers` — and only returned `false` once every path had been walked.
+/// `affine-loop-fusion` hung at 99% CPU indefinitely on real workloads with
+/// this shape. With the fix, the DFS visits each node at most once, so the
+/// query is bounded by O(V + E).
+///
+/// At `kLayers = 30` the unfixed code enumerates ~10^9 paths and runs for
+/// far longer than any reasonable test timeout; the patched code completes
+/// the query in well under a millisecond.
+TEST_F(MemRefDependenceGraphTest, DiamondMergesNoExponentialBlowup) {
+ constexpr unsigned kLayers = 30;
+ // Per layer: two parallel nodes (A, B) plus one merge node. Plus one node
+ // each for src and a separate, deliberately unreachable destination
+ // placed last in program order.
+ constexpr unsigned kNumOps = 2 + 3 * kLayers;
+ SmallVector<Operation *> ops = buildLinearOps(kNumOps);
+
+ MemRefDependenceGraph mdg(*block);
+ SmallVector<unsigned, kNumOps> ids;
+ for (Operation *op : ops)
+ ids.push_back(mdg.addNode(op));
+ Value v = ops[0]->getResult(0);
+
+ // Wire up the diamond chain. `prev` is the merge node of the previous
+ // layer (or src for the first layer); each layer routes prev → {A, B} →
+ // mid, and mid becomes the next layer's prev. The chain's final `mid`
+ // has no edge to the unreachable node.
+ unsigned src = ids.front();
+ unsigned unreachable = ids.back();
+ unsigned prev = src;
+ for (unsigned layer = 0; layer < kLayers; ++layer) {
+ unsigned a = ids[1 + 3 * layer];
+ unsigned b = ids[2 + 3 * layer];
+ unsigned mid = ids[3 + 3 * layer];
+ mdg.addEdge(prev, a, v);
+ mdg.addEdge(prev, b, v);
+ mdg.addEdge(a, mid, v);
+ mdg.addEdge(b, mid, v);
+ prev = mid;
+ }
+ // Sanity: forward-reachability inside the chain works as expected.
+ EXPECT_TRUE(mdg.hasDependencePath(src, prev));
+
+ // The exponential-blowup query: dst is not in the chain's reachable set,
+ // so the DFS must walk the whole chain before concluding `false`. The
+ // assertion is that this call terminates AND returns the correct result;
+ // without the visited-set fix it does not return within any reasonable
+ // test timeout.
+ EXPECT_FALSE(mdg.hasDependencePath(src, unreachable));
+}
+
+} // namespace
diff --git a/mlir/unittests/Dialect/CMakeLists.txt b/mlir/unittests/Dialect/CMakeLists.txt
index 269eccb1f93c3..deeb58cd0b110 100644
--- a/mlir/unittests/Dialect/CMakeLists.txt
+++ b/mlir/unittests/Dialect/CMakeLists.txt
@@ -7,6 +7,7 @@ mlir_target_link_libraries(MLIRDialectTests
MLIRDialect)
add_subdirectory(AMDGPU)
+add_subdirectory(Affine)
add_subdirectory(ArmSME)
add_subdirectory(Index)
add_subdirectory(Linalg)
More information about the Mlir-commits
mailing list