[Mlir-commits] [mlir] [mlir][affine] Add visited set to hasDependencePath (PR #199011)

Ryan Kim llvmlistbot at llvm.org
Fri May 22 06:15:34 PDT 2026


https://github.com/chokobole updated https://github.com/llvm/llvm-project/pull/199011

>From 4f7613cbda4af98c4c6a68442fc44f1036e1c31d Mon Sep 17 00:00:00 2001
From: Ryan Kim <chokobole33 at gmail.com>
Date: Thu, 21 May 2026 18:55:03 +0900
Subject: [PATCH] [mlir][affine] Add visited set to hasDependencePath
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I'm building a compiler-based zkVM. When it lowers a fully-unrolled
Poseidon2 permutation — a hash function widely used in zero-knowledge
proof systems — to MLIR, the output is ~4 kLOC of IR with many
independent loads and stores on a single state buffer between
consecutive `affine.for` loops (one per permutation round). Running
`affine-loop-fusion` on that IR hangs at 99% CPU forever.

The hang is inside `MemRefDependenceGraph::hasDependencePath` in
`mlir/lib/Dialect/Affine/Analysis/Utils.cpp`. The function performs a
DFS with an explicit worklist but no visited set, so every diamond
merge in the dependence graph re-enumerates each path through it —
worst case `O(2^V)` in node count. Gate each per-edge push on
`visited.insert(edge.id).second`, backed by a new `DenseSet<unsigned>`
recording every node already pushed onto the worklist. Worst case
becomes `O(V + E)`.

The query's answer is unchanged. Every reachable node is still visited
at least once — the visited set only suppresses re-pushes of nodes
already on the worklist — so the search still returns `true` exactly
when at least one path from `srcId` to `dstId` exists. The MDG is a
DAG (edges respect program order), so a node visited via one path
cannot later be re-reached via a different valid path that would
change the answer.

The regression is covered by
`mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp`, a new
C++ unit test with three cases. The key one,
`DiamondMergesNoExponentialBlowup`, builds a 30-layer diamond chain
(`2^30` distinct src-to-merge paths) and queries a deliberately
unreachable destination — the worst case for the DFS, which must walk
every path before concluding `false`. Without the visited set the call
does not return within an observed 30 s timeout; with it, it returns
in under a millisecond. A C++ unit test rather than `lit + FileCheck`
because the smallest deterministic MLIR reproducer I could find was
the actual ~4 kLOC fully-unrolled Poseidon2 IR — smaller synthetic
`.mlir` inputs did not trigger the blowup, since beyond memref-dep
edges the MDG only adds non-memref SSA edges (via
`gatherDefiningNodes`) and those do not form the diamond shape from
small inputs.
---
 mlir/lib/Dialect/Affine/Analysis/Utils.cpp    |  11 +-
 mlir/unittests/Dialect/Affine/CMakeLists.txt  |  11 ++
 .../Affine/MemRefDependenceGraphTest.cpp      | 155 ++++++++++++++++++
 mlir/unittests/Dialect/CMakeLists.txt         |   1 +
 4 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 mlir/unittests/Dialect/Affine/CMakeLists.txt
 create mode 100644 mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp

diff --git a/mlir/lib/Dialect/Affine/Analysis/Utils.cpp b/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
index ebe932a14694a..b747044e9c288 100644
--- a/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
+++ b/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
@@ -574,6 +574,14 @@ bool MemRefDependenceGraph::hasDependencePath(unsigned srcId,
   SmallVector<std::pair<unsigned, unsigned>, 4> worklist;
   worklist.push_back({srcId, 0});
   Operation *dstOp = getNode(dstId)->op;
+  // Track nodes already pushed onto the worklist. The MDG is a DAG (edges
+  // respect program order), but without a visited set this DFS enumerates
+  // every path from 'srcId' to every reachable node — exponential in the
+  // number of diamond merges. With many independent memref accesses on the
+  // same buffer (e.g. fully-unrolled SIMD-like bodies that load and store
+  // from one buffer between several affine.for loops), this is fatal.
+  DenseSet<unsigned> visited;
+  visited.insert(srcId);
   // Run DFS traversal to see if 'dstId' is reachable from 'srcId'.
   while (!worklist.empty()) {
     auto &idAndIndex = worklist.back();
@@ -595,7 +603,8 @@ bool MemRefDependenceGraph::hasDependencePath(unsigned srcId,
     // nodes that are "after" dstId in the containing block; one can't have a
     // path to `dstId` from any of those nodes.
     bool afterDst = dstOp->isBeforeInBlock(getNode(edge.id)->op);
-    if (!afterDst && edge.id != idAndIndex.first)
+    if (!afterDst && edge.id != idAndIndex.first &&
+        visited.insert(edge.id).second)
       worklist.push_back({edge.id, 0});
   }
   return false;
diff --git a/mlir/unittests/Dialect/Affine/CMakeLists.txt b/mlir/unittests/Dialect/Affine/CMakeLists.txt
new file mode 100644
index 0000000000000..ccf5b25170ea1
--- /dev/null
+++ b/mlir/unittests/Dialect/Affine/CMakeLists.txt
@@ -0,0 +1,11 @@
+add_mlir_unittest(MLIRAffineTests
+  MemRefDependenceGraphTest.cpp
+)
+mlir_target_link_libraries(MLIRAffineTests
+  PRIVATE
+  MLIRIR
+  MLIRAffineAnalysis
+  MLIRAffineDialect
+  MLIRArithDialect
+  MLIRFuncDialect
+)
diff --git a/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp b/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
new file mode 100644
index 0000000000000..69a5189959b82
--- /dev/null
+++ b/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
@@ -0,0 +1,155 @@
+//===- MemRefDependenceGraphTest.cpp - MDG hasDependencePath tests --------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/Affine/Analysis/Utils.h"
+#include "mlir/Dialect/Affine/IR/AffineOps.h"
+#include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/IR/Builders.h"
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/MLIRContext.h"
+#include "mlir/IR/OwningOpRef.h"
+
+#include "gtest/gtest.h"
+
+using namespace mlir;
+using namespace mlir::affine;
+
+namespace {
+
+/// Test fixture that builds a tiny function with `numOps` top-level
+/// `arith.constant` ops in a single block. We use this scaffolding to obtain
+/// real `Operation *` pointers in a real `Block` so that the MDG's DFS, which
+/// uses `Operation::isBeforeInBlock`, has well-defined ordering.
+class MemRefDependenceGraphTest : public ::testing::Test {
+protected:
+  MemRefDependenceGraphTest() : builder(&context) {
+    context
+        .loadDialect<AffineDialect, arith::ArithDialect, func::FuncDialect>();
+  }
+
+  /// Builds a function with `numOps` `arith.constant` ops at the top of its
+  /// entry block and returns a vector of pointers to them in program order.
+  /// The returned ops outlive the test by virtue of being owned by `module`.
+  SmallVector<Operation *> buildLinearOps(unsigned numOps) {
+    Location loc = UnknownLoc::get(&context);
+    module = ModuleOp::create(loc);
+    builder.setInsertionPointToStart(module->getBody());
+    auto funcOp = func::FuncOp::create(builder, loc, "test",
+                                       builder.getFunctionType({}, {}));
+    Block *entry = funcOp.addEntryBlock();
+    builder.setInsertionPointToStart(entry);
+    SmallVector<Operation *> ops;
+    ops.reserve(numOps);
+    for (unsigned i = 0; i < numOps; ++i)
+      ops.push_back(arith::ConstantOp::create(builder, loc,
+                                              builder.getI32IntegerAttr(0)));
+    func::ReturnOp::create(builder, loc);
+    block = entry;
+    return ops;
+  }
+
+  MLIRContext context;
+  OpBuilder builder;
+  OwningOpRef<ModuleOp> module;
+  Block *block = nullptr;
+};
+
+/// Smoke test: `hasDependencePath` returns true for a direct edge and false
+/// when the destination is not reachable from the source.
+TEST_F(MemRefDependenceGraphTest, DirectAndUnreachable) {
+  SmallVector<Operation *> ops = buildLinearOps(/*numOps=*/3);
+  MemRefDependenceGraph mdg(*block);
+  unsigned a = mdg.addNode(ops[0]);
+  unsigned b = mdg.addNode(ops[1]);
+  unsigned c = mdg.addNode(ops[2]);
+  Value v = ops[0]->getResult(0);
+  mdg.addEdge(a, b, v);
+
+  EXPECT_TRUE(mdg.hasDependencePath(a, a));
+  EXPECT_TRUE(mdg.hasDependencePath(a, b));
+  EXPECT_FALSE(mdg.hasDependencePath(a, c));
+  EXPECT_FALSE(mdg.hasDependencePath(b, a));
+}
+
+/// `hasDependencePath` correctly traverses a chain of edges.
+TEST_F(MemRefDependenceGraphTest, TransitiveChain) {
+  SmallVector<Operation *> ops = buildLinearOps(/*numOps=*/4);
+  MemRefDependenceGraph mdg(*block);
+  SmallVector<unsigned, 4> ids;
+  for (Operation *op : ops)
+    ids.push_back(mdg.addNode(op));
+  Value v = ops[0]->getResult(0);
+  for (unsigned i = 0; i + 1 < ids.size(); ++i)
+    mdg.addEdge(ids[i], ids[i + 1], v);
+
+  EXPECT_TRUE(mdg.hasDependencePath(ids[0], ids[3]));
+  EXPECT_FALSE(mdg.hasDependencePath(ids[3], ids[0]));
+}
+
+/// Regression test for an exponential-time DFS in `hasDependencePath`. The
+/// MDG is built as a chain of `kLayers` diamond merges:
+///
+///         layer 0          layer 1           layer kLayers-1
+///   src → A0,B0  → mid0 → A1,B1  → mid1 → … → midN-1     unreachable
+///
+/// The chain has `2^kLayers` distinct paths leading from `src` to the final
+/// merge, but the query node `unreachable` has no incoming edge from the
+/// chain. Before the visited-set fix, the DFS pushed each chain node onto
+/// the worklist once per path that reached it — i.e. exponentially in
+/// `kLayers` — and only returned `false` once every path had been walked.
+/// `affine-loop-fusion` hung at 99% CPU indefinitely on real workloads with
+/// this shape. With the fix, the DFS visits each node at most once, so the
+/// query is bounded by O(V + E).
+///
+/// At `kLayers = 30` the unfixed code enumerates ~10^9 paths and runs for
+/// far longer than any reasonable test timeout; the patched code completes
+/// the query in well under a millisecond.
+TEST_F(MemRefDependenceGraphTest, DiamondMergesNoExponentialBlowup) {
+  constexpr unsigned kLayers = 30;
+  // Per layer: two parallel nodes (A, B) plus one merge node. Plus one node
+  // each for src and a separate, deliberately unreachable destination
+  // placed last in program order.
+  constexpr unsigned kNumOps = 2 + 3 * kLayers;
+  SmallVector<Operation *> ops = buildLinearOps(kNumOps);
+
+  MemRefDependenceGraph mdg(*block);
+  SmallVector<unsigned, kNumOps> ids;
+  for (Operation *op : ops)
+    ids.push_back(mdg.addNode(op));
+  Value v = ops[0]->getResult(0);
+
+  // Wire up the diamond chain. `prev` is the merge node of the previous
+  // layer (or src for the first layer); each layer routes prev → {A, B} →
+  // mid, and mid becomes the next layer's prev. The chain's final `mid`
+  // has no edge to the unreachable node.
+  unsigned src = ids.front();
+  unsigned unreachable = ids.back();
+  unsigned prev = src;
+  for (unsigned layer = 0; layer < kLayers; ++layer) {
+    unsigned a = ids[1 + 3 * layer];
+    unsigned b = ids[2 + 3 * layer];
+    unsigned mid = ids[3 + 3 * layer];
+    mdg.addEdge(prev, a, v);
+    mdg.addEdge(prev, b, v);
+    mdg.addEdge(a, mid, v);
+    mdg.addEdge(b, mid, v);
+    prev = mid;
+  }
+  // Sanity: forward-reachability inside the chain works as expected.
+  EXPECT_TRUE(mdg.hasDependencePath(src, prev));
+
+  // The exponential-blowup query: dst is not in the chain's reachable set,
+  // so the DFS must walk the whole chain before concluding `false`. The
+  // assertion is that this call terminates AND returns the correct result;
+  // without the visited-set fix it does not return within any reasonable
+  // test timeout.
+  EXPECT_FALSE(mdg.hasDependencePath(src, unreachable));
+}
+
+} // namespace
diff --git a/mlir/unittests/Dialect/CMakeLists.txt b/mlir/unittests/Dialect/CMakeLists.txt
index 269eccb1f93c3..deeb58cd0b110 100644
--- a/mlir/unittests/Dialect/CMakeLists.txt
+++ b/mlir/unittests/Dialect/CMakeLists.txt
@@ -7,6 +7,7 @@ mlir_target_link_libraries(MLIRDialectTests
   MLIRDialect)
 
 add_subdirectory(AMDGPU)
+add_subdirectory(Affine)
 add_subdirectory(ArmSME)
 add_subdirectory(Index)
 add_subdirectory(Linalg)



More information about the Mlir-commits mailing list