[Mlir-commits] [mlir] [mlir][affine] Add visited set to hasDependencePath (PR #199011)

llvmlistbot at llvm.org llvmlistbot at llvm.org
Thu May 21 05:06:20 PDT 2026


llvmorg-github-actions[bot] wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-mlir

Author: Ryan Kim (chokobole)

<details>
<summary>Changes</summary>

`MemRefDependenceGraph::hasDependencePath` performs a DFS with an
explicit worklist but no visited set. The MDG is a DAG (edges respect
program order), so every diamond merge re-enumerates the paths through
it — worst case `O(2^V)` in node count. With many independent memref
accesses on the same buffer (e.g. fully-unrolled SIMD-like bodies that
load and store from one buffer between several `affine.for` loops),
`affine-loop-fusion` hangs at 99% CPU indefinitely inside this function.

This patch adds a `DenseSet<unsigned>` tracking nodes already pushed onto
the worklist, converting the worst case to `O(V + E)`. Behavior on
inputs the pass already handled is preserved: the search still returns
`true` exactly when at least one path existed, because every reachable
node is still visited at least once — the visited set only suppresses
re-pushes of nodes already on the worklist.

### Regression test

`mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp` — a new
focused C++ unit test for `MemRefDependenceGraph`. Three cases:

- `DirectAndUnreachable` and `TransitiveChain` cover the basic
  reachability contract.
- `DiamondMergesNoExponentialBlowup` is the regression case. It
  constructs a 30-layer chain of diamond merges (`src → A_k,B_k → mid_k`
  per layer, with `mid_kLayers−1` as the chain's terminus) — `2^30`
  distinct paths from `src` to the final merge. The query asks whether
  `src` can reach a deliberately-unreachable node placed after the
  chain in program order, which is the worst case for the DFS: it must
  walk every path before concluding `false`. Without the visited set
  the call does not return within a 30 s observed timeout; with the
  visited set it passes in under a millisecond.

I considered also adding a lit-test reproducer that runs
`affine-loop-fusion` end-to-end on a fully-unrolled Poseidon2 permute
(the real workload where this first showed up), but the smallest input
that deterministically reproduced the hang was still ~4 kLOC of MLIR.
The unit test gives the same regression coverage in 150 lines and
exercises the patched function directly. Happy to add the lit test as
well if reviewers prefer.

---
Full diff: https://github.com/llvm/llvm-project/pull/199011.diff


4 Files Affected:

- (modified) mlir/lib/Dialect/Affine/Analysis/Utils.cpp (+10-1) 
- (added) mlir/unittests/Dialect/Affine/CMakeLists.txt (+11) 
- (added) mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp (+154) 
- (modified) mlir/unittests/Dialect/CMakeLists.txt (+1) 


``````````diff
diff --git a/mlir/lib/Dialect/Affine/Analysis/Utils.cpp b/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
index ebe932a14694a..b747044e9c288 100644
--- a/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
+++ b/mlir/lib/Dialect/Affine/Analysis/Utils.cpp
@@ -574,6 +574,14 @@ bool MemRefDependenceGraph::hasDependencePath(unsigned srcId,
   SmallVector<std::pair<unsigned, unsigned>, 4> worklist;
   worklist.push_back({srcId, 0});
   Operation *dstOp = getNode(dstId)->op;
+  // Track nodes already pushed onto the worklist. The MDG is a DAG (edges
+  // respect program order), but without a visited set this DFS enumerates
+  // every path from 'srcId' to every reachable node — exponential in the
+  // number of diamond merges. With many independent memref accesses on the
+  // same buffer (e.g. fully-unrolled SIMD-like bodies that load and store
+  // from one buffer between several affine.for loops), this is fatal.
+  DenseSet<unsigned> visited;
+  visited.insert(srcId);
   // Run DFS traversal to see if 'dstId' is reachable from 'srcId'.
   while (!worklist.empty()) {
     auto &idAndIndex = worklist.back();
@@ -595,7 +603,8 @@ bool MemRefDependenceGraph::hasDependencePath(unsigned srcId,
     // nodes that are "after" dstId in the containing block; one can't have a
     // path to `dstId` from any of those nodes.
     bool afterDst = dstOp->isBeforeInBlock(getNode(edge.id)->op);
-    if (!afterDst && edge.id != idAndIndex.first)
+    if (!afterDst && edge.id != idAndIndex.first &&
+        visited.insert(edge.id).second)
       worklist.push_back({edge.id, 0});
   }
   return false;
diff --git a/mlir/unittests/Dialect/Affine/CMakeLists.txt b/mlir/unittests/Dialect/Affine/CMakeLists.txt
new file mode 100644
index 0000000000000..ccf5b25170ea1
--- /dev/null
+++ b/mlir/unittests/Dialect/Affine/CMakeLists.txt
@@ -0,0 +1,11 @@
+add_mlir_unittest(MLIRAffineTests
+  MemRefDependenceGraphTest.cpp
+)
+mlir_target_link_libraries(MLIRAffineTests
+  PRIVATE
+  MLIRIR
+  MLIRAffineAnalysis
+  MLIRAffineDialect
+  MLIRArithDialect
+  MLIRFuncDialect
+)
diff --git a/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp b/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
new file mode 100644
index 0000000000000..9b7ec3f1f6631
--- /dev/null
+++ b/mlir/unittests/Dialect/Affine/MemRefDependenceGraphTest.cpp
@@ -0,0 +1,154 @@
+//===- MemRefDependenceGraphTest.cpp - MDG hasDependencePath tests --------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/Affine/Analysis/Utils.h"
+#include "mlir/Dialect/Affine/IR/AffineOps.h"
+#include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/IR/Builders.h"
+#include "mlir/IR/BuiltinOps.h"
+#include "mlir/IR/MLIRContext.h"
+#include "mlir/IR/OwningOpRef.h"
+
+#include "gtest/gtest.h"
+
+using namespace mlir;
+using namespace mlir::affine;
+
+namespace {
+
+/// Test fixture that builds a tiny function with `numOps` top-level
+/// `arith.constant` ops in a single block. We use this scaffolding to obtain
+/// real `Operation *` pointers in a real `Block` so that the MDG's DFS, which
+/// uses `Operation::isBeforeInBlock`, has well-defined ordering.
+class MemRefDependenceGraphTest : public ::testing::Test {
+protected:
+  MemRefDependenceGraphTest() : builder(&context) {
+    context.loadDialect<AffineDialect, arith::ArithDialect, func::FuncDialect>();
+  }
+
+  /// Builds a function with `numOps` `arith.constant` ops at the top of its
+  /// entry block and returns a vector of pointers to them in program order.
+  /// The returned ops outlive the test by virtue of being owned by `module`.
+  SmallVector<Operation *> buildLinearOps(unsigned numOps) {
+    Location loc = UnknownLoc::get(&context);
+    module = ModuleOp::create(loc);
+    builder.setInsertionPointToStart(module->getBody());
+    auto funcOp = func::FuncOp::create(builder, loc, "test",
+                                       builder.getFunctionType({}, {}));
+    Block *entry = funcOp.addEntryBlock();
+    builder.setInsertionPointToStart(entry);
+    SmallVector<Operation *> ops;
+    ops.reserve(numOps);
+    for (unsigned i = 0; i < numOps; ++i)
+      ops.push_back(arith::ConstantOp::create(builder, loc,
+                                              builder.getI32IntegerAttr(0)));
+    func::ReturnOp::create(builder, loc);
+    block = entry;
+    return ops;
+  }
+
+  MLIRContext context;
+  OpBuilder builder;
+  OwningOpRef<ModuleOp> module;
+  Block *block = nullptr;
+};
+
+/// Smoke test: `hasDependencePath` returns true for a direct edge and false
+/// when the destination is not reachable from the source.
+TEST_F(MemRefDependenceGraphTest, DirectAndUnreachable) {
+  SmallVector<Operation *> ops = buildLinearOps(/*numOps=*/3);
+  MemRefDependenceGraph mdg(*block);
+  unsigned a = mdg.addNode(ops[0]);
+  unsigned b = mdg.addNode(ops[1]);
+  unsigned c = mdg.addNode(ops[2]);
+  Value v = ops[0]->getResult(0);
+  mdg.addEdge(a, b, v);
+
+  EXPECT_TRUE(mdg.hasDependencePath(a, a));
+  EXPECT_TRUE(mdg.hasDependencePath(a, b));
+  EXPECT_FALSE(mdg.hasDependencePath(a, c));
+  EXPECT_FALSE(mdg.hasDependencePath(b, a));
+}
+
+/// `hasDependencePath` correctly traverses a chain of edges.
+TEST_F(MemRefDependenceGraphTest, TransitiveChain) {
+  SmallVector<Operation *> ops = buildLinearOps(/*numOps=*/4);
+  MemRefDependenceGraph mdg(*block);
+  SmallVector<unsigned, 4> ids;
+  for (Operation *op : ops)
+    ids.push_back(mdg.addNode(op));
+  Value v = ops[0]->getResult(0);
+  for (unsigned i = 0; i + 1 < ids.size(); ++i)
+    mdg.addEdge(ids[i], ids[i + 1], v);
+
+  EXPECT_TRUE(mdg.hasDependencePath(ids[0], ids[3]));
+  EXPECT_FALSE(mdg.hasDependencePath(ids[3], ids[0]));
+}
+
+/// Regression test for an exponential-time DFS in `hasDependencePath`. The
+/// MDG is built as a chain of `kLayers` diamond merges:
+///
+///         layer 0          layer 1           layer kLayers-1
+///   src → A0,B0  → mid0 → A1,B1  → mid1 → … → midN-1     unreachable
+///
+/// The chain has `2^kLayers` distinct paths leading from `src` to the final
+/// merge, but the query node `unreachable` has no incoming edge from the
+/// chain. Before the visited-set fix, the DFS pushed each chain node onto
+/// the worklist once per path that reached it — i.e. exponentially in
+/// `kLayers` — and only returned `false` once every path had been walked.
+/// `affine-loop-fusion` hung at 99% CPU indefinitely on real workloads with
+/// this shape. With the fix, the DFS visits each node at most once, so the
+/// query is bounded by O(V + E).
+///
+/// At `kLayers = 30` the unfixed code enumerates ~10^9 paths and runs for
+/// far longer than any reasonable test timeout; the patched code completes
+/// the query in well under a millisecond.
+TEST_F(MemRefDependenceGraphTest, DiamondMergesNoExponentialBlowup) {
+  constexpr unsigned kLayers = 30;
+  // Per layer: two parallel nodes (A, B) plus one merge node. Plus one node
+  // each for src and a separate, deliberately unreachable destination
+  // placed last in program order.
+  constexpr unsigned kNumOps = 2 + 3 * kLayers;
+  SmallVector<Operation *> ops = buildLinearOps(kNumOps);
+
+  MemRefDependenceGraph mdg(*block);
+  SmallVector<unsigned, kNumOps> ids;
+  for (Operation *op : ops)
+    ids.push_back(mdg.addNode(op));
+  Value v = ops[0]->getResult(0);
+
+  // Wire up the diamond chain. `prev` is the merge node of the previous
+  // layer (or src for the first layer); each layer routes prev → {A, B} →
+  // mid, and mid becomes the next layer's prev. The chain's final `mid`
+  // has no edge to the unreachable node.
+  unsigned src = ids.front();
+  unsigned unreachable = ids.back();
+  unsigned prev = src;
+  for (unsigned layer = 0; layer < kLayers; ++layer) {
+    unsigned a = ids[1 + 3 * layer];
+    unsigned b = ids[2 + 3 * layer];
+    unsigned mid = ids[3 + 3 * layer];
+    mdg.addEdge(prev, a, v);
+    mdg.addEdge(prev, b, v);
+    mdg.addEdge(a, mid, v);
+    mdg.addEdge(b, mid, v);
+    prev = mid;
+  }
+  // Sanity: forward-reachability inside the chain works as expected.
+  EXPECT_TRUE(mdg.hasDependencePath(src, prev));
+
+  // The exponential-blowup query: dst is not in the chain's reachable set,
+  // so the DFS must walk the whole chain before concluding `false`. The
+  // assertion is that this call terminates AND returns the correct result;
+  // without the visited-set fix it does not return within any reasonable
+  // test timeout.
+  EXPECT_FALSE(mdg.hasDependencePath(src, unreachable));
+}
+
+} // namespace
diff --git a/mlir/unittests/Dialect/CMakeLists.txt b/mlir/unittests/Dialect/CMakeLists.txt
index 269eccb1f93c3..deeb58cd0b110 100644
--- a/mlir/unittests/Dialect/CMakeLists.txt
+++ b/mlir/unittests/Dialect/CMakeLists.txt
@@ -7,6 +7,7 @@ mlir_target_link_libraries(MLIRDialectTests
   MLIRDialect)
 
 add_subdirectory(AMDGPU)
+add_subdirectory(Affine)
 add_subdirectory(ArmSME)
 add_subdirectory(Index)
 add_subdirectory(Linalg)

``````````

</details>


https://github.com/llvm/llvm-project/pull/199011


More information about the Mlir-commits mailing list