[Mlir-commits] [mlir] [mlir][ControlFlow] Improve time complexity of RegionBranchOpInterface canonicalization patterns (PR #186114)

Fri Mar 13 05:31:18 PDT 2026

================
@@ -1006,39 +993,63 @@ struct RemoveDuplicateSuccessorInputUses : public RewritePattern {
       return getArgOrResultNumber(a) < getArgOrResultNumber(b);
     });
 
-    // Check every distinct pair of successor inputs for duplicates. Replace
-    // `input2` with `input1` if they are duplicates.
+    // Group inputs by their operand "signature" to find duplicates. Two
+    // successor inputs are duplicates if each predecessor (region branch point)
+    // forwards the same value for both. Let n = number of successor inputs and
+    // k = number of predecessors per input. Instead of comparing every pair of
+    // inputs (O(n² * k)), we build a signature for each input and group them
+    // via a std::map.
+    //
+    // A signature is a sorted list of (predecessor, forwarded value) pairs.
+    // Within each group, all but the first (canonical) input are replaced with
+    // the canonical one.
+    using SigEntry = std::pair<Operation *, Value>;
+    using Signature = SmallVector<SigEntry>;
+    auto sigEntryLess = [](const SigEntry &a, const SigEntry &b) {
+      if (a.first != b.first)
+        return a.first < b.first;
+      return a.second.getAsOpaquePointer() < b.second.getAsOpaquePointer();
+    };
+    // The map key is (signature, owner). Two inputs are duplicates only if they
+    // have the same signature AND the same owner (block or defining op). This
+    // ensures we track one canonical per owner group.
+    using MapKey = std::pair<Signature, void *>;
+    auto mapKeyLess = [&](const MapKey &a, const MapKey &b) {
+      if (a.second != b.second)
+        return a.second < b.second;
+      return std::lexicographical_compare(a.first.begin(), a.first.end(),
+                                          b.first.begin(), b.first.end(),
+                                          sigEntryLess);
+    };
+    std::map<MapKey, Value, decltype(mapKeyLess)> signatureToCanonical(
----------------
yangtetris wrote:

I chose `std::map` for two reasons:

1. Using `DenseMap` or `unordered_map` requires implementing a hash function for `SmallVector<std::pair<Operation *, Value>>`, which is slightly more complex (~10 extra lines) than providing a comparator.
2. In this scenario, _n_ is typically small, so the O(k log n) lookup cost of `std::map` is comparable to O(k) hashing — the constant factor of computing a hash can outweigh the log n factor. I benchmarked with n=60 and `std::map` was slightly faster than a hash map.

That said, both reasons are minor and I'm open to changing this. If you'd prefer to stay consistent with LLVM ADT, I can switch to SmallDenseMap. WDYT?

https://github.com/llvm/llvm-project/pull/186114