[Mlir-commits] [mlir] [mlir][dataflow] Update dataflow tutorial doc and add dataflow example code (PR #149296)

Oleksandr Alex Zinenko llvmlistbot at llvm.org
Thu Aug 28 12:59:07 PDT 2025


================
@@ -5,20 +5,361 @@ daunting and/or complex. A dataflow analysis generally involves propagating
 information about the IR across various different types of control flow
 constructs, of which MLIR has many (Block-based branches, Region-based branches,
 CallGraph, etc), and it isn't always clear how best to go about performing the
-propagation. To help writing these types of analyses in MLIR, this document
-details several utilities that simplify the process and make it a bit more
-approachable.
+propagation. Dataflow analyses often require implementing fixed-point iteration
+when data dependencies form cycles, as can happen with control-flow. Tracking
+dependencies and making sure updates are properly propagated can get quite
+difficult when writing complex analyses. That is why MLIR provides a framework
+for writing general dataflow analyses as well as several utilities to streamline
+the implementation of common analyses. The code and tests for this tutorial can
+be found in `mlir/examples/dataflow`.
+
+## DataFlow Analysis Framework
+
+MLIR provides a general dataflow analysis framework for building fixed-point
+iteration dataflow analyses with ease and utilities for common dataflow
+analyses. Because the landscape of IRs in MLIR can be vast, the framework is
+designed to be extensible and composable, so that utilities can be shared across
+dialects with different semantics as much as possible. The framework also tries
+to make debugging dataflow analyses easy by providing (hopefully) insightful
+logs with `-debug-only="dataflow"`.
+
+Suppose we want to compute at compile-time the constant-valued results of
+operations. For example, consider:
+
+```mlir
+%0 = string.constant "foo"
+%1 = string.constant "bar"
+%2 = string.concat %0, %1
+```
+
+Using only the information in the IR, we can determine at compile time that the
+value of `%2` is "foobar". This is called constant propagation. In MLIR's
+dataflow analysis framework, the computed fact is in general called the
+"analysis state of a program point"; the "state" being, in this case, the
+constant value, and the "program point" being the SSA value `%2`.
+
+The constant value state of an SSA value is implemented as a subclass of
+`AnalysisState`, and program points are represented by the `ProgramPoint` union,
+which can be operations, SSA values, or blocks. They can also be just about
+anything; see [Extending ProgramPoint](#extending-programpoint). In general, an
+analysis state represents information about the IR computed by an analysis.
+
+Let us define an analysis state to represent the string value of an SSA value
+when that value is known at compile time:
+
+```c++
+class StringConstant : public AnalysisState {
+  /// This is the known string constant value of an SSA value at compile time
+  /// as determined by a dataflow analysis. To implement the concept of being
+  /// "uninitialized", the potential string value is wrapped in a
+  /// `std::optional` and set to `std::nullopt` by default to indicate that no
+  /// value has been provided.
+  std::optional<std::string> stringValue = std::nullopt;
+
+public:
+  using AnalysisState::AnalysisState;
+
+  /// Return true if no value has been provided for the string constant value.
+  bool isUninitialized() const { return !stringValue.has_value(); }
+
+  /// Default initialize the state to an empty string, which this class uses to
+  /// represent "overdefined". Return whether the value of the state has
+  /// changed.
+  ChangeResult defaultInitialize() {
+    // If the state already has a value, do nothing.
+    if (!isUninitialized())
+      return ChangeResult::NoChange;
+    // Initialize the state and indicate that its value changed.
+    stringValue = "";
+    return ChangeResult::Change;
+  }
+
+  /// Get the currently known string value.
+  StringRef getStringValue() const {
+    assert(!isUninitialized() && "getting the value of an uninitialized state");
+    return stringValue.value();
+  }
+
+  /// "Join" the value of the state with another constant.
+  ChangeResult join(const Twine &value) {
+    // If the current state is uninitialized, just take the value.
+    if (isUninitialized()) {
+      stringValue = value.str();
+      return ChangeResult::Change;
+    }
+    // If the current state is "overdefined", no new information can be taken.
+    if (stringValue->empty())
+      return ChangeResult::NoChange;
+    // If the current state has a different value, it now has two conflicting
+    // values and should go to overdefined.
+    if (stringValue != value.str()) {
+      stringValue = "";
+      return ChangeResult::Change;
+    }
+    return ChangeResult::NoChange;
+  }
+
+  /// Print the constant value.
+  void print(raw_ostream &os) const override {
+    os << stringValue.value_or("") << "\n";
+  }
+};
+```
+
+Analysis states often depend on each other. In our example, the constant value
+of `%2` depends on that of `%0` and `%1`. It stands to reason that the constant
+value of `%2` needs to be recomputed when that of `%0` or `%1` changes. The
+`DataFlowSolver` implements the fixed-point iteration algorithm and manages the
+dependency graph between analysis states.
+
+The computation of analysis states, on the other hand, is performed by dataflow
+analyses, subclasses of `DataFlowAnalysis`. A dataflow analysis has to implement
+a "transfer function", that is, code that computes the values of some states
+using the values of others. Since the dependency graph inside the solver is
+initially empty, the analysis must also set up the dependency graph correctly.
+
+```c++
+class DataFlowAnalysis {
+public:
+  /// "Visit" the provided program point. This method is typically used to
+  /// implement transfer functions on or across program points.
+  virtual LogicalResult visit(ProgramPoint *point) = 0;
+
+  /// Initialize the dependency graph required by this analysis from the given
+  /// top-level operation. This function is called once by the solver before
+  /// running the fixed-point iteration algorithm.
+  virtual LogicalResult initialize(Operation *top) = 0;
+
+protected:
+  /// Create a dependency between the given analysis state and program point
+  /// on this analysis. When the state changes, the analysis is re-invoked on
+  /// the program point.
+  void addDependency(AnalysisState *state, ProgramPoint *point);
+
+  /// Propagate an update to a state if it changed.
+  void propagateIfChanged(AnalysisState *state, ChangeResult changed);
+
+  /// Get the analysis state associated with the lattice anchor. The returned
+  /// state is expected to be "write-only", and any updates need to be
+  /// propagated by `propagateIfChanged`.
+  template <typename StateT, typename AnchorT>
+  StateT *getOrCreate(AnchorT anchor) {
+    return solver.getOrCreateState<StateT>(anchor);
+  }
+};
+```
+
+Dependency management is a little unusual in this framework. The dependents of
+the value of a state are not other states but invocations of dataflow analyses
+on certain program points. For example:
+
+```c++
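+class StringConstantPropagation : public DataFlowAnalysis {
+public:
+  using DataFlowAnalysis::DataFlowAnalysis;
+
+  /// Set up the initial dependency graph; defined out of line.
+  LogicalResult initialize(Operation *top) override;
+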
+  /// Implement the transfer function for string operations. When visiting a
+  /// string operation, this analysis will try to determine compile time values
+  /// of the operation's results and set them in `StringConstant` states. This
+  /// function is invoked on an operation whenever the states of its operands
+  /// are changed.
+  LogicalResult visit(ProgramPoint *point) override {
+    // This function expects only to receive operations.
+    auto *op = point->getPrevOp();
+
+    // Get or create the constant string values of the operands.
+    SmallVector<StringConstant *> operandValues;
+    for (Value operand : op->getOperands()) {
+      auto *value = getOrCreate<StringConstant>(operand);
+      // Create a dependency from the state to this analysis. When the string
+      // value of one of the operation's operands is updated, invoke the
+      // transfer function again.
+      addDependency(value, point);
+      // If the state is uninitialized, bail out and come back later when it is
+      // initialized.
+      if (value->isUninitialized())
+        return success();
+      operandValues.push_back(value);
+    }
+
+    // Try to compute a constant value of the result.
+    auto *result = getOrCreate<StringConstant>(op->getResult(0));
+    if (auto constant = dyn_cast<string::ConstantOp>(op)) {
+      // Just grab and set the constant value of the result of the operation.
+      // Propagate an update to the state if it changed.
+      propagateIfChanged(result, result->join(constant.getValue()));
+    } else if (auto concat = dyn_cast<string::ConcatOp>(op)) {
+      StringRef lhs = operandValues[0]->getStringValue();
+      StringRef rhs = operandValues[1]->getStringValue();
+      if (lhs.empty() || rhs.empty()) {
+        // If either operand is overdefined, the results are overdefined.
+        propagateIfChanged(result, result->defaultInitialize());
+      } else {
+        // Otherwise, compute the constant value and join it with the result.
+        propagateIfChanged(result, result->join(lhs + rhs));
+      }
+    } else {
+      // We don't know how to implement the transfer function for this
+      // operation. Mark its results as overdefined.
+      propagateIfChanged(result, result->defaultInitialize());
+    }
+    return success();
+  }
+};
+```
+
+In the above example, the `visit` function registers the analysis invocation
+on an operation as a dependent of the constant values of that operation's
+operands. If any operand state is initialized but overdefined, it sets the
+state of the result to overdefined. Otherwise, it computes the state of the
+result and merges the new information in with `join`.
+
+However, the dependency graph still needs to be initialized before the solver
+knows what to call `visit` on. This is done in the `initialize` function:
+
+```c++
+LogicalResult StringConstantPropagation::initialize(Operation *top) {
+  // Visit every nested string operation and set up its dependencies.
+  top->walk([&](Operation *op) {
+    for (Value operand : op->getOperands()) {
+      auto *state = getOrCreate<StringConstant>(operand);
----------------
ftynse wrote:

What if the operand is not of string type?
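
One possible way to handle this is to skip operands whose type is not the string type, so that no `StringConstant` state or dependency is ever created for them. The sketch below only illustrates that filtering step in self-contained C++. `ToyType`, `ToyValue`, and `stringOperands` are hypothetical stand-ins, not MLIR classes, and the actual type check for the tutorial's string dialect is not shown in the quoted hunk, so it is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for an SSA value and its type. These are NOT MLIR
// classes; they exist only to make the filtering logic runnable on its own.
enum class ToyType { String, Integer };

struct ToyValue {
  ToyType type;
  std::string name;
};

// Return the operands for which a string-constant state should be tracked.
// Operands of any other type are skipped, so the analysis would neither
// create a state for them nor register a dependency on them.
std::vector<const ToyValue *>
stringOperands(const std::vector<ToyValue> &operands) {
  std::vector<const ToyValue *> tracked;
  for (const ToyValue &operand : operands) {
    // Assumption of this sketch: every operand carries a non-empty name.
    assert(!operand.name.empty() && "operands are expected to be named");
    if (operand.type != ToyType::String)
      continue; // Non-string operand: nothing to track.
    tracked.push_back(&operand);
  }
  return tracked;
}
```

In `initialize`, the same guard would wrap the `getOrCreate<StringConstant>(operand)` call, and `visit` would need a matching guard so the two stay consistent.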

https://github.com/llvm/llvm-project/pull/149296

