[Mlir-commits] [mlir] [mlir][dataflow] Update dataflow tutorial doc and add dataflow example code (PR #149296)
Oleksandr Alex Zinenko
llvmlistbot at llvm.org
Thu Aug 28 12:59:07 PDT 2025
================
@@ -5,20 +5,361 @@ daunting and/or complex. A dataflow analysis generally involves propagating
information about the IR across various different types of control flow
constructs, of which MLIR has many (Block-based branches, Region-based branches,
CallGraph, etc), and it isn't always clear how best to go about performing the
-propagation. To help writing these types of analyses in MLIR, this document
-details several utilities that simplify the process and make it a bit more
-approachable.
+propagation. Dataflow analyses often require implementing fixed-point iteration
+when data dependencies form cycles, as can happen with control flow. Tracking
+dependencies and making sure updates are properly propagated can get quite
+difficult when writing complex analyses. That is why MLIR provides a framework
+for writing general dataflow analyses as well as several utilities to streamline
+the implementation of common analyses. The code and test from this tutorial can
+be found in `mlir/examples/dataflow`.
+
+## DataFlow Analysis Framework
+
+MLIR provides a general dataflow analysis framework for building fixed-point
+iteration dataflow analyses with ease and utilities for common dataflow
+analyses. Because the landscape of IRs in MLIR can be vast, the framework is
+designed to be extensible and composable, so that utilities can be shared across
+dialects with different semantics as much as possible. The framework also tries
+to make debugging dataflow analyses easy by providing (hopefully) insightful
+logs with `-debug-only="dataflow"`.
+
+Suppose we want to compute at compile-time the constant-valued results of
+operations. For example, consider:
+
+```mlir
+%0 = string.constant "foo"
+%1 = string.constant "bar"
+%2 = string.concat %0, %1
+```
+From the information in the IR alone, we can determine at compile time that the
+value of `%2` is "foobar". This is called constant propagation. In MLIR's
+dataflow analysis framework, the computed fact is called the "analysis state of
+a program point": here the "state" is the constant value and the "program
+point" is the SSA value `%2`.
+
+The constant value state of an SSA value is implemented as a subclass of
+`AnalysisState`, and program points are represented by the `ProgramPoint` union,
+which can be operations, SSA values, or blocks. They can also be just about
+anything; see [Extending ProgramPoint](#extending-programpoint). In general, an
+analysis state represents information about the IR computed by an analysis.
+
+Let us define an analysis state to represent the compile-time known string
+value of an SSA value:
+
+```c++
+class StringConstant : public AnalysisState {
+ /// This is the known string constant value of an SSA value at compile time
+ /// as determined by a dataflow analysis. To implement the concept of being
+ /// "uninitialized", the potential string value is wrapped in a `std::optional`
+ /// and set to `std::nullopt` by default to indicate that no value has been
+ /// provided.
+ std::optional<std::string> stringValue = std::nullopt;
+
+public:
+ using AnalysisState::AnalysisState;
+
+ /// Return true if no value has been provided for the string constant value.
+ bool isUninitialized() const { return !stringValue.has_value(); }
+
+ /// Default-initialize the state to an empty string, which this class uses to
+ /// mean "overdefined". Return whether the value of the state has changed.
+ ChangeResult defaultInitialize() {
+ // If the state already has a value, do nothing.
+ if (!isUninitialized())
+ return ChangeResult::NoChange;
+ // Initialize the state and indicate that its value changed.
+ stringValue = "";
+ return ChangeResult::Change;
+ }
+
+ /// Get the currently known string value.
+ StringRef getStringValue() const {
+ assert(!isUninitialized() && "getting the value of an uninitialized state");
+ return stringValue.value();
+ }
+
+ /// "Join" the value of the state with another constant.
+ ChangeResult join(const Twine &value) {
+ // If the current state is uninitialized, just take the value.
+ if (isUninitialized()) {
+ stringValue = value.str();
+ return ChangeResult::Change;
+ }
+ // If the current state is "overdefined", no new information can be taken.
+ if (stringValue->empty())
+ return ChangeResult::NoChange;
+ // If the current state has a different value, it now has two conflicting
+ // values and should go to overdefined.
+ if (stringValue != value.str()) {
+ stringValue = "";
+ return ChangeResult::Change;
+ }
+ return ChangeResult::NoChange;
+ }
+
+ /// Print the constant value.
+ void print(raw_ostream &os) const override {
+ os << stringValue.value_or("") << "\n";
+ }
+};
+```
+
+Analysis states often depend on each other. In our example, the constant value
+of `%2` depends on that of `%0` and `%1`. It stands to reason that the constant
+value of `%2` needs to be recomputed when that of `%0` and `%1` change. The
+`DataFlowSolver` implements the fixed-point iteration algorithm and manages the
+dependency graph between analysis states.
+
+The computation of analysis states, on the other hand, is performed by dataflow
+analyses, subclasses of `DataFlowAnalysis`. A dataflow analysis has to implement
+a "transfer function", that is, code that computes the values of some states
+using the values of others. Since the dependency graph inside the solver is
+initially empty, the analysis must also set up the dependency graph.
+
+```c++
+class DataFlowAnalysis {
+public:
+ /// "Visit" the provided program point. This method is typically used to
+ /// implement transfer functions on or across program points.
+ virtual LogicalResult visit(ProgramPoint *point) = 0;
+
+ /// Initialize the dependency graph required by this analysis from the given
+ /// top-level operation. This function is called once by the solver before
+ /// running the fixed-point iteration algorithm.
+ virtual LogicalResult initialize(Operation *top) = 0;
+
+protected:
+ /// Create a dependency between the given analysis state and lattice anchor
+ /// on this analysis.
+ void addDependency(AnalysisState *state, ProgramPoint *point);
+
+ /// Propagate an update to a state if it changed.
+ void propagateIfChanged(AnalysisState *state, ChangeResult changed);
+
+ /// Get the analysis state associated with the lattice anchor. The returned
+ /// state is expected to be "write-only", and any updates need to be
+ /// propagated by `propagateIfChanged`.
+ template <typename StateT, typename AnchorT>
+ StateT *getOrCreate(AnchorT anchor) {
+ return solver.getOrCreateState<StateT>(anchor);
+ }
+};
+```
+
+Dependency management is a little unusual in this framework. The dependents of
+the value of a state are not other states but invocations of dataflow analyses
+on certain program points. For example:
+
+```c++
+class StringConstantPropagation : public DataFlowAnalysis {
+public:
+ /// Implement the transfer function for string operations. When visiting a
+ /// string operation, this analysis will try to determine compile time values
+ /// of the operation's results and set them in `StringConstant` states. This
+ /// function is invoked on an operation whenever the states of its operands
+ /// are changed.
+ LogicalResult visit(ProgramPoint *point) override {
+ // This function expects only to receive operations.
+ auto *op = point->getPrevOp();
+
+ // Get or create the constant string values of the operands.
+ SmallVector<StringConstant *> operandValues;
+ for (Value operand : op->getOperands()) {
+ auto *value = getOrCreate<StringConstant>(operand);
+ // Create a dependency from the state to this analysis. When the string
+ // value of one of the operation's operands is updated, invoke the
+ // transfer function again.
+ addDependency(value, point);
+ // If the state is uninitialized, bail out and come back later when it is
+ // initialized.
+ if (value->isUninitialized())
+ return success();
+ operandValues.push_back(value);
+ }
+
+ // Try to compute a constant value of the result.
+ auto *result = getOrCreate<StringConstant>(op->getResult(0));
+ if (auto constant = dyn_cast<string::ConstantOp>(op)) {
+ // Just grab and set the constant value of the result of the operation.
+ // Propagate an update to the state if it changed.
+ propagateIfChanged(result, result->join(constant.getValue()));
+ } else if (auto concat = dyn_cast<string::ConcatOp>(op)) {
+ StringRef lhs = operandValues[0]->getStringValue();
+ StringRef rhs = operandValues[1]->getStringValue();
+ // If either operand is overdefined, the results are overdefined.
+ if (lhs.empty() || rhs.empty()) {
+ propagateIfChanged(result, result->defaultInitialize());
+
+ // Otherwise, compute the constant value and join it with the result.
+ } else {
+ propagateIfChanged(result, result->join(lhs + rhs));
+ }
+ } else {
+ // We don't know how to implement the transfer function for this
+ // operation. Mark its results as overdefined.
+ propagateIfChanged(result, result->defaultInitialize());
+ }
+ return success();
+ }
+};
+```
+
+In the above example, the `visit` function makes each invocation of the
+analysis on an operation depend on the constant-value states of that
+operation's operands. When any operand state is initialized but overdefined,
+it sets the state of the result to overdefined. Otherwise, it computes
+the state of the result and merges the new information in with `join`.
+
+However, the dependency graph still needs to be initialized before the solver
+knows what to call `visit` on. This is done in the `initialize` function:
+
+```c++
+LogicalResult StringConstantPropagation::initialize(Operation *top) {
+ // Visit every nested string operation and set up its dependencies.
+ top->walk([&](Operation *op) {
+ for (Value operand : op->getOperands()) {
+ auto *state = getOrCreate<StringConstant>(operand);
----------------
ftynse wrote:
What if the operand is not of string type?
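
To make the concern concrete, here is a minimal sketch that uses stand-in types only, not MLIR classes. `ToyValue`, `ToyType`, `ToyStringState`, and `trackStringOperands` are hypothetical names introduced for illustration. The idea is that `initialize` would skip operands that are not string-typed before creating a `StringConstant` state for them. In the tutorial code this would presumably be a check on `operand.getType()`, but the exact string type is an assumption, since the `string` dialect in the example is itself illustrative.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for an SSA value: only a name and a type tag. Not an MLIR class.
enum class ToyType { String, Integer };
struct ToyValue {
  std::string name;
  ToyType type;
};

// Stand-in for the per-value analysis state; empty optional = uninitialized.
struct ToyStringState {
  std::optional<std::string> value;
};

// Hypothetical helper: create a state only for string-typed operands and
// return the number of operands that were skipped.
inline int trackStringOperands(const std::vector<ToyValue> &operands,
                               std::map<std::string, ToyStringState> &states) {
  int skipped = 0;
  for (const ToyValue &operand : operands) {
    // Guard: a non-string operand never gets a string-constant state.
    if (operand.type != ToyType::String) {
      ++skipped;
      continue;
    }
    // Get-or-create semantics: an existing state is left untouched.
    states.try_emplace(operand.name);
  }
  return skipped;
}
```

With such a guard, an operation that mixes string and non-string operands would only register dependencies on the string ones. Whether the transfer function should then treat the result as overdefined is a separate design choice.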
https://github.com/llvm/llvm-project/pull/149296