[Mlir-commits] [mlir] [mlir][NFC] Move and improve ownership-based buffer deallocation docs (PR #89196)
llvmlistbot at llvm.org
Thu Apr 18 03:08:27 PDT 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mlir
Author: Matthias Springer (matthias-springer)
<details>
<summary>Changes</summary>
Move the documentation of the ownership-based buffer deallocation pass to a separate file. Also improve the documentation a bit and insert a figure that explains the `bufferization.dealloc` op (copied from the tutorial at the LLVM Dev Summit 2023).
---
Patch is 482.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89196.diff
4 Files Affected:
- (modified) mlir/docs/BufferDeallocationInternals.md (+3)
- (modified) mlir/docs/Bufferization.md (+1-652)
- (added) mlir/docs/OwnershipBasedBufferDeallocation.md (+619)
- (added) mlir/docs/includes/img/bufferization_dealloc_op.svg (+1)
``````````diff
diff --git a/mlir/docs/BufferDeallocationInternals.md b/mlir/docs/BufferDeallocationInternals.md
index 3078cfbf593382..00830ba9d2dc2e 100644
--- a/mlir/docs/BufferDeallocationInternals.md
+++ b/mlir/docs/BufferDeallocationInternals.md
@@ -1,5 +1,8 @@
# Buffer Deallocation - Internals
+**Note:** This pass is deprecated. Please use the ownership-based buffer
+deallocation pass instead.
+
This section covers the internal functionality of the BufferDeallocation
transformation. The transformation consists of several passes. The main pass
called BufferDeallocation can be applied via “-buffer-deallocation” on MLIR
diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index 3e7edeeabed191..68bab255a3dd86 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -110,7 +110,7 @@ is inserting an element inside a vector. Since SSA values are immutable, the
operation returns a copy of the input vector with the element inserted.
Another example in MLIR is `linalg.generic`, which always has an extra `outs`
operand which provides the initial values to update (for example when the
-operation is doing a reduction).
+operation is doing a reduction).
This input is referred to as "destination" in the following (quotes are
important as this operand isn't modified in place but copied) and comes into
@@ -240,657 +240,6 @@ Alternatively,
skips the analysis and inserts a copy on every buffer write, just like the
dialect conversion-based bufferization.
-## Buffer Deallocation
-
-**Important: this pass is deprecated, please use the ownership based buffer**
-**deallocation pass instead**
-
-One-Shot Bufferize deallocates all buffers that it allocates. This is in
-contrast to the dialect conversion-based bufferization that delegates this job
-to the
-[`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program)
-pass. By default, One-Shot Bufferize rejects IR where a newly allocated buffer
-is returned from a block. Such IR will fail bufferization.
-
-A new buffer allocation is returned from a block when the result of an op that
-is not in destination-passing style is returned. E.g.:
-
-```mlir
-%0 = scf.if %c -> (tensor<?xf32>) {
- %1 = tensor.generate ... -> tensor<?xf32>
- scf.yield %1 : tensor<?xf32>
-} else {
- scf.yield %another_tensor : tensor<?xf32>
-}
-```
-
-The `scf.yield` in the "else" branch is OK, but the `scf.yield` in the "then"
-branch will be rejected.
-
-Another case in which a buffer allocation may be returned is when a buffer copy
-must be inserted due to a RaW conflict. E.g.:
-
-```mlir
-%0 = scf.if %c -> (tensor<?xf32>) {
- %1 = tensor.insert %cst into %another_tensor[%idx] : tensor<?xf32>
- "my_dialect.reading_tensor_op"(%another_tensor) : (tensor<?xf32>) -> ()
- ...
- scf.yield %1 : tensor<?xf32>
-} else {
- scf.yield %yet_another_tensor : tensor<?xf32>
-}
-```
-
-In the above example, a buffer copy of `buffer(%another_tensor)` (with `%cst`
-inserted) is yielded from the "then" branch.
-
-Note: Buffer allocations that are returned from a function are not deallocated.
-It is the caller's responsibility to deallocate the buffer. For the full
-function boundary ABI for MemRefs w.r.t. buffer deallocation refer to the
-[*Function Boundary ABI*](#function-boundary-abi) section. In the future, this
-could be automated with allocation hoisting (across function boundaries) or
-reference counting.
-
-One-Shot Bufferize leaks all memory and does not generate any buffer
-deallocations. The `-buffer-deallocation-pipeline` has to be run afterwards to
-insert the deallocation operations.
-
-## Ownership-based Buffer Deallocation
-
-Recommended compilation pipeline:
-```
-one-shot-bufferize
- | it's recommended to perform all bufferization here at latest,
- | <- any allocations inserted after this point have to be handled
- V manually
-expand-realloc
- V
-ownership-based-buffer-deallocation
- V
- canonicalize <- mostly for scf.if simplifications
- V
-buffer-deallocation-simplification
- V <- from this point onwards no tensor values are allowed
-lower-deallocations
- V
- CSE
- V
- canonicalize
-```
-
-One-Shot Bufferize does not deallocate any buffers that it allocates. This job
-is delegated to the
-[`-ownership-based-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-ownership-based-buffer-deallocation)
-pass, i.e., after running One-Shot Bufferize, the result IR may have a number of
-`memref.alloc` ops, but no `memref.dealloc` ops. This pass processes operations
-implementing `FunctionOpInterface` one-by-one without analysing the call-graph.
-This means, that there have to be [some rules](#function-boundary-abi) on how
-MemRefs are handled when being passed from one function to another. The rest of
-the pass revolves heavily around the `bufferization.dealloc` operation which is
-inserted at the end of each basic block with appropriate operands and should be
-optimized using the Buffer Deallocation Simplification pass
-(`--buffer-deallocation-simplification`) and the regular canonicalizer
-(`--canonicalize`). Lowering the result of the
-`-ownership-based-buffer-deallocation` pass directly using
-`--convert-bufferization-to-memref` without beforehand optimization is not
-recommended as it will lead to very inefficient code (the runtime-cost of
-`bufferization.dealloc` is `O(|memrefs|^2+|memref|*|retained|)`).
-
-### Function boundary ABI
-
-The Buffer Deallocation pass operates on the level of operations implementing
-the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
-also return them. To ensure compatibility among all functions (including
-external ones), some rules have to be enforced:
-* When a MemRef is passed as a function argument, ownership is never acquired.
- It is always the caller's responsibility to deallocate such MemRefs.
-* Returning a MemRef from a function always passes ownership to the caller,
- i.e., it is also the caller's responsibility to deallocate memrefs returned
- from a called function.
-* A function must not return a MemRef with the same allocated base buffer as
- one of its arguments (in this case a copy has to be created). Note that in
- this context two subviews of the same buffer that don't overlap are also
- considered to alias.
-
-For external functions (e.g., library functions written externally in C), the
-externally provided implementation has to adhere to these rules and they are
-just assumed by the buffer deallocation pass. Functions on which the
-deallocation pass is applied and the implementation is accessible are modified
-by the pass such that the ABI is respected (i.e., buffer copies are inserted as
-necessary).
-
-### Inserting `bufferization.dealloc` operations
-
-`bufferization.dealloc` operations are unconditionally inserted at the end of
-each basic block (just before the terminator). The majority of the pass is about
-finding the correct operands for this operation. There are three variadic
-operand lists to be populated, the first contains all MemRef values that may
-need to be deallocated, the second list contains their associated ownership
-values (of `i1` type), and the third list contains MemRef values that are still
-needed at a later point and should thus not be deallocated. This operation
-allows us to deal with any kind of aliasing behavior: it lowers to runtime
-aliasing checks when not enough information can be collected statically. When
-enough aliasing information is statically available, operands or the entire op
-may fold away.
-
-**Ownerships**
-
-To do so, we use a concept of ownership indicators of memrefs which materialize
-as an `i1` value for any SSA value of `memref` type, indicating whether the
-basic block in which it was materialized has ownership of this MemRef. Ideally,
-this is a constant `true` or `false`, but might also be a non-constant SSA
-value. To keep track of those ownership values without immediately materializing
-them (which might require insertion of `bufferization.clone` operations or
-operations checking for aliasing at runtime at positions where we don't actually
-need a materialized value), we use the `Ownership` class. This class represents
-the ownership in three states forming a lattice on a partial order:
-```
-forall X in SSA values. uninitialized < unique(X) < unknown
-forall X, Y in SSA values.
- unique(X) == unique(Y) iff X and Y always evaluate to the same value
- unique(X) != unique(Y) otherwise
-```
-Intuitively, the states have the following meaning:
-* Uninitialized: the ownership is not initialized yet, this is the default
- state; once an operation is finished processing the ownership of all
- operation results with MemRef type should not be uninitialized anymore.
-* Unique: there is a specific SSA value that can be queried to check ownership
- without materializing any additional IR
-* Unknown: no specific SSA value is available without materializing additional
- IR, typically this is because two ownerships in 'Unique' state would have to
- be merged manually (e.g., the result of an `arith.select` either has the
- ownership of the then or else case depending on the condition value,
- inserting another `arith.select` for the ownership values can perform the
- merge and provide a 'Unique' ownership for the result), however, in the
- general case this 'Unknown' state has to be assigned.
-
-Implied by the above partial order, the pass combines two ownerships in the
-following way:
-
-| Ownership 1 | Ownership 2 | Combined Ownership |
-|:--------------|:--------------|:-------------------|
-| uninitialized | uninitialized | uninitialized |
-| unique(X) | uninitialized | unique(X) |
-| unique(X) | unique(X) | unique(X) |
-| unique(X) | unique(Y) | unknown |
-| unknown | unique | unknown |
-| unknown | uninitialized | unknown |
-| <td colspan=3> + symmetric cases |
-
-**Collecting the list of MemRefs that potentially need to be deallocated**
-
-For a given block, the list of MemRefs that potentially need to be deallocated
-at the end of that block is computed by keeping track of all values for which
-the block potentially takes over ownership. This includes MemRefs provided as
-basic block arguments, interface handlers for operations like `memref.alloc` and
-`func.call`, but also liveness information in regions with multiple basic
-blocks. More concretely, it is computed by taking the MemRefs in the 'in' set
-of the liveness analysis of the current basic block B, appended by the MemRef
-block arguments and by the set of MemRefs allocated in B itself (determined by
-the interface handlers), then subtracted (also determined by the interface
-handlers) by the set of MemRefs deallocated in B.
-
-Note that we don't have to take the intersection of the liveness 'in' set with
-the 'out' set of the predecessor block because a value that is in the 'in' set
-must be defined in an ancestor block that dominates all direct predecessors and
-thus the 'in' set of this block is a subset of the 'out' sets of each
-predecessor.
-
-```
-memrefs = filter((liveIn(block) U
- allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
-```
-
-The list of conditions for the second variadic operands list of
-`bufferization.dealloc` is computed by querying the stored ownership value for
-each of the MemRefs collected as described above. The ownership state is updated
-by the interface handlers while processing the basic block.
-
-**Collecting the list of MemRefs to retain**
-
-Given a basic block B, the list of MemRefs that have to be retained can be
-different for each successor block S. For the two basic blocks B and S and the
-values passed via block arguments to the destination block S, we compute the
-list of MemRefs that have to be retained in B by taking the MemRefs in the
-successor operand list of the terminator and the MemRefs in the 'out' set of the
-liveness analysis for B intersected with the 'in' set of the destination block
-S.
-
-This list of retained values makes sure that we cannot run into use-after-free
-situations even if no aliasing information is present at compile-time.
-
-```
-toRetain = filter(successorOperands + (liveOut(fromBlock) insersect
- liveIn(toBlock)), isMemRef)
-```
-
-### Supported interfaces
-
-The pass uses liveness analysis and a few interfaces:
-* `FunctionOpInterface`
-* `CallOpInterface`
-* `MemoryEffectOpInterface`
-* `RegionBranchOpInterface`
-* `RegionBranchTerminatorOpInterface`
-
-Due to insufficient information provided by the interface, it also special-cases
-on the `cf.cond_br` operation and makes some assumptions about operations
-implementing the `RegionBranchOpInterface` at the moment, but improving the
-interfaces would allow us to remove those dependencies in the future.
-
-### Limitations
-
-The Buffer Deallocation pass has some requirements and limitations on the input
-IR. These are checked in the beginning of the pass and errors are emitted
-accordingly:
-* The set of interfaces the pass operates on must be implemented (correctly).
- E.g., if there is an operation present with a nested region, but does not
- implement the `RegionBranchOpInterface`, an error is emitted because the
- pass cannot know the semantics of the nested region (and does not make any
- default assumptions on it).
-* No explicit control-flow loops are present. Currently, only loops using
- structural-control-flow are supported. However, this limitation could be
- lifted in the future.
-* Deallocation operations should not be present already. The pass should
- handle them correctly already (at least in most cases), but it's not
- supported yet due to insufficient testing.
-* Terminators must implement either `RegionBranchTerminatorOpInterface` or
- `BranchOpInterface`, but not both. Terminators with more than one successor
- are not supported (except `cf.cond_br`). This is not a fundamental
- limitation, but there is no use-case justifying the more complex
- implementation at the moment.
-
-### Example
-
-The following example contains a few interesting cases:
-* Basic block arguments are modified to also pass along the ownership
- indicator, but not for entry blocks, where the function boundary ABI
- is applied instead.
-* The result of `arith.select` initially has 'Unknown' assigned as ownership,
- but once the `bufferization.dealloc` operation is inserted it is put in the
- 'retained' list (since it has uses in a later basic block) and thus the
- 'Unknown' ownership can be replaced with a 'Unique' ownership using the
- corresponding result of the dealloc operation.
-* The `cf.cond_br` operation has more than one successor and thus has to
- insert two `bufferization.dealloc` operations (one for each successor).
- While they have the same list of MemRefs to deallocate (because they perform
- the deallocations for the same block), it must be taken into account that
- some MemRefs remain *live* for one branch but not the other (thus set
- intersection is performed on the *live-out* of the current block and the
- *live-in* of the target block). Also, `cf.cond_br` supports separate
- forwarding operands for each successor. To make sure that no MemRef is
- deallocated twice (because there are two `bufferization.dealloc` operations
- with the same MemRefs to deallocate), the condition operands are adjusted to
- take the branch condition into account. While a generic lowering for such
- terminator operations could be implemented, a specialized implementation can
- take all the semantics of this particular operation into account and thus
- generate a more efficient lowering.
-
-```mlir
-func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
- %alloc = memref.alloc() : memref<?xi8>
- %alloca = memref.alloca() : memref<?xi8>
- %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
- cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
-^bb1(%bbarg: memref<?xi8>):
- test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
- return
-}
-```
-
-After running `--ownership-based-buffer-deallocation`, it looks as follows:
-
-```mlir
-// Function boundary ABI: ownership of `%memref` will never be acquired.
-func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
- %false = arith.constant false
- %true = arith.constant true
-
- // The ownership of a MemRef defined by the `memref.alloc` operation is always
- // assigned to be 'true'.
- %alloc = memref.alloc() : memref<?xi8>
-
- // The ownership of a MemRef defined by the `memref.alloca` operation is
- // always assigned to be 'false'.
- %alloca = memref.alloca() : memref<?xi8>
-
- // The ownership of %select will be the join of the ownership of %alloc and
- // the ownership of %alloca, i.e., of %true and %false. Because the pass does
- // not know about the semantics of the `arith.select` operation (unless a
- // custom handler is implemented), the ownership join will be 'Unknown'. If
- // the materialized ownership indicator of %select is needed, either a clone
- // has to be created for which %true is assigned as ownership or the result
- // of a `bufferization.dealloc` where %select is in the retain list has to be
- // used.
- %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
-
- // We use `memref.extract_strided_metadata` to get the base memref since it is
- // not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
- // already enforced for `bufferization.dealloc`
- %base_buffer_memref, ... = memref.extract_strided_metadata %memref
- : memref<?xi8> -> memref<i8>, index, index, index
- %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
- : memref<?xi8> -> memref<i8>, index, index, index
- %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
- : memref<?xi8> -> memref<i8>, index, index, index
-
- // The deallocation conditions need to be adjusted to incorporate the branch
- // condition. In this example, this requires only a single negation, but might
- // also require multiple arith.andi operations.
- %not_br_cond = arith.xori %true, %br_cond : i1
-
- // There are two dealloc operations inserted in this basic block, one per
- // successor. Both have the same list of MemRefs to deallocate and the
- // conditions only differ by the branch condition conjunct.
- // Note, however, that the retained list differs. Here, both contain the
- // %select value because it is used in both successors (since it's the same
- // block), but the value passed via block argument differs (%memref vs.
- // %alloc).
- %10:2 = bufferization.dealloc
- (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
- : memref<i8>, memref<i8>, memref<i8>)
- if (%false, %br_cond, %false)
- retain (%alloc, %select : memref<?xi8>, memref<?xi8>)
-
- %11:2 = bufferization.dealloc
- (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
- : memref<i8>, memref<i8>, memref<i8>)
- if (%false, %not_br_cond, %false)
- retain (%memref, %select : memref<?xi8>, memref<?xi8>)
-
- // Because %select is used in ^bb1 without passing it via block argument, we
- // need to update it's ownership value here by merging the ownership values
- // returned by the dealloc operations
- %new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
-
- // The terminator is modified to pass along the ownership indicator values
- // with each MemRef value.
- cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref<?xi8>, i1),
- ^bb1(%memref, %11#0 : memref<?xi8>, i1)
-
-// All non-entry basic blocks are modified to have an additional i1 argument for
-// each MemRef value in the argument list.
-^bb1(%13: memref<?xi8>, %14: i1): // 2 preds: ^bb0, ^bb0
- test.copy(%13, %select) : (memref<?xi8>, memref<?xi8>)
-
- %base_buffer_13, ... = memref.extract_strided_metadata %13
- : memref<?xi8> -> memref<i8>, index, index, index
- %base_buffer_select, ... = memref.ex...
[truncated]
``````````
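
The ownership combination table in the moved documentation can be read as a join on a small lattice. The following is a minimal Python sketch of that table, not the pass's actual C++ `Ownership` class. The names `Ownership`, `unique`, and `combine` are hypothetical, and two `unique` states are compared by SSA value name, which simplifies the documented rule that `unique(X) == unique(Y)` iff X and Y always evaluate to the same value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ownership:
    """Illustrative model of the three documented ownership states."""
    kind: str                    # "uninitialized", "unique", or "unknown"
    value: Optional[str] = None  # name of the i1 SSA value ("unique" only)


UNINITIALIZED = Ownership("uninitialized")
UNKNOWN = Ownership("unknown")


def unique(ssa_value: str) -> Ownership:
    """Ownership that can be queried through one specific SSA value."""
    return Ownership("unique", ssa_value)


def combine(a: Ownership, b: Ownership) -> Ownership:
    """Join two ownerships under uninitialized < unique(X) < unknown."""
    # Uninitialized is the bottom element: the other operand wins.
    if a.kind == "uninitialized":
        return b
    if b.kind == "uninitialized":
        return a
    # Unknown is the top element: it absorbs everything else.
    if a.kind == "unknown" or b.kind == "unknown":
        return UNKNOWN
    # Both are unique: they only agree if they name the same SSA value
    # (assumption: name equality stands in for "always the same value").
    return a if a == b else UNKNOWN


if __name__ == "__main__":
    print(combine(unique("%true"), unique("%false")).kind)
```

Under these assumptions, combining the ownerships of `%alloc` and `%alloca` from the example (`%true` and `%false`) yields the 'Unknown' state, which matches the comment on `arith.select` in the diff.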
</details>
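
The two set formulas in the moved documentation (`memrefs = ...` and `toRetain = ...`) can be tried out on the `@example` function from the diff. This is a rough Python sketch under stated assumptions: the helper names `memrefs_to_dealloc` and `memrefs_to_retain` are hypothetical, the liveness sets and the `TYPES` table are written out by hand rather than computed by a liveness analysis, and `%alloca` is placed in the allocated set because it appears in the `bufferization.dealloc` operand list of the example output.

```python
# Hand-written type table for the SSA values of @example (assumption).
TYPES = {
    "%memref": "memref<?xi8>",
    "%alloc": "memref<?xi8>",
    "%alloca": "memref<?xi8>",
    "%select": "memref<?xi8>",
    "%select_cond": "i1",
    "%br_cond": "i1",
}


def is_memref(value):
    """Hypothetical predicate: true if the value has a memref type."""
    return TYPES[value].startswith("memref")


def memrefs_to_dealloc(live_in, allocated, arguments, deallocated):
    """filter((liveIn U allocated U arguments) \\ deallocated, isMemRef)"""
    candidates = (set(live_in) | set(allocated) | set(arguments)) - set(deallocated)
    return {v for v in candidates if is_memref(v)}


def memrefs_to_retain(successor_operands, live_out_from, live_in_to):
    """filter(successorOperands + (liveOut(from) intersect liveIn(to)), isMemRef)"""
    live_across = sorted(set(live_out_from) & set(live_in_to))
    retained = []
    for v in list(successor_operands) + live_across:
        if is_memref(v) and v not in retained:  # keep order, drop duplicates
            retained.append(v)
    return retained


# Entry block of @example: nothing is live-in, nothing is deallocated.
to_dealloc = memrefs_to_dealloc(
    live_in=[],
    allocated=["%alloc", "%alloca"],
    arguments=["%memref", "%select_cond", "%br_cond"],
    deallocated=[],
)

# %select is used in ^bb1 without being passed as a block argument.
live_out_bb0 = ["%select"]
live_in_bb1 = ["%select"]
retain_true_branch = memrefs_to_retain(["%alloc"], live_out_bb0, live_in_bb1)
retain_false_branch = memrefs_to_retain(["%memref"], live_out_bb0, live_in_bb1)

if __name__ == "__main__":
    print(sorted(to_dealloc), retain_true_branch, retain_false_branch)
```

With these inputs, the computed sets line up with the operands of the two `bufferization.dealloc` ops in the example: the same three MemRefs to deallocate for both successors, and retain lists of (`%alloc`, `%select`) and (`%memref`, `%select`) respectively.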
https://github.com/llvm/llvm-project/pull/89196