[Mlir-commits] [mlir] [mlir][bufferization] Add an ownership based buffer deallocation pass (PR #66337)
llvmlistbot at llvm.org
Thu Sep 14 01:15:17 PDT 2023
llvmbot wrote:
@llvm/pr-subscribers-mlir
<details>
<summary>Changes</summary>
Add a new Buffer Deallocation pass with the intent to replace the old one. For now it is added as a separate pass alongside the old one so that downstream users can migrate gradually. The new pass aims to insert fewer clone operations and to support additional use-cases. Please refer to the Buffer Deallocation section in the updated Bufferization.md file for more information on how this new pass works.
--
Patch is 188.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66337.diff
15 Files Affected:
- (modified) mlir/docs/Bufferization.md (+604)
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/BufferUtils.h (+8)
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h (+9)
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td (+144)
- (modified) mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp (+59)
- (modified) mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt (+2)
- (added) mlir/lib/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation.cpp (+1383)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir (+589)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-callop-interface.mlir (+113)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-existing-deallocs.mlir (+43)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-function-boundaries.mlir (+131)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-memoryeffect-interface.mlir (+124)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-region-branchop-interface.mlir (+695)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir (+21)
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/invalid-buffer-deallocation.mlir (+93)
<pre>
diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index f03d7bb877c9c74..f64e94758c8eb28 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -224,6 +224,9 @@ dialect conversion-based bufferization.
## Buffer Deallocation
+**Important: this pass is deprecated, please use the ownership based buffer**
+**deallocation pass instead**
+
One-Shot Bufferize deallocates all buffers that it allocates. This is in
contrast to the dialect conversion-based bufferization that delegates this job
to the
@@ -300,6 +303,607 @@ One-Shot Bufferize can be configured to leak all memory and not generate any
buffer deallocations with `create-deallocs=0`. This can be useful for
compatibility with legacy code that has its own method of deallocating buffers.
+## Ownership-based Buffer Deallocation
+
+Recommended compilation pipeline:
+```
+one-shot-bufferize
+ | it's recommended to perform all bufferization here at latest,
+ | <- any allocations inserted after this point have to be handled
+ V manually
+expand-realloc
+ V
+buffer-deallocation
+ V
+ canonicalize <- mostly for scf.if simplifications
+ V
+buffer-deallocation-simplification
+ V <- from this point onwards no tensor values are allowed
+lower-deallocations
+ V
+ CSE
+ V
+ canonicalize
+```
+
+One-Shot Bufferize does not deallocate any buffers that it allocates. This job
+is delegated to the
+[`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program)
+pass, i.e., after running One-Shot Bufferize, the result IR may have a number of
+`memref.alloc` ops, but no `memref.dealloc` ops. This pass processes operations
+implementing `FunctionOpInterface` one-by-one without analysing the call-graph.
+This means that there have to be [some rules](#function-boundary-abi) on how
+MemRefs are handled when being passed from one function to another. The rest of
+the pass revolves heavily around the `bufferization.dealloc` operation which is
+inserted at the end of each basic block with appropriate operands and should be
+optimized using the Buffer Deallocation Simplification pass
+(`--buffer-deallocation-simplification`) and the regular canonicalizer
+(`--canonicalize`). Lowering the result of the `-buffer-deallocation` pass
+directly using `--convert-bufferization-to-memref` without optimizing it
+first is not recommended as it will lead to very inefficient code (the
+runtime cost of `bufferization.dealloc` is
+`O(|memrefs|^2+|memrefs|*|retained|)`).
+
+### Function boundary ABI
+
+The Buffer Deallocation pass operates on the level of operations implementing
+the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
+also return them. To ensure compatibility among all functions (including
+external ones), some rules have to be enforced:
+* When a MemRef is passed as a function argument, ownership is never acquired.
+ It is always the caller's responsibility to deallocate such MemRefs.
+* Returning a MemRef from a function always passes ownership to the caller,
+ i.e., it is also the caller's responsibility to deallocate memrefs returned
+ from a called function.
+* A function must not return a MemRef with the same allocated base buffer as
+ one of its arguments (in this case a copy has to be created). Note that in
+ this context two subviews of the same buffer that don't overlap are also
+ considered to alias.
+
+For external functions (e.g., library functions written externally in C), the
+externally provided implementation has to adhere to these rules, which the
+buffer deallocation pass simply assumes. Functions to which the deallocation
+pass is applied and whose implementation is accessible are modified by the
+pass such that the ABI is respected (i.e., buffer copies are inserted as
+necessary).
+
+### Inserting `bufferization.dealloc` operations
+
+`bufferization.dealloc` operations are unconditionally inserted at the end of
+each basic block (just before the terminator). The majority of the pass is about
+finding the correct operands for this operation. There are three variadic
+operand lists to be populated, the first contains all MemRef values that may
+need to be deallocated, the second list contains their associated ownership
+values (of `i1` type), and the third list contains MemRef values that are still
+needed at a later point and should thus not be deallocated. This operation
+allows us to deal with any kind of aliasing behavior: it lowers to runtime
+aliasing checks when not enough information can be collected statically. When
+enough aliasing information is statically available, operands or the entire op
+may fold away.
+
+**Ownerships**
+
+To do so, we use a concept of ownership indicators of memrefs which materialize
+as an `i1` value for any SSA value of `memref` type, indicating whether the
+basic block in which it was materialized has ownership of this MemRef. Ideally,
+this is a constant `true` or `false`, but might also be a non-constant SSA
+value. To keep track of those ownership values without immediately materializing
+them (which might require insertion of `bufferization.clone` operations or
+operations checking for aliasing at runtime at positions where we don't actually
+need a materialized value), we use the `Ownership` class. This class represents
+the ownership in three states forming a lattice on a partial order:
+```
+forall X in SSA values. uninitialized < unique(X) < unknown
+forall X, Y in SSA values.
+ unique(X) == unique(Y) iff X and Y always evaluate to the same value
+ unique(X) != unique(Y) otherwise
+```
+Intuitively, the states have the following meaning:
+* Uninitialized: the ownership is not initialized yet. This is the default
+ state; once an operation has been processed, the ownership of all of its
+ results with MemRef type should no longer be uninitialized.
+* Unique: there is a specific SSA value that can be queried to check ownership
+ without materializing any additional IR.
+* Unknown: no specific SSA value is available without materializing additional
+ IR. Typically this is because two ownerships in 'Unique' state would have to
+ be merged manually. For example, the result of an `arith.select` has the
+ ownership of either the then or the else case depending on the condition
+ value; inserting another `arith.select` for the ownership values can perform
+ the merge and provide a 'Unique' ownership for the result. In the general
+ case, however, this 'Unknown' state has to be assigned.
+
+Implied by the above partial order, the pass combines two ownerships in the
+following way:
+
+| Ownership 1 | Ownership 2 | Combined Ownership |
+|:--------------|:--------------|:-------------------|
+| uninitialized | uninitialized | uninitialized |
+| unique(X) | uninitialized | unique(X) |
+| unique(X) | unique(X) | unique(X) |
+| unique(X) | unique(Y) | unknown |
+| unknown | unique(X) | unknown |
+| unknown | uninitialized | unknown |
+| <td colspan=3> + symmetric cases |
+
+**Collecting the list of MemRefs that potentially need to be deallocated**
+
+For a given block, the list of MemRefs that potentially need to be deallocated
+at the end of that block is computed by keeping track of all values for which
+the block potentially takes over ownership. This includes MemRefs provided as
+basic block arguments, interface handlers for operations like `memref.alloc` and
+`func.call`, but also liveness information in regions with multiple basic
+blocks. More concretely, it is computed by taking the MemRefs in the 'in' set
+of the liveness analysis of the current basic block B, adding the MemRef
+block arguments and the set of MemRefs allocated in B itself (determined by
+the interface handlers), and then removing the set of MemRefs deallocated in
+B (also determined by the interface handlers).
+
+Note that we don't have to take the intersection of the liveness 'in' set with
+the 'out' set of the predecessor block because a value that is in the 'in' set
+must be defined in an ancestor block that dominates all direct predecessors and
+thus the 'in' set of this block is a subset of the 'out' sets of each
+predecessor.
+
+```
+memrefs = filter((liveIn(block) U
+ allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
+```
+
+The list of conditions for the second variadic operands list of
+`bufferization.dealloc` is computed by querying the stored ownership value for
+each of the MemRefs collected as described above. The ownership state is updated
+by the interface handlers while processing the basic block.
+
+**Collecting the list of MemRefs to retain**
+
+Given a basic block B, the list of MemRefs that have to be retained can be
+different for each successor block S. For the two basic blocks B and S and the
+values passed via block arguments to the destination block S, we compute the
+list of MemRefs that have to be retained in B by taking the MemRefs in the
+successor operand list of the terminator and the MemRefs in the 'out' set of the
+liveness analysis for B intersected with the 'in' set of the destination block
+S.
+
+This list of retained values makes sure that we cannot run into use-after-free
+situations even if no aliasing information is present at compile-time.
+
+```
+toRetain = filter(successorOperands + (liveOut(fromBlock) intersect
+ liveIn(toBlock)), isMemRef)
+```
+
+### Supported interfaces
+
+The pass uses liveness analysis and a few interfaces:
+* `FunctionOpInterface`
+* `CallOpInterface`
+* `MemoryEffectOpInterface`
+* `RegionBranchOpInterface`
+* `RegionBranchTerminatorOpInterface`
+
+Due to insufficient information provided by the interface, it also special-cases
+on the `cf.cond_br` operation and makes some assumptions about operations
+implementing the `RegionBranchOpInterface` at the moment, but improving the
+interfaces would allow us to remove those dependencies in the future.
+
+### Limitations
+
+The Buffer Deallocation pass has some requirements and limitations on the input
+IR. These are checked at the beginning of the pass and errors are emitted
+accordingly:
+* The set of interfaces the pass operates on must be implemented (correctly).
+ E.g., if an operation with a nested region is present but does not
+ implement the `RegionBranchOpInterface`, an error is emitted because the
+ pass cannot know the semantics of the nested region (and does not make any
+ default assumptions on it).
+* No explicit control-flow loops are present. Currently, only loops using
+ structural-control-flow are supported. However, this limitation could be
+ lifted in the future.
+* Deallocation operations should not be present already. The pass should
+ handle them correctly already (at least in most cases), but it's not
+ supported yet due to insufficient testing.
+* Terminators must implement either `RegionBranchTerminatorOpInterface` or
+ `BranchOpInterface`, but not both. Terminators with more than one successor
+ are not supported (except `cf.cond_br`). This is not a fundamental
+ limitation, but there is no use-case justifying the more complex
+ implementation at the moment.
+
+### Example
+
+The following example contains a few interesting cases:
+* Basic block arguments are modified to also pass along the ownership
+ indicator, but not for entry blocks of non-private functions (assuming the
+ `private-function-dynamic-ownership` pass option is disabled) where the
+ function boundary ABI is applied instead. "Private" in this context refers
+ to functions that cannot be called externally.
+* The result of `arith.select` initially has 'Unknown' assigned as ownership,
+ but once the `bufferization.dealloc` operation is inserted it is put in the
+ 'retained' list (since it has uses in a later basic block) and thus the
+ 'Unknown' ownership can be replaced with a 'Unique' ownership using the
+ corresponding result of the dealloc operation.
+* The `cf.cond_br` operation has more than one successor and thus has to
+ insert two `bufferization.dealloc` operations (one for each successor).
+ While they have the same list of MemRefs to deallocate (because they perform
+ the deallocations for the same block), it must be taken into account that
+ some MemRefs remain *live* for one branch but not the other (thus set
+ intersection is performed on the *live-out* of the current block and the
+ *live-in* of the target block). Also, `cf.cond_br` supports separate
+ forwarding operands for each successor. To make sure that no MemRef is
+ deallocated twice (because there are two `bufferization.dealloc` operations
+ with the same MemRefs to deallocate), the condition operands are adjusted to
+ take the branch condition into account. While a generic lowering for such
+ terminator operations could be implemented, a specialized implementation can
+ take all the semantics of this particular operation into account and thus
+ generate a more efficient lowering.
+
+```mlir
+func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
+ %alloc = memref.alloc() : memref<?xi8>
+ %alloca = memref.alloca() : memref<?xi8>
+ %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
+ cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
+^bb1(%bbarg: memref<?xi8>):
+ test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
+ return
+}
+```
+
+After running `--buffer-deallocation`, it looks as follows:
+
+```mlir
+// Since this is not a private function, the signature will not be modified even
+// when private-function-dynamic-ownership is enabled. Instead the function
+// boundary ABI has to be applied which means that ownership of `%memref` will
+// never be acquired.
+func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
+ %false = arith.constant false
+ %true = arith.constant true
+
+ // The ownership of a MemRef defined by the `memref.alloc` operation is always
+ // assigned to be 'true'.
+ %alloc = memref.alloc() : memref<?xi8>
+
+ // The ownership of a MemRef defined by the `memref.alloca` operation is
+ // always assigned to be 'false'.
+ %alloca = memref.alloca() : memref<?xi8>
+
+ // The ownership of %select will be the join of the ownership of %alloc and
+ // the ownership of %alloca, i.e., of %true and %false. Because the pass does
+ // not know about the semantics of the `arith.select` operation (unless a
+ // custom handler is implemented), the ownership join will be 'Unknown'. If
+ // the materialized ownership indicator of %select is needed, either a clone
+ // has to be created for which %true is assigned as ownership or the result
+ // of a `bufferization.dealloc` where %select is in the retain list has to be
+ // used.
+ %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
+
+ // We use `memref.extract_strided_metadata` to get the base memref since it is
+ // not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
+ // already enforced for `bufferization.dealloc`.
+ %base_buffer_memref, ... = memref.extract_strided_metadata %memref
+ : memref<?xi8> -> memref<i8>, index, index, index
+ %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
+ : memref<?xi8> -> memref<i8>, index, index, index
+ %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
+ : memref<?xi8> -> memref<i8>, index, index, index
+
+ // The deallocation conditions need to be adjusted to incorporate the branch
+ // condition. In this example, this requires only a single negation, but might
+ // also require multiple arith.andi operations.
+ %not_br_cond = arith.xori %true, %br_cond : i1
+
+ // There are two dealloc operations inserted in this basic block, one per
+ // successor. Both have the same list of MemRefs to deallocate and the
+ // conditions only differ by the branch condition conjunct.
+ // Note, however, that the retained list differs. Here, both contain the
+ // %select value because it is used in both successors (since it's the same
+ // block), but the value passed via block argument differs (%memref vs.
+ // %alloc).
+ %10:2 = bufferization.dealloc
+ (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+ : memref<i8>, memref<i8>, memref<i8>)
+ if (%false, %br_cond, %false)
+ retain (%alloc, %select : memref<?xi8>, memref<?xi8>)
+
+ %11:2 = bufferization.dealloc
+ (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+ : memref<i8>, memref<i8>, memref<i8>)
+ if (%false, %not_br_cond, %false)
+ retain (%memref, %select : memref<?xi8>, memref<?xi8>)
+
+ // Because %select is used in ^bb1 without passing it via block argument, we
+ // need to update its ownership value here by merging the ownership values
+ // returned by the dealloc operations.
+ %new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
+
+ // The terminator is modified to pass along the ownership indicator values
+ // with each MemRef value.
+ cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref<?xi8>, i1),
+ ^bb1(%memref, %11#0 : memref<?xi8>, i1)
+
+// All non-entry basic blocks are modified to have an additional i1 argument for
+// each MemRef value in the argument list.
+^bb1(%13: memref<?xi8>, %14: i1): // 2 preds: ^bb0, ^bb0
+ test.copy(%13, %select) : (memref<?xi8>, memref<?xi8>)
+
+ %base_buffer_13, ... = memref.extract_strided_metadata %13
+ : memref<?xi8> -> memref<i8>...
<truncated>
</pre>
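
The ownership combination table in the Bufferization.md hunk above can be read as a join on a three-state lattice. The following is a minimal Python sketch of that table, not the pass's C++ `Ownership` class: the names `Ownership`, `unique`, and `combine` are hypothetical, and comparing indicators by name only approximates the documented rule that two 'Unique' states are equal iff their SSA values always evaluate to the same value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ownership:
    """Hypothetical stand-in for an ownership state (illustration only)."""
    state: str                       # "uninitialized", "unique", or "unknown"
    indicator: Optional[str] = None  # name of the i1 SSA value if state == "unique"


UNINITIALIZED = Ownership("uninitialized")
UNKNOWN = Ownership("unknown")


def unique(indicator: str) -> Ownership:
    """Build a 'Unique' ownership that refers to the given i1 SSA value name."""
    return Ownership("unique", indicator)


def combine(a: Ownership, b: Ownership) -> Ownership:
    """Join two ownership states following the table in the patch."""
    # 'Uninitialized' is the bottom element: the other operand wins.
    if a.state == "uninitialized":
        return b
    if b.state == "uninitialized":
        return a
    # 'Unknown' is the top element: it absorbs everything else.
    if a.state == "unknown" or b.state == "unknown":
        return UNKNOWN
    # Both are 'Unique': they only agree if they name the same indicator.
    # Assumption: name equality stands in for "always evaluate to the same value".
    return a if a.indicator == b.indicator else UNKNOWN
```

For instance, joining `unique("%true")` with `unique("%false")` gives the 'Unknown' state, which is the situation described for `%select` in the example.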
</details>
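
The two set formulas in the patch (`memrefs = ...` and `toRetain = ...`) can be tried out on the `@example` function from the documentation. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the liveness sets are written out by hand rather than computed, and treating both `%alloc` and `%alloca` as "allocated" in the entry block is inferred from the operand list of the `bufferization.dealloc` ops shown in the example.

```python
def memrefs_to_deallocate(live_in, allocated, arguments, deallocated, is_memref):
    """(liveIn U allocated U arguments) \\ deallocated, filtered to MemRefs."""
    candidates = (set(live_in) | set(allocated) | set(arguments)) - set(deallocated)
    return {value for value in candidates if is_memref(value)}


def memrefs_to_retain(successor_operands, live_out_from, live_in_to, is_memref):
    """successorOperands + (liveOut(from) intersect liveIn(to)), filtered to MemRefs."""
    candidates = set(successor_operands) | (set(live_out_from) & set(live_in_to))
    return {value for value in candidates if is_memref(value)}


# Hand-written types for the SSA values of @example (assumption: by name only).
TYPES = {
    "%memref": "memref<?xi8>",
    "%select_cond": "i1",
    "%br_cond": "i1",
    "%alloc": "memref<?xi8>",
    "%alloca": "memref<?xi8>",
    "%select": "memref<?xi8>",
}


def is_memref(value):
    return TYPES[value].startswith("memref")


# Entry block: nothing is live-in; the function arguments are block arguments.
entry_memrefs = memrefs_to_deallocate(
    live_in=set(),
    allocated={"%alloc", "%alloca"},
    arguments={"%memref", "%select_cond", "%br_cond"},
    deallocated=set(),
    is_memref=is_memref,
)

# %select is defined in the entry block and used in ^bb1, so it is live across
# the edge; the successor operand differs per branch of cf.cond_br.
retain_true_branch = memrefs_to_retain({"%alloc"}, {"%select"}, {"%select"}, is_memref)
retain_false_branch = memrefs_to_retain({"%memref"}, {"%select"}, {"%select"}, is_memref)
```

Under these assumptions the results line up with the operand and `retain` lists of the two dealloc operations in the documented output: the same three MemRefs to deallocate, with `%alloc, %select` retained on one edge and `%memref, %select` on the other.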
https://github.com/llvm/llvm-project/pull/66337
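
The documentation in the patch notes that the two `bufferization.dealloc` operations created for `cf.cond_br` share the same MemRef list and that their conditions are adjusted by the branch condition so that nothing is deallocated twice. A small Python sketch of that idea follows, using plain booleans in place of `i1` SSA values; the helper name `split_conditions` is hypothetical and the code only models the conjunction, not the IR the pass emits.

```python
def split_conditions(ownerships, br_cond):
    """Return the dealloc conditions for the true and the false successor.

    Each ownership is conjoined with the branch condition for the true
    successor and with its negation for the false successor.
    """
    for_true_successor = [own and br_cond for own in ownerships]
    for_false_successor = [own and (not br_cond) for own in ownerships]
    return for_true_successor, for_false_successor


# Ownerships of (%memref, %alloc, %alloca) in the entry block of @example:
# function arguments and allocas are not owned, the memref.alloc result is.
OWNERSHIPS = [False, True, False]

when_taken = split_conditions(OWNERSHIPS, br_cond=True)
when_not_taken = split_conditions(OWNERSHIPS, br_cond=False)
```

With constant `false` and `true` ownerships the conjunctions fold to `%false`, `%br_cond`, and `%not_br_cond`, which is the shape of the `if (...)` lists in the documented output.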