[Mlir-commits] [mlir] [mlir][bufferization] Add an ownership based buffer deallocation pass (PR #66337)

llvmlistbot at llvm.org llvmlistbot at llvm.org
Thu Sep 14 01:15:17 PDT 2023


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-mlir
            
<details>
<summary>Changes</summary>
Add a new Buffer Deallocation pass with the intent to replace the old one. For now it is added as a separate pass alongside the old one to allow downstream users to migrate gradually. This new pass has the goal of inserting fewer clone operations and supporting additional use-cases. Please refer to the Buffer Deallocation section in the updated Bufferization.md file for more information on how this new pass works.
--

Patch is 188.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66337.diff

15 Files Affected:

- (modified) mlir/docs/Bufferization.md (+604) 
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/BufferUtils.h (+8) 
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h (+9) 
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td (+144) 
- (modified) mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp (+59) 
- (modified) mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt (+2) 
- (added) mlir/lib/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation.cpp (+1383) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir (+589) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-callop-interface.mlir (+113) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-existing-deallocs.mlir (+43) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-function-boundaries.mlir (+131) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-memoryeffect-interface.mlir (+124) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-region-branchop-interface.mlir (+695) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir (+21) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/invalid-buffer-deallocation.mlir (+93) 


<pre>
diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index f03d7bb877c9c74..f64e94758c8eb28 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -224,6 +224,9 @@ dialect conversion-based bufferization.
 
 ## Buffer Deallocation
 
+**Important: this pass is deprecated; please use the ownership-based buffer**
+**deallocation pass instead.**
+
 One-Shot Bufferize deallocates all buffers that it allocates. This is in
 contrast to the dialect conversion-based bufferization that delegates this job
 to the
@@ -300,6 +303,607 @@ One-Shot Bufferize can be configured to leak all memory and not generate any
 buffer deallocations with `create-deallocs=0`. This can be useful for
 compatibility with legacy code that has its own method of deallocating buffers.
 
+## Ownership-based Buffer Deallocation
+
+Recommended compilation pipeline:
+```
+one-shot-bufferize
+       |          it&#x27;s recommended to perform all bufferization here at latest,
+       |       &lt;- any allocations inserted after this point have to be handled
+       V          manually
+expand-realloc
+       V
+buffer-deallocation
+       V
+  canonicalize &lt;- mostly for scf.if simplifications
+       V
+buffer-deallocation-simplification
+       V       &lt;- from this point onwards no tensor values are allowed
+lower-deallocations
+       V
+      CSE
+       V
+  canonicalize
+```
+
+One-Shot Bufferize does not deallocate any buffers that it allocates. This job
+is delegated to the
+[`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program)
+pass, i.e., after running One-Shot Bufferize, the result IR may have a number of
+`memref.alloc` ops, but no `memref.dealloc` ops. This pass processes operations
+implementing `FunctionOpInterface` one-by-one without analysing the call-graph.
+This means that there have to be [some rules](#function-boundary-abi) on how
+MemRefs are handled when being passed from one function to another. The rest of
+the pass revolves heavily around the `bufferization.dealloc` operation which is
+inserted at the end of each basic block with appropriate operands and should be
+optimized using the Buffer Deallocation Simplification pass
+(`--buffer-deallocation-simplification`) and the regular canonicalizer
+(`--canonicalize`). Lowering the result of the `-buffer-deallocation` pass
+directly using `--convert-bufferization-to-memref` without optimizing it
+first is not recommended as it will lead to very inefficient code (the
+runtime-cost of `bufferization.dealloc` is
+`O(|memrefs|^2+|memrefs|*|retained|)`).
+
+### Function boundary ABI
+
+The Buffer Deallocation pass operates on the level of operations implementing
+the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
+also return them. To ensure compatibility among all functions (including
+external ones), some rules have to be enforced:
+*   When a MemRef is passed as a function argument, ownership is never acquired.
+    It is always the caller&#x27;s responsibility to deallocate such MemRefs.
+*   Returning a MemRef from a function always passes ownership to the caller,
+    i.e., it is also the caller&#x27;s responsibility to deallocate memrefs returned
+    from a called function.
+*   A function must not return a MemRef with the same allocated base buffer as
+    one of its arguments (in this case a copy has to be created). Note that in
+    this context two subviews of the same buffer that don&#x27;t overlap are also
+    considered to alias.
+
+For external functions (e.g., library functions written externally in C), the
+externally provided implementation has to adhere to these rules and they are
+just assumed by the buffer deallocation pass. Functions on which the
+deallocation pass is applied and the implementation is accessible are modified
+by the pass such that the ABI is respected (i.e., buffer copies are inserted as
+necessary).
+
+### Inserting `bufferization.dealloc` operations
+
+`bufferization.dealloc` operations are unconditionally inserted at the end of
+each basic block (just before the terminator). The majority of the pass is about
+finding the correct operands for this operation. There are three variadic
+operand lists to be populated, the first contains all MemRef values that may
+need to be deallocated, the second list contains their associated ownership
+values (of `i1` type), and the third list contains MemRef values that are still
+needed at a later point and should thus not be deallocated. This operation
+allows us to deal with any kind of aliasing behavior: it lowers to runtime
+aliasing checks when not enough information can be collected statically. When
+enough aliasing information is statically available, operands or the entire op
+may fold away.
+
+**Ownerships**
+
+To find these operands, we use ownership indicators of memrefs which materialize
+as an `i1` value for any SSA value of `memref` type, indicating whether the
+basic block in which it was materialized has ownership of this MemRef. Ideally,
+this is a constant `true` or `false`, but might also be a non-constant SSA
+value. To keep track of those ownership values without immediately materializing
+them (which might require insertion of `bufferization.clone` operations or
+operations checking for aliasing at runtime at positions where we don&#x27;t actually
+need a materialized value), we use the `Ownership` class. This class represents
+the ownership in three states forming a lattice on a partial order:
+```
+forall X in SSA values. uninitialized &lt; unique(X) &lt; unknown
+forall X, Y in SSA values.
+  unique(X) == unique(Y) iff X and Y always evaluate to the same value
+  unique(X) != unique(Y) otherwise
+```
+Intuitively, the states have the following meaning:
+*   Uninitialized: the ownership is not initialized yet, this is the default
+    state; once an operation has been processed, the ownership of all of its
+    results with MemRef type should not be uninitialized anymore.
+*   Unique: there is a specific SSA value that can be queried to check ownership
+    without materializing any additional IR
+*   Unknown: no specific SSA value is available without materializing additional
+    IR, typically this is because two ownerships in &#x27;Unique&#x27; state would have to
+    be merged manually (e.g., the result of an `arith.select` either has the
+    ownership of the then or else case depending on the condition value,
+    inserting another `arith.select` for the ownership values can perform the
+    merge and provide a &#x27;Unique&#x27; ownership for the result), however, in the
+    general case this &#x27;Unknown&#x27; state has to be assigned.
+
+Implied by the above partial order, the pass combines two ownerships in the
+following way:
+
+| Ownership 1   | Ownership 2   | Combined Ownership |
+|:--------------|:--------------|:-------------------|
+| uninitialized | uninitialized | uninitialized      |
+| unique(X)     | uninitialized | unique(X)          |
+| unique(X)     | unique(X)     | unique(X)          |
+| unique(X)     | unique(Y)     | unknown            |
+| unknown       | unique(X)     | unknown            |
+| unknown       | uninitialized | unknown            |
+| &lt;td colspan=3&gt; + symmetric cases                   |
+
+**Collecting the list of MemRefs that potentially need to be deallocated**
+
+For a given block, the list of MemRefs that potentially need to be deallocated
+at the end of that block is computed by keeping track of all values for which
+the block potentially takes over ownership. This includes MemRefs provided as
+basic block arguments, interface handlers for operations like `memref.alloc` and
+`func.call`, but also liveness information in regions with multiple basic
+blocks.  More concretely, it is computed by taking the MemRefs in the &#x27;in&#x27; set
+of the liveness analysis of the current basic block B, adding the MemRef
+block arguments and the set of MemRefs allocated in B itself (determined by
+the interface handlers), and then removing the set of MemRefs deallocated in
+B (also determined by the interface handlers).
+
+Note that we don&#x27;t have to take the intersection of the liveness &#x27;in&#x27; set with
+the &#x27;out&#x27; set of the predecessor block because a value that is in the &#x27;in&#x27; set
+must be defined in an ancestor block that dominates all direct predecessors and
+thus the &#x27;in&#x27; set of this block is a subset of the &#x27;out&#x27; sets of each
+predecessor.
+
+```
+memrefs = filter((liveIn(block) U
+  allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
+```
+
+The list of conditions for the second variadic operands list of
+`bufferization.dealloc` is computed by querying the stored ownership value for
+each of the MemRefs collected as described above. The ownership state is updated
+by the interface handlers while processing the basic block.
+
+**Collecting the list of MemRefs to retain**
+
+Given a basic block B, the list of MemRefs that have to be retained can be
+different for each successor block S.  For the two basic blocks B and S and the
+values passed via block arguments to the destination block S, we compute the
+list of MemRefs that have to be retained in B by taking the MemRefs in the
+successor operand list of the terminator and the MemRefs in the &#x27;out&#x27; set of the
+liveness analysis for B intersected with the &#x27;in&#x27; set of the destination block
+S.
+
+This list of retained values makes sure that we cannot run into use-after-free
+situations even if no aliasing information is present at compile-time.
+
+```
+toRetain = filter(successorOperands + (liveOut(fromBlock) intersect
+  liveIn(toBlock)), isMemRef)
+```
+
+### Supported interfaces
+
+The pass uses liveness analysis and a few interfaces:
+*   `FunctionOpInterface`
+*   `CallOpInterface`
+*   `MemoryEffectOpInterface`
+*   `RegionBranchOpInterface`
+*   `RegionBranchTerminatorOpInterface`
+
+Due to insufficient information provided by the interface, it also special-cases
+on the `cf.cond_br` operation and makes some assumptions about operations
+implementing the `RegionBranchOpInterface` at the moment, but improving the
+interfaces would allow us to remove those dependencies in the future.
+
+### Limitations
+
+The Buffer Deallocation pass has some requirements and limitations on the input
+IR. These are checked at the beginning of the pass and errors are emitted
+accordingly:
+*   The set of interfaces the pass operates on must be implemented (correctly).
+    E.g., if an operation has a nested region but does not implement the
+    `RegionBranchOpInterface`, an error is emitted because the
+    pass cannot know the semantics of the nested region (and does not make any
+    default assumptions on it).
+*   No explicit control-flow loops are present. Currently, only loops using
+    structural-control-flow are supported.  However, this limitation could be
+    lifted in the future.
+*   Deallocation operations should not be present already. The pass should
+    handle them correctly (at least in most cases), but this is not
+    supported yet due to insufficient testing.
+*   Terminators must implement either `RegionBranchTerminatorOpInterface` or
+    `BranchOpInterface`, but not both. Terminators with more than one successor
+    are not supported (except `cf.cond_br`). This is not a fundamental
+    limitation, but there is no use-case justifying the more complex
+    implementation at the moment.
+
+### Example
+
+The following example contains a few interesting cases:
+*   Basic block arguments are modified to also pass along the ownership
+    indicator, but not for entry blocks of non-private functions (assuming the
+    `private-function-dynamic-ownership` pass option is disabled) where the
+    function boundary ABI is applied instead. &quot;Private&quot; in this context refers
+    to functions that cannot be called externally.
+*   The result of `arith.select` initially has &#x27;Unknown&#x27; assigned as ownership,
+    but once the `bufferization.dealloc` operation is inserted it is put in the
+    &#x27;retained&#x27; list (since it has uses in a later basic block) and thus the
+    &#x27;Unknown&#x27; ownership can be replaced with a &#x27;Unique&#x27; ownership using the
+    corresponding result of the dealloc operation.
+*   The `cf.cond_br` operation has more than one successor and thus has to
+    insert two `bufferization.dealloc` operations (one for each successor).
+    While they have the same list of MemRefs to deallocate (because they perform
+    the deallocations for the same block), it must be taken into account that
+    some MemRefs remain *live* for one branch but not the other (thus set
+    intersection is performed on the *live-out* of the current block and the
+    *live-in* of the target block). Also, `cf.cond_br` supports separate
+    forwarding operands for each successor. To make sure that no MemRef is
+    deallocated twice (because there are two `bufferization.dealloc` operations
+    with the same MemRefs to deallocate), the condition operands are adjusted to
+    take the branch condition into account. While a generic lowering for such
+    terminator operations could be implemented, a specialized implementation can
+    take all the semantics of this particular operation into account and thus
+    generate a more efficient lowering.
+
+```mlir
+func.func @example(%memref: memref&lt;?xi8&gt;, %select_cond: i1, %br_cond: i1) {
+  %alloc = memref.alloc() : memref&lt;?xi8&gt;
+  %alloca = memref.alloca() : memref&lt;?xi8&gt;
+  %select = arith.select %select_cond, %alloc, %alloca : memref&lt;?xi8&gt;
+  cf.cond_br %br_cond, ^bb1(%alloc : memref&lt;?xi8&gt;), ^bb1(%memref : memref&lt;?xi8&gt;)
+^bb1(%bbarg: memref&lt;?xi8&gt;):
+  test.copy(%bbarg, %select) : (memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+  return
+}
+```
+
+After running `--buffer-deallocation`, it looks as follows:
+
+```mlir
+// Since this is not a private function, the signature will not be modified even
+// when private-function-dynamic-ownership is enabled. Instead the function
+// boundary ABI has to be applied which means that ownership of `%memref` will
+// never be acquired.
+func.func @example(%memref: memref&lt;?xi8&gt;, %select_cond: i1, %br_cond: i1) {
+  %false = arith.constant false
+  %true = arith.constant true
+
+  // The ownership of a MemRef defined by the `memref.alloc` operation is always
+  // assigned to be &#x27;true&#x27;.
+  %alloc = memref.alloc() : memref&lt;?xi8&gt;
+
+  // The ownership of a MemRef defined by the `memref.alloca` operation is
+  // always assigned to be &#x27;false&#x27;.
+  %alloca = memref.alloca() : memref&lt;?xi8&gt;
+
+  // The ownership of %select will be the join of the ownership of %alloc and
+  // the ownership of %alloca, i.e., of %true and %false. Because the pass does
+  // not know about the semantics of the `arith.select` operation (unless a
+  // custom handler is implemented), the ownership join will be &#x27;Unknown&#x27;. If
+  // the materialized ownership indicator of %select is needed, either a clone
+  // has to be created for which %true is assigned as ownership or the result
+  // of a `bufferization.dealloc` where %select is in the retain list has to be
+  // used.
+  %select = arith.select %select_cond, %alloc, %alloca : memref&lt;?xi8&gt;
+
+  // We use `memref.extract_strided_metadata` to get the base memref since it is
+  // not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
+  // already enforced for `bufferization.dealloc`.
+  %base_buffer_memref, ... = memref.extract_strided_metadata %memref
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt;, index, index, index
+  %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt;, index, index, index
+  %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt;, index, index, index
+
+  // The deallocation conditions need to be adjusted to incorporate the branch
+  // condition. In this example, this requires only a single negation, but might
+  // also require multiple arith.andi operations.
+  %not_br_cond = arith.xori %true, %br_cond : i1
+
+  // There are two dealloc operations inserted in this basic block, one per
+  // successor. Both have the same list of MemRefs to deallocate and the
+  // conditions only differ by the branch condition conjunct.
+  // Note, however, that the retained list differs. Here, both contain the
+  // %select value because it is used in both successors (since it&#x27;s the same
+  // block), but the value passed via block argument differs (%memref vs.
+  // %alloc).
+  %10:2 = bufferization.dealloc
+           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+             : memref&lt;i8&gt;, memref&lt;i8&gt;, memref&lt;i8&gt;)
+        if (%false, %br_cond, %false)
+    retain (%alloc, %select : memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+
+  %11:2 = bufferization.dealloc
+           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+             : memref&lt;i8&gt;, memref&lt;i8&gt;, memref&lt;i8&gt;)
+        if (%false, %not_br_cond, %false)
+    retain (%memref, %select : memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+  
+  // Because %select is used in ^bb1 without passing it via block argument, we
+  // need to update its ownership value here by merging the ownership values
+  // returned by the dealloc operations
+  %new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
+
+  // The terminator is modified to pass along the ownership indicator values
+  // with each MemRef value.
+  cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref&lt;?xi8&gt;, i1),
+                       ^bb1(%memref, %11#0 : memref&lt;?xi8&gt;, i1)
+
+// All non-entry basic blocks are modified to have an additional i1 argument for
+// each MemRef value in the argument list.
+^bb1(%13: memref&lt;?xi8&gt;, %14: i1):  // 2 preds: ^bb0, ^bb0
+  test.copy(%13, %select) : (memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+
+  %base_buffer_13, ... = memref.extract_strided_metadata %13
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt...
<truncated>
</pre>
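
The ownership combination table in the Bufferization.md hunk above can be modeled in a few lines. The following is only a sketch of the join on the `uninitialized < unique(X) < unknown` order. The `Ownership` dataclass, `unique`, and `combine` are hypothetical names for illustration and are not the C++ `Ownership` class from the patch. It also assumes that two indicators are "the same value" exactly when their SSA names match, which is stricter than the "always evaluate to the same value" condition in the docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ownership:
    """Hypothetical model of the three ownership states (not the C++ class)."""
    state: str                       # "uninitialized", "unique", or "unknown"
    indicator: Optional[str] = None  # name of the i1 SSA value, only for "unique"


UNINITIALIZED = Ownership("uninitialized")
UNKNOWN = Ownership("unknown")


def unique(indicator: str) -> Ownership:
    """Ownership that can be queried through a specific i1 SSA value."""
    return Ownership("unique", indicator)


def combine(a: Ownership, b: Ownership) -> Ownership:
    """Join two ownerships following the table in the docs."""
    # 'uninitialized' is the bottom element: the other operand wins.
    if a.state == "uninitialized":
        return b
    if b.state == "uninitialized":
        return a
    # 'unknown' is the top element: it absorbs everything else.
    if a.state == "unknown" or b.state == "unknown":
        return UNKNOWN
    # Both are unique: the same indicator stays unique, otherwise unknown.
    # Assumption: indicator equality is approximated by SSA name equality.
    return a if a.indicator == b.indicator else UNKNOWN


if __name__ == "__main__":
    print(combine(unique("%true"), unique("%false")).state)
```

In the example from the docs, `%alloc` carries `unique(%true)` and `%alloca` carries `unique(%false)`, so without a custom handler the join for `%select` ends up in the 'Unknown' state.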
</details>
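
The two set formulas from the docs (`memrefs = ...` and `toRetain = ...`) can be checked against the `@example` function with a small sketch. Everything here is a hypothetical model: values are plain strings, the liveness sets are written out by hand rather than computed, and treating `%alloca` as part of `allocated(block)` is an assumption made to match the operand lists shown in the example output.

```python
def memrefs_to_dealloc(live_in, allocated, arguments, deallocated, is_memref):
    """memrefs = filter((liveIn U allocated U arguments) \\ deallocated, isMemRef)"""
    candidates = (set(live_in) | set(allocated) | set(arguments)) - set(deallocated)
    return {v for v in candidates if is_memref(v)}


def memrefs_to_retain(successor_operands, live_out_from, live_in_to, is_memref):
    """toRetain = filter(successorOperands + (liveOut(from) intersect liveIn(to)), isMemRef)"""
    kept = set(successor_operands) | (set(live_out_from) & set(live_in_to))
    return {v for v in kept if is_memref(v)}


# Hand-written facts about the entry block of @example (assumptions, not
# output of a real liveness analysis).
MEMREF_VALUES = {"%memref", "%alloc", "%alloca", "%select", "%bbarg"}


def is_memref(value):
    return value in MEMREF_VALUES


entry_args = ["%memref", "%select_cond", "%br_cond"]
entry_allocated = ["%alloc", "%alloca"]
live_out_entry = ["%select"]  # still used in ^bb1
live_in_bb1 = ["%select"]

to_dealloc = memrefs_to_dealloc([], entry_allocated, entry_args, [], is_memref)
retain_true = memrefs_to_retain(["%alloc"], live_out_entry, live_in_bb1, is_memref)
retain_false = memrefs_to_retain(["%memref"], live_out_entry, live_in_bb1, is_memref)

if __name__ == "__main__":
    print(sorted(to_dealloc))
    print(sorted(retain_true))
    print(sorted(retain_false))
```

Under these assumptions the deallocation candidates are `%memref`, `%alloc` and `%alloca`, and the retain lists are `%alloc, %select` for the true branch and `%memref, %select` for the false branch, which lines up with the two `bufferization.dealloc` operations in the example.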


https://github.com/llvm/llvm-project/pull/66337

