[llvm] [mlir][bufferization][NFC] Introduce BufferDeallocationOpInterface (PR #66349)

via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 14 02:34:12 PDT 2023


llvmbot wrote:


@llvm/pr-subscribers-mlir-cf
            
<details>
<summary>Changes</summary>
This new interface allows operations to implement custom handling of ownership
values and insertion of dealloc operations. This is useful when an op cannot
implement the interfaces supported by default by the buffer deallocation pass
(e.g., because they are not exactly compatible, because additional semantics
would render the default implementations in buffer deallocation invalid, or
because no interface exists for this kind of behavior and it is not worth
introducing one plus a default implementation in buffer deallocation).
Additionally, it can be used to provide more efficient handling for a specific
op than the interface-based default implementations can.

Already reviewed in https://reviews.llvm.org/D158756

Depends on #66337
--

Patch is 203.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66349.diff

23 Files Affected:

- (modified) mlir/docs/Bufferization.md (+604) 
- (added) mlir/include/mlir/Dialect/Bufferization/IR/BufferDeallocationOpInterface.h (+217) 
- (added) mlir/include/mlir/Dialect/Bufferization/IR/BufferDeallocationOpInterface.td (+46) 
- (modified) mlir/include/mlir/Dialect/Bufferization/IR/CMakeLists.txt (+1) 
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h (+9) 
- (modified) mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td (+144) 
- (added) mlir/include/mlir/Dialect/ControlFlow/Transforms/BufferDeallocationOpInterfaceImpl.h (+22) 
- (modified) mlir/include/mlir/InitAllDialects.h (+2) 
- (added) mlir/lib/Dialect/Bufferization/IR/BufferDeallocationOpInterface.cpp (+274) 
- (modified) mlir/lib/Dialect/Bufferization/IR/CMakeLists.txt (+1) 
- (modified) mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt (+1) 
- (added) mlir/lib/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation.cpp (+996) 
- (added) mlir/lib/Dialect/ControlFlow/Transforms/BufferDeallocationOpInterfaceImpl.cpp (+163) 
- (modified) mlir/lib/Dialect/ControlFlow/Transforms/CMakeLists.txt (+2-1) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir (+589) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-callop-interface.mlir (+113) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-existing-deallocs.mlir (+43) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-function-boundaries.mlir (+131) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-memoryeffect-interface.mlir (+124) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-region-branchop-interface.mlir (+695) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-subviews.mlir (+21) 
- (added) mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/invalid-buffer-deallocation.mlir (+93) 
- (modified) utils/bazel/llvm-project-overlay/mlir/BUILD.bazel (+4) 


<pre>
diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index f03d7bb877c9c74..f64e94758c8eb28 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -224,6 +224,9 @@ dialect conversion-based bufferization.
 
 ## Buffer Deallocation
 
+**Important: this pass is deprecated, please use the ownership based buffer**
+**deallocation pass instead**
+
 One-Shot Bufferize deallocates all buffers that it allocates. This is in
 contrast to the dialect conversion-based bufferization that delegates this job
 to the
@@ -300,6 +303,607 @@ One-Shot Bufferize can be configured to leak all memory and not generate any
 buffer deallocations with `create-deallocs=0`. This can be useful for
 compatibility with legacy code that has its own method of deallocating buffers.
 
+## Ownership-based Buffer Deallocation
+
+Recommended compilation pipeline:
+```
+one-shot-bufferize
+       |          it&#x27;s recommended to perform all bufferization here at latest,
+       |       &lt;- any allocations inserted after this point have to be handled
+       V          manually
+expand-realloc
+       V
+buffer-deallocation
+       V
+  canonicalize &lt;- mostly for scf.if simplifications
+       V
+buffer-deallocation-simplification
+       V       &lt;- from this point onwards no tensor values are allowed
+lower-deallocations
+       V
+      CSE
+       V
+  canonicalize
+```
+
+One-Shot Bufferize does not deallocate any buffers that it allocates. This job
+is delegated to the
+[`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program)
+pass, i.e., after running One-Shot Bufferize, the result IR may have a number of
+`memref.alloc` ops, but no `memref.dealloc` ops. This pass processes operations
+implementing `FunctionOpInterface` one-by-one without analysing the call-graph.
+This means that there have to be [some rules](#function-boundary-abi) on how
+MemRefs are handled when being passed from one function to another. The rest of
+the pass revolves heavily around the `bufferization.dealloc` operation which is
+inserted at the end of each basic block with appropriate operands and should be
+optimized using the Buffer Deallocation Simplification pass
+(`--buffer-deallocation-simplification`) and the regular canonicalizer
+(`--canonicalize`). Lowering the result of the `-buffer-deallocation` pass
+directly using `--convert-bufferization-to-memref` without optimizing it
+beforehand is not recommended as it will lead to very inefficient code (the
+runtime cost of `bufferization.dealloc` is
+`O(|memrefs|^2+|memrefs|*|retained|)`).
+
+### Function boundary ABI
+
+The Buffer Deallocation pass operates on the level of operations implementing
+the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
+also return them. To ensure compatibility among all functions (including
+external ones), some rules have to be enforced:
+*   When a MemRef is passed as a function argument, ownership is never acquired.
+    It is always the caller&#x27;s responsibility to deallocate such MemRefs.
+*   Returning a MemRef from a function always passes ownership to the caller,
+    i.e., it is also the caller&#x27;s responsibility to deallocate memrefs returned
+    from a called function.
+*   A function must not return a MemRef with the same allocated base buffer as
+    one of its arguments (in this case a copy has to be created). Note that in
+    this context two subviews of the same buffer that don&#x27;t overlap are also
+    considered to alias.
+
+For external functions (e.g., library functions written externally in C), the
+externally provided implementation has to adhere to these rules and they are
+just assumed by the buffer deallocation pass. Functions on which the
+deallocation pass is applied and the implementation is accessible are modified
+by the pass such that the ABI is respected (i.e., buffer copies are inserted as
+necessary).
+
+### Inserting `bufferization.dealloc` operations
+
+`bufferization.dealloc` operations are unconditionally inserted at the end of
+each basic block (just before the terminator). The majority of the pass is about
+finding the correct operands for this operation. There are three variadic
+operand lists to be populated, the first contains all MemRef values that may
+need to be deallocated, the second list contains their associated ownership
+values (of `i1` type), and the third list contains MemRef values that are still
+needed at a later point and should thus not be deallocated. This operation
+allows us to deal with any kind of aliasing behavior: it lowers to runtime
+aliasing checks when not enough information can be collected statically. When
+enough aliasing information is statically available, operands or the entire op
+may fold away.
+
+**Ownerships**
+
+To do so, we use a concept of ownership indicators of memrefs which materialize
+as an `i1` value for any SSA value of `memref` type, indicating whether the
+basic block in which it was materialized has ownership of this MemRef. Ideally,
+this is a constant `true` or `false`, but might also be a non-constant SSA
+value. To keep track of those ownership values without immediately materializing
+them (which might require insertion of `bufferization.clone` operations or
+operations checking for aliasing at runtime at positions where we don&#x27;t actually
+need a materialized value), we use the `Ownership` class. This class represents
+the ownership in three states forming a lattice on a partial order:
+```
+forall X in SSA values. uninitialized &lt; unique(X) &lt; unknown
+forall X, Y in SSA values.
+  unique(X) == unique(Y) iff X and Y always evaluate to the same value
+  unique(X) != unique(Y) otherwise
+```
+Intuitively, the states have the following meaning:
+*   Uninitialized: the ownership is not initialized yet, this is the default
+    state; once an operation is finished processing the ownership of all
+    operation results with MemRef type should not be uninitialized anymore.
+*   Unique: there is a specific SSA value that can be queried to check ownership
+    without materializing any additional IR
+*   Unknown: no specific SSA value is available without materializing additional
+    IR, typically this is because two ownerships in &#x27;Unique&#x27; state would have to
+    be merged manually (e.g., the result of an `arith.select` either has the
+    ownership of the then or else case depending on the condition value,
+    inserting another `arith.select` for the ownership values can perform the
+    merge and provide a &#x27;Unique&#x27; ownership for the result), however, in the
+    general case this &#x27;Unknown&#x27; state has to be assigned.
+
+Implied by the above partial order, the pass combines two ownerships in the
+following way:
+
+| Ownership 1   | Ownership 2   | Combined Ownership |
+|:--------------|:--------------|:-------------------|
+| uninitialized | uninitialized | uninitialized      |
+| unique(X)     | uninitialized | unique(X)          |
+| unique(X)     | unique(X)     | unique(X)          |
+| unique(X)     | unique(Y)     | unknown            |
+| unknown       | unique        | unknown            |
+| unknown       | uninitialized | unknown            |
+| &lt;td colspan=3&gt; + symmetric cases                   |
+
+**Collecting the list of MemRefs that potentially need to be deallocated**
+
+For a given block, the list of MemRefs that potentially need to be deallocated
+at the end of that block is computed by keeping track of all values for which
+the block potentially takes over ownership. This includes MemRefs provided as
+basic block arguments, interface handlers for operations like `memref.alloc` and
+`func.call`, but also liveness information in regions with multiple basic
+blocks.  More concretely, it is computed by taking the MemRefs in the &#x27;in&#x27; set
+of the liveness analysis of the current basic block B, appended by the MemRef
+block arguments and by the set of MemRefs allocated in B itself (determined by
+the interface handlers), then subtracted (also determined by the interface
+handlers) by the set of MemRefs deallocated in B.
+
+Note that we don&#x27;t have to take the intersection of the liveness &#x27;in&#x27; set with
+the &#x27;out&#x27; set of the predecessor block because a value that is in the &#x27;in&#x27; set
+must be defined in an ancestor block that dominates all direct predecessors and
+thus the &#x27;in&#x27; set of this block is a subset of the &#x27;out&#x27; sets of each
+predecessor.
+
+```
+memrefs = filter((liveIn(block) U
+  allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
+```
+
+The list of conditions for the second variadic operands list of
+`bufferization.dealloc` is computed by querying the stored ownership value for
+each of the MemRefs collected as described above. The ownership state is updated
+by the interface handlers while processing the basic block.
+
+**Collecting the list of MemRefs to retain**
+
+Given a basic block B, the list of MemRefs that have to be retained can be
+different for each successor block S.  For the two basic blocks B and S and the
+values passed via block arguments to the destination block S, we compute the
+list of MemRefs that have to be retained in B by taking the MemRefs in the
+successor operand list of the terminator and the MemRefs in the &#x27;out&#x27; set of the
+liveness analysis for B intersected with the &#x27;in&#x27; set of the destination block
+S.
+
+This list of retained values makes sure that we cannot run into use-after-free
+situations even if no aliasing information is present at compile-time.
+
+```
+toRetain = filter(successorOperands + (liveOut(fromBlock) intersect
+  liveIn(toBlock)), isMemRef)
+```
+
+### Supported interfaces
+
+The pass uses liveness analysis and a few interfaces:
+*   `FunctionOpInterface`
+*   `CallOpInterface`
+*   `MemoryEffectOpInterface`
+*   `RegionBranchOpInterface`
+*   `RegionBranchTerminatorOpInterface`
+
+Due to insufficient information provided by the interface, it also special-cases
+on the `cf.cond_br` operation and makes some assumptions about operations
+implementing the `RegionBranchOpInterface` at the moment, but improving the
+interfaces would allow us to remove those dependencies in the future.
+
+### Limitations
+
+The Buffer Deallocation pass has some requirements and limitations on the input
+IR. These are checked at the beginning of the pass and errors are emitted
+accordingly:
+*   The set of interfaces the pass operates on must be implemented (correctly).
+    E.g., if there is an operation present with a nested region, but does not
+    implement the `RegionBranchOpInterface`, an error is emitted because the
+    pass cannot know the semantics of the nested region (and does not make any
+    default assumptions on it).
+*   No explicit control-flow loops are present. Currently, only loops using
+    structural-control-flow are supported.  However, this limitation could be
+    lifted in the future.
+*   Deallocation operations should not be present already. The pass should
+    handle them correctly already (at least in most cases), but it&#x27;s not
+    supported yet due to insufficient testing.
+*   Terminators must implement either `RegionBranchTerminatorOpInterface` or
+    `BranchOpInterface`, but not both. Terminators with more than one successor
+    are not supported (except `cf.cond_br`). This is not a fundamental
+    limitation, but there is no use-case justifying the more complex
+    implementation at the moment.
+
+### Example
+
+The following example contains a few interesting cases:
+*   Basic block arguments are modified to also pass along the ownership
+    indicator, but not for entry blocks of non-private functions (assuming the
+    `private-function-dynamic-ownership` pass option is disabled) where the
+    function boundary ABI is applied instead. &quot;Private&quot; in this context refers
+    to functions that cannot be called externally.
+*   The result of `arith.select` initially has &#x27;Unknown&#x27; assigned as ownership,
+    but once the `bufferization.dealloc` operation is inserted it is put in the
+    &#x27;retained&#x27; list (since it has uses in a later basic block) and thus the
+    &#x27;Unknown&#x27; ownership can be replaced with a &#x27;Unique&#x27; ownership using the
+    corresponding result of the dealloc operation.
+*   The `cf.cond_br` operation has more than one successor and thus has to
+    insert two `bufferization.dealloc` operations (one for each successor).
+    While they have the same list of MemRefs to deallocate (because they perform
+    the deallocations for the same block), it must be taken into account that
+    some MemRefs remain *live* for one branch but not the other (thus set
+    intersection is performed on the *live-out* of the current block and the
+    *live-in* of the target block). Also, `cf.cond_br` supports separate
+    forwarding operands for each successor. To make sure that no MemRef is
+    deallocated twice (because there are two `bufferization.dealloc` operations
+    with the same MemRefs to deallocate), the condition operands are adjusted to
+    take the branch condition into account. While a generic lowering for such
+    terminator operations could be implemented, a specialized implementation can
+    take all the semantics of this particular operation into account and thus
+    generate a more efficient lowering.
+
+```mlir
+func.func @example(%memref: memref&lt;?xi8&gt;, %select_cond: i1, %br_cond: i1) {
+  %alloc = memref.alloc() : memref&lt;?xi8&gt;
+  %alloca = memref.alloca() : memref&lt;?xi8&gt;
+  %select = arith.select %select_cond, %alloc, %alloca : memref&lt;?xi8&gt;
+  cf.cond_br %br_cond, ^bb1(%alloc : memref&lt;?xi8&gt;), ^bb1(%memref : memref&lt;?xi8&gt;)
+^bb1(%bbarg: memref&lt;?xi8&gt;):
+  test.copy(%bbarg, %select) : (memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+  return
+}
+```
+
+After running `--buffer-deallocation`, it looks as follows:
+
+```mlir
+// Since this is not a private function, the signature will not be modified even
+// when private-function-dynamic-ownership is enabled. Instead the function
+// boundary ABI has to be applied which means that ownership of `%memref` will
+// never be acquired.
+func.func @example(%memref: memref&lt;?xi8&gt;, %select_cond: i1, %br_cond: i1) {
+  %false = arith.constant false
+  %true = arith.constant true
+
+  // The ownership of a MemRef defined by the `memref.alloc` operation is always
+  // assigned to be &#x27;true&#x27;.
+  %alloc = memref.alloc() : memref&lt;?xi8&gt;
+
+  // The ownership of a MemRef defined by the `memref.alloca` operation is
+  // always assigned to be &#x27;false&#x27;.
+  %alloca = memref.alloca() : memref&lt;?xi8&gt;
+
+  // The ownership of %select will be the join of the ownership of %alloc and
+  // the ownership of %alloca, i.e., of %true and %false. Because the pass does
+  // not know about the semantics of the `arith.select` operation (unless a
+  // custom handler is implemented), the ownership join will be &#x27;Unknown&#x27;. If
+  // the materialized ownership indicator of %select is needed, either a clone
+  // has to be created for which %true is assigned as ownership or the result
+  // of a `bufferization.dealloc` where %select is in the retain list has to be
+  // used.
+  %select = arith.select %select_cond, %alloc, %alloca : memref&lt;?xi8&gt;
+
+  // We use `memref.extract_strided_metadata` to get the base memref since it is
+  // not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
+  // already enforced for `bufferization.dealloc`
+  %base_buffer_memref, ... = memref.extract_strided_metadata %memref
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt;, index, index, index
+  %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt;, index, index, index
+  %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
+    : memref&lt;?xi8&gt; -&gt; memref&lt;i8&gt;, index, index, index
+
+  // The deallocation conditions need to be adjusted to incorporate the branch
+  // condition. In this example, this requires only a single negation, but might
+  // also require multiple arith.andi operations.
+  %not_br_cond = arith.xori %true, %br_cond : i1
+
+  // There are two dealloc operations inserted in this basic block, one per
+  // successor. Both have the same list of MemRefs to deallocate and the
+  // conditions only differ by the branch condition conjunct.
+  // Note, however, that the retained list differs. Here, both contain the
+  // %select value because it is used in both successors (since it&#x27;s the same
+  // block), but the value passed via block argument differs (%memref vs.
+  // %alloc).
+  %10:2 = bufferization.dealloc
+           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+             : memref&lt;i8&gt;, memref&lt;i8&gt;, memref&lt;i8&gt;)
+        if (%false, %br_cond, %false)
+    retain (%alloc, %select : memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+
+  %11:2 = bufferization.dealloc
+           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
+             : memref&lt;i8&gt;, memref&lt;i8&gt;, memref&lt;i8&gt;)
+        if (%false, %not_br_cond, %false)
+    retain (%memref, %select : memref&lt;?xi8&gt;, memref&lt;?xi8&gt;)
+  
+  // Because %select is used in ^bb1 without passing it via block argument, we
+  // need to update its ownership value here by merging the ownership values
+  // returned by the dealloc operations
+  %new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
+
+  // The terminator is modifi...
<truncated>
</pre>
</details>
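
The ownership combination table in the `Bufferization.md` diff above can be read as a join on the three-state lattice. The following is a minimal Python sketch of that table, not the C++ `Ownership` class from the patch. The names `Ownership`, `unique`, and `combine` are hypothetical. It also assumes that two 'Unique' states are equal exactly when their indicator names match, which only approximates the documented rule that the two SSA values "always evaluate to the same value".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ownership:
    """Hypothetical model of the three ownership states."""
    kind: str                        # "uninitialized", "unique", or "unknown"
    indicator: Optional[str] = None  # name of the i1 SSA value for "unique"


UNINITIALIZED = Ownership("uninitialized")
UNKNOWN = Ownership("unknown")


def unique(indicator: str) -> Ownership:
    """Build a 'Unique' ownership backed by the named i1 SSA value."""
    return Ownership("unique", indicator)


def combine(a: Ownership, b: Ownership) -> Ownership:
    """Join two ownerships following the table (symmetric cases included)."""
    # 'Uninitialized' is the bottom element: the other operand wins.
    if a.kind == "uninitialized":
        return b
    if b.kind == "uninitialized":
        return a
    # 'Unknown' is the top element: it absorbs everything else.
    if a.kind == "unknown" or b.kind == "unknown":
        return UNKNOWN
    # Both are 'Unique': they stay unique only if they share the indicator.
    return a if a.indicator == b.indicator else UNKNOWN


# The %select case from the example: joining the ownerships of %alloc (%true)
# and %alloca (%false) yields 'Unknown'.
select_ownership = combine(unique("%true"), unique("%false"))
```

In this sketch `select_ownership` ends up as `UNKNOWN`, which mirrors the comment in the example about the ownership of `%select` before a `bufferization.dealloc` result is used to make it 'Unique' again.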

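The two set formulas in the diff (`memrefs = ...` and `toRetain = ...`) can be checked by hand against the `@example` function. The sketch below is illustrative only. The function names are hypothetical, the liveness and allocation sets are written out manually rather than computed from IR, and treating `%alloca` as "allocated" in the entry block is an assumption made to match the dealloc operand list shown in the example. Sets are used for brevity, so the operand order of the real op is not modeled.

```python
def memrefs_to_deallocate(live_in, allocated, arguments, deallocated, is_memref):
    """(liveIn U allocated U arguments) \\ deallocated, filtered to MemRefs."""
    candidates = (set(live_in) | set(allocated) | set(arguments)) - set(deallocated)
    return {v for v in candidates if is_memref(v)}


def memrefs_to_retain(successor_operands, live_out_from, live_in_to, is_memref):
    """successorOperands plus (liveOut(from) intersect liveIn(to)), MemRefs only."""
    candidates = set(successor_operands) | (set(live_out_from) & set(live_in_to))
    return {v for v in candidates if is_memref(v)}


# Hand-written facts about the entry block of @example (assumptions, see above).
memref_values = {"%memref", "%alloc", "%alloca", "%select", "%bbarg"}
is_memref = lambda v: v in memref_values

entry_dealloc = memrefs_to_deallocate(
    live_in=set(),                                    # entry block has no live-ins
    allocated={"%alloc", "%alloca"},
    arguments={"%memref", "%select_cond", "%br_cond"},
    deallocated=set(),
    is_memref=is_memref,
)

# One retain list per successor edge of cf.cond_br; both edges target ^bb1.
live_out_entry = {"%select"}
live_in_bb1 = {"%select"}
retain_true_edge = memrefs_to_retain({"%alloc"}, live_out_entry, live_in_bb1, is_memref)
retain_false_edge = memrefs_to_retain({"%memref"}, live_out_entry, live_in_bb1, is_memref)
```

Under these assumptions `entry_dealloc` contains `%memref`, `%alloc`, and `%alloca`, and the two retain sets contain `%alloc, %select` and `%memref, %select`, which matches the operands of the two `bufferization.dealloc` ops in the transformed example.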

https://github.com/llvm/llvm-project/pull/66349

