[llvm] [mlir][bufferization] Don't clone on unknown ownership and verify function boundary ABI (PR #66626)

Wed Sep 27 02:24:24 PDT 2023

================
@@ -132,30 +132,79 @@ void DeallocationState::getLiveMemrefsIn(Block *block,
   memrefs.append(liveMemrefs);
 }
 
-std::pair<Value, Value>
-DeallocationState::getMemrefWithUniqueOwnership(OpBuilder &builder,
-                                                Value memref, Block *block) {
-  auto iter = ownershipMap.find({memref, block});
-  assert(iter != ownershipMap.end() &&
-         "Value must already have been registered in the ownership map");
-
-  Ownership ownership = iter->second;
-  if (ownership.isUnique())
-    return {memref, ownership.getIndicator()};
-
-  // Instead of inserting a clone operation we could also insert a dealloc
-  // operation earlier in the block and use the updated ownerships returned by
-  // the op for the retained values. Alternatively, we could insert code to
-  // check aliasing at runtime and use this information to combine two unique
-  // ownerships more intelligently to not end up with an 'Unknown' ownership in
-  // the first place.
-  auto cloneOp =
-      builder.create<bufferization::CloneOp>(memref.getLoc(), memref);
-  Value condition = buildBoolValue(builder, memref.getLoc(), true);
-  Value newMemref = cloneOp.getResult();
-  updateOwnership(newMemref, condition);
-  memrefsToDeallocatePerBlock[newMemref.getParentBlock()].push_back(newMemref);
-  return {newMemref, condition};
+Value DeallocationState::getMemrefWithUniqueOwnership(
+    const DeallocationOptions &options, OpBuilder &builder, Value memref,
+    Block *block) {
+  // NOTE: * if none of the operands have the same allocated pointer, a new
+  // memref was allocated and thus the operation should have the allocate
+  // side-effect defined on that result value and thus the correct unique
+  // ownership is pre-populated by the ownership pass (unless an interface
+  // implementation is incorrect).
+  //       * if exactly one operand has the same allocated pointer, this retunes
+  //       the ownership of exactly that operand
+  //       * if multiple operands match the allocated pointer of the result, the
+  //       ownership indicators of all of them always have to evaluate to the
+  //       same value because no dealloc operations may be present and because
+  //       of the rules they are passed to nested regions and successor blocks.
+  //       This could be verified at runtime by inserting `cf.assert`
+  //       operations, but would require O(|operands|^2) additional operations
+  //       to check and is thus not implemented yet (would need to insert a
+  //       library function to avoid code-size explosion which would make the
+  //       deallocation pass a module pass)
+  auto ipSave = builder.saveInsertionPoint();
+  SmallVector<Value> worklist;
+  worklist.push_back(memref);
+
+  while (!worklist.empty()) {
----------------
maerhart wrote:

In theory that is totally possible and also the reason why I went with this more complicated implementation. E.g., it may be the case that some op with side-effects has to be inserted, just like we create another `arith.select` in its interface implementation but with the difference that this other op is not as pure as `arith.select`. I'm not sure how frequent/relevant such cases are, but I know that in quite a lot of cases we don't care about the ownership of memref values (we only need them for the memrefs we put in the first operand list of the dealloc op, call ops, and ops that forward operands to nested regions (e.g., iter_args of `scf.for`), but not for memrefs we put in the retained list of the dealloc op (there we get the new ownership by other means). In the current implementation, the user of the pass/pipeline can basically choose whether to materialize eagerly (by always calling this materialize method in the `process` method when implementing the interface) or lazily (by default). It's a trade-off between additional pass complexity and providing more flexibility to the user (where I'm not exactly sure if the additional flexibility is actually needed in practice). Maybe just keep it like that and re-evaluate once we know more about the user's needs?

https://github.com/llvm/llvm-project/pull/66626