[Openmp-commits] [openmp] [mlir][doc] Improve Destination-passing-style documentation (PR #70283)

Wed Oct 25 20:44:59 PDT 2023

https://github.com/joker-eph updated https://github.com/llvm/llvm-project/pull/70283

>From 0bdf7a0bc1c1e3b5fc3280e9ba5f5cacfeeb5f7f Mon Sep 17 00:00:00 2001
From: Mehdi Amini <joker.eph at gmail.com>
Date: Wed, 25 Oct 2023 19:17:32 -0700
Subject: [PATCH 1/3] Update Bufferization.md

---
 mlir/docs/Bufferization.md | 39 ++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index d9d0751cae8c9dd..ea3593549ca1563 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -101,10 +101,28 @@ bufferization strategy would be unacceptable for high-performance codegen. When
 choosing an already existing buffer, we must be careful not to accidentally
 overwrite data that is still needed later in the program.
 
-To simplify this problem, One-Shot Bufferize was designed for ops that are in
-*destination-passing style*. For every tensor result, such ops have a tensor
-operand, whose buffer could be utilized for storing the result of the op in the
-absence of other conflicts. We call such tensor operands the *destination*.
+To simplify this problem, One-Shot Bufferize was designed to take advantage of
+*destination-passing style*. This form exists in itself independently of
+bufferization and is tied to SSA semantics: many ops are “updating” part of
+their input SSA variable. For example the LLVM instruction
+[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
+is inserting an element inside a vector. Since SSA values are immutable, the
+operation returns a copy of the input vector with the element inserted.
+Another example in MLIR is `linalg.generic`, which always has an extra `outs`
+operand which provides the initial values to update (for example when the
+operation is doing a reduction). 
+
+This input is referred to as "destination" in the following (quotes are
+important are this operand isn't modified in place but copied) and come into
+place in the context of bufferization as a possible "anchor" for the
+bufferization algorithm. This allows the user to shape the input in a form that
+guarantees close to optimal bufferization result when carefully choosing the
+SSA value used as "destination".
+
+For every tensor result, a "destination-passing" style op has a corresponding
+tensor operand. If there aren't any other uses of this tensor, the bufferization
+can alias it with the op result and perform the operation "in-place" by reusing
+the buffer allocated for this "destination" input.
 
 As an example, consider the following op: `%0 = tensor.insert %cst into
 %t[%idx] : tensor<?xf32>`
@@ -112,15 +130,16 @@ As an example, consider the following op: `%0 = tensor.insert %cst into
 `%t` is the destination in this example. When choosing a buffer for the result
 `%0`, One-Shot Bufferize considers only two options:
 
-1.  buffer(`%0`) = buffer(`%t`).
-2.  buffer(`%0`) is a newly allocated buffer.
+1.  buffer(`%0`) = buffer(`%t`): alias the destination tensor with the
+    result and perform the operation in-place.
+2.  buffer(`%0`) is a newly allocated buffer.
 
 There may be other buffers in the same function that could potentially be used
 for buffer(`%0`), but those are not considered by One-Shot Bufferize to keep the
 bufferization simple. One-Shot Bufferize could be extended to consider such
 buffers in the future to achieve a better quality of bufferization.
 
-Tensor ops that are not in destination-passing style always bufferize to a
+Tensor ops that are not in destination-passing style always bufferize to a
 memory allocation. E.g.:
 
 ```mlir
@@ -159,9 +178,9 @@ slice of a tensor:
 ```
 
 The above example bufferizes to a `memref.subview`, followed by a
-"`linalg.generic` on memrefs" that overwrites the memory of the subview. The
-`tensor.insert_slice` bufferizes to a no-op (in the absence of RaW conflicts
-such as a subsequent read of `%s`).
+"`linalg.generic` on memrefs" that overwrites the memory of the subview, assuming
+that the slice `%t` has no other user. The `tensor.insert_slice` then bufferizes
+to a no-op (in the absence of RaW conflicts such as a subsequent read of `%s`).
 
 RaW conflicts are detected with an analysis of SSA use-def chains (details
 later). One-Shot Bufferize works best if there is a single SSA use-def chain,

>From a06bcdde0c75dbab260d7d31d4dcaf0b169d8811 Mon Sep 17 00:00:00 2001
From: Mehdi Amini <joker.eph at gmail.com>
Date: Wed, 25 Oct 2023 20:42:41 -0700
Subject: [PATCH 2/3] Update Bufferization.md

---
 mlir/docs/Bufferization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index ea3593549ca1563..88a2e50e85d938d 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -113,7 +113,7 @@ operand which provides the initial values to update (for example when the
 operation is doing a reduction). 
 
 This input is referred to as "destination" in the following (quotes are
-important are this operand isn't modified in place but copied) and come into
+important as this operand isn't modified in place but copied) and comes into
 place in the context of bufferization as a possible "anchor" for the
 bufferization algorithm. This allows the user to shape the input in a form that
 guarantees close to optimal bufferization result when carefully choosing the

>From 2ff038576a0c505da1601d614d27110bc00624c9 Mon Sep 17 00:00:00 2001
From: Mehdi Amini <joker.eph at gmail.com>
Date: Wed, 25 Oct 2023 20:44:48 -0700
Subject: [PATCH 3/3] Update Bufferization.md

---
 mlir/docs/Bufferization.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md
index 64b10d49bd11cbf..8329999162fb5aa 100644
--- a/mlir/docs/Bufferization.md
+++ b/mlir/docs/Bufferization.md
@@ -184,9 +184,7 @@ to a no-op (in the absence of RaW conflicts such as a subsequent read of `%s`).
 
 RaW conflicts are detected with an analysis of SSA use-def chains (details
 later). One-Shot Bufferize works best if there is a single SSA use-def chain,
-where the result of a tensor op is the 
-operand of the next tensor
-ops, e.g.:
+where the result of a tensor op is the operand of the next tensor ops, e.g.:
 
 ```mlir
 %0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)