<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/112435">112435</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[mlir] Op needs producer extract_slice op to bufferize in place
</td>
</tr>
<tr>
<th>Labels</th>
<td>
mlir
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
matthias-springer
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Max191
</td>
</tr>
</table>
<pre>
The following two func ops produce different results when running OneShotBufferize:
```mlir
// RUN: mlir-opt test.mlir --pass-pipeline="builtin.module(one-shot-bufferize{test-analysis-only=true print-conflicts=true}, canonicalize, cse, canonicalize)"
#map = affine_map<(d0) -> (d0 mod 256)>
module {
  func.func @slice_bufferize_inplace(%2: tensor<2xf32>) -> tensor<2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %3 = scf.forall (%arg0) in (1) shared_outs(%arg2 = %2) -> (tensor<2xf32>) {
      %extracted_slice = tensor.extract_slice %arg2[%arg0] [2] [1] : tensor<2xf32> to tensor<2xf32>
      %fill = linalg.fill ins(%cst : f32) outs(%extracted_slice : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %fill into %arg2[%arg0] [2] [1] : tensor<2xf32> into tensor<2xf32>
      }
    } {mapping = [#gpu.thread<linear_dim_0>]}
    return %3 : tensor<2xf32>
  }
  func.func @no_slice_bufferize_outplace(%2: tensor<2xf32>) -> tensor<2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %3 = scf.forall (%arg0) in (1) shared_outs(%arg2 = %2) -> (tensor<2xf32>) {
      %fill = linalg.fill ins(%cst : f32) outs(%arg2 : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %fill into %arg2[%arg0] [2] [1] : tensor<2xf32> into tensor<2xf32>
      }
    } {mapping = [#gpu.thread<linear_dim_0>]}
    return %3 : tensor<2xf32>
  }
}
```
Running `mlir-opt test.mlir --pass-pipeline="builtin.module(one-shot-bufferize{test-analysis-only=true print-conflicts=true}, canonicalize, cse, canonicalize)"` produces the following bufferization analysis results for these func ops:
```mlir
module {
  func.func @slice_bufferize_inplace(%arg0: tensor<2xf32>) -> tensor<2xf32> attributes {"W_1[NOT-WRITABLE: bbArg 0]"} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = scf.forall (%arg1) in (1) shared_outs(%arg2 = %arg0) -> (tensor<2xf32>) {
      %extracted_slice = tensor.extract_slice %arg2[%arg1] [2] [1] {__inplace_operands_attr__ = ["true", "none"]} : tensor<2xf32> to tensor<2xf32>
      %1 = linalg.fill {__inplace_operands_attr__ = ["none", "true"]} ins(%cst : f32) outs(%extracted_slice : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %1 into %arg2[%arg1] [2] [1] {__inplace_operands_attr__ = ["true", "true", "none"]} : tensor<2xf32> into tensor<2xf32>
      }
    } {__inplace_operands_attr__ = ["false"], mapping = [#gpu.thread<linear_dim_0>]}
    return {__inplace_operands_attr__ = ["true"]} %0 : tensor<2xf32>
  }
  func.func @no_slice_bufferize_outplace(%arg0: tensor<2xf32>) -> tensor<2xf32> attributes {"W_0[NOT-WRITABLE: bbArg 0]"} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = scf.forall (%arg1) in (1) shared_outs(%arg2 = %arg0) -> (tensor<2xf32>) {
      %1 = linalg.fill {"C_0[CONFL-WRITE: 1]", __inplace_operands_attr__ = ["none", "false"]} ins(%cst : f32) outs(%arg2 : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %1 into %arg2[%arg1] [2] [1] {"C_0[READ: 1]", __inplace_operands_attr__ = ["true", "true", "none"]} : tensor<2xf32> into tensor<2xf32>
      }
    } {"C_0[DEF: bbArg 1]", __inplace_operands_attr__ = ["false"], mapping = [#gpu.thread<linear_dim_0>]}
    return {__inplace_operands_attr__ = ["true"]} %0 : tensor<2xf32>
  }
}
```
The key difference is that the `linalg.fill` op bufferizes in place when its init operand is produced by a `tensor.extract_slice`, but out of place when the `shared_outs` block argument is used directly as the init operand. The `tensor.extract_slice` is a full slice (the `scf.forall` has a single iteration, so the offset `%arg0` is always 0, and the size 2 and stride 1 cover the whole `tensor<2xf32>`), so the two IRs are equivalent and should bufferize the same way. This seems like a bug somewhere in the bufferization analysis.
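For reference, a minimal sketch of the equivalence claim (assuming the single-iteration `scf.forall` is folded so that `%arg0 == 0`): dropping the full `tensor.extract_slice` from the loop body of `@slice_bufferize_inplace` leaves exactly the loop body of `@no_slice_bufferize_outplace`:
```mlir
// Hypothetical folded loop body of @slice_bufferize_inplace: with one forall
// iteration, %arg0 == 0 and the [%arg0] [2] [1] slice spans the whole
// tensor<2xf32>, so the extract_slice can be dropped and the fill uses the
// shared_outs argument directly, just like in @no_slice_bufferize_outplace.
%fill = linalg.fill ins(%cst : f32) outs(%arg2 : tensor<2xf32>) -> tensor<2xf32>
scf.forall.in_parallel {
  tensor.parallel_insert_slice %fill into %arg2[%arg0] [2] [1] : tensor<2xf32> into tensor<2xf32>
}
```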
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzsWEFv27gS_jX0ZSBBoqw4OvgQWwlQoK8B8vrQo0BJI4uvFKklqaTZX78gJdlO4-7GabqLomsEkUlxZr75OJwZkxnDdxJxTdINobRj1racmcD0mssdakIpSfMFG2yr9Po_7EucxYtS1Y_rjy1Co4RQD1zuwD4oaAZZgeoNtOweoeZNgxqlBY1mENbAQ4sS9CClE7iV-N9W2c3gVvHfkSRXJMpJdEUuovGvE1xPU_SG0Bu4-98HklwBuBeB6i1YNDZ0IwiCnhkT9LxHwSWSJCeUlgMXlsuwU_UgkNBLJTEwrbJBube62jglAZNMPBpuAiXFI0lyqwcER4ENKiUbwStrpmmyygndQsWkkrxiwmlxY4PPpzNH3-RC0rEeSJIDaxousehYT5ItoZd1RGgGAUmuwY-gUzXQ9MKJJ9ej-OgCkNVmHIMnO_SMk2VkBK-w2HtVcNkLVjmXCU2pI82iNEqTZEu_NAl1emebz94cWwEgNK2MHYFrbtuwUtJYJi1EYeQ_SOgmisBZcQqOJRMvZ6ombJRmQsCIiOmd95lLNxG7r6ZlGutCDdbs11Av7T04IuikJ08ge9P4xWpWWawLT45XNYqG06v5xWjLx_-ILM2BpBs6PWP_PEUhWHWC16cwGu68TnIQXDKxC_2Yy8nJkdmRN5rBwfvn6M_awgnEgfiQy6Jn7huKr8mCmZd5QcGlQX3EzwTaqu8hy8v_OV2r_Ch6VrkD2rG-d-nCR4Kzmuz6IbStRlaTZOsOO9NFzbsictrS_IkSjXbQco7EUxSOS_dCJ0-XVMXXB0wN9pc9Ya-P6snia0LZf_4ynv8N5H0gP43ofVE9nnQVfKzNoBqYg5tZriTMBREapcG2aPBQ3rlxM4fifyjd_r8eJEzl-yeq0ntqnrcf31N8fWidmR2YtZqXg0XjbVL6qYhJuvlw-zH4dPfu49Xm_bVTWZZXegcucB3-McreKKtE38wq8TlZZc5Cf1fpjk8e4tWmmPekUD1qJmtTOI6L4nAaqQ8aSl10EEqlkji2vp7Y11b_-FmSfCGa2f6IZsY2ovnZuof4dMZ9u806f-tel8Jfgqxhwszm6RbeJuufR8rk-HiKv10qjl08r915o5wW_Uo57WQmIJRuPQ3b2w837z0RnoV48p9u4fxkcRyBL8oWP7wr-zF5Yk_e3fVV_grW_pkcskedX98cAv5M6D9jknlpQ_oZH_dXRxWOvSazvuEkF9HRASIXEah-37aicUfYwx8vmmyL2sszCU_bhl6reqhQO-7KwQK3x1rU4Nvhp5qgFKr6DEzvhg6lnVtgLrl1k07At8RciBCcG08tOhTQDEKAHzvDRkGpbAvv7gwwjYC_DfyeCZTWvWWydgofwbRqEPUBnzdjWIfwwB6dKW7AIHYGBP-MwKAcdo4Ij_l0R29Uhw-OnHBRr5M6SzK2wHW8ollEl1mWLNp1dFFGGEeMlssyS1ZNtLrAZnWZxasqpVGyXPA1jegyjuKURsskTsL4cllnLLvMLqNlw9JLsoywY1yEQtx3odK7BTdmwHUc02WSLgQrUZj50tF12uM9o1679UE57AxZRoIbaw4aLLfCX1R6gTSH2x4kYm32O_oV7ap3TdqBuzlAFoMW69ba3rifLv5yccdtO5RhpTpCb5zF6RH0Wv0fK0vojXfAEHoz-XC_pn8EAAD__ylBP5Y">