<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/101709>101709</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[mlir] Bufferization issue after tensor.insert_slice
</td>
</tr>
<tr>
<th>Labels</th>
<td>
mlir
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
n-io
</td>
</tr>
</table>
<pre>
The `--one-shot-bufferize` pass lowers `tensor.insert_slice` to `memref.subview`, but it also allocates an extra buffer for the inserted value and then copies that buffer into the subview. Instead, the producer of the value could write directly into the subview, saving both the alloc and the copy.
The following example has a `linalg.add` followed by a `tensor.insert_slice` in a partially bufferized custom dialect:
```
builtin.module {
  %cst = arith.constant dense<1.234500e-01> : tensor<8xf32>
  "mydialect.func"() <{function_type = (memref<16xf32>, index) -> memref<16xf32>, sym_name = "test"}> ({
  ^bb0(%res: memref<16xf32>, %offset: index):
    %val = "mydialect.test_op"() : () -> (memref<8xf32>)
    %0 = bufferization.to_tensor %val restrict : memref<8xf32>
    %res_t = bufferization.to_tensor %res restrict writable : memref<16xf32>
    %1 = linalg.add ins(%0, %cst : tensor<8xf32>, tensor<8xf32>) outs(%0 : tensor<8xf32>) -> tensor<8xf32>
    %2 = tensor.insert_slice %1 into %res_t[%offset] [8] [1] : tensor<8xf32> into tensor<16xf32>
    %3 = bufferization.to_memref %2 : memref<16xf32>
    "mydialect.return"(%3) : (memref<16xf32>) -> ()
  }) : () -> ()
}
```
Running `mlir-opt -allow-unregistered-dialect --one-shot-bufferize="allow-unknown-ops"` yields:
```
module {
  memref.global "private" constant @__constant_8xf32 : memref<8xf32> = dense<1.234500e-01> {alignment = 64 : i64}
  %0 = memref.get_global @__constant_8xf32 : memref<8xf32>
  "mydialect.func"() <{function_type = (memref<16xf32>, index) -> memref<16xf32>, sym_name = "test"}> ({
  ^bb0(%arg0: memref<16xf32>, %arg1: index):
    %1 = "mydialect.test_op"() : () -> memref<8xf32>
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<8xf32>
    linalg.add ins(%1, %0 : memref<8xf32>, memref<8xf32>) outs(%alloc : memref<8xf32>)
    %subview = memref.subview %arg0[%arg1] [8] [1] : memref<16xf32> to memref<8xf32, strided<[1], offset: ?>>
    memref.copy %alloc, %subview : memref<8xf32> to memref<8xf32, strided<[1], offset: ?>>
    "mydialect.return"(%arg0) : (memref<16xf32>) -> ()
  }) : () -> ()
}
```
It would be great if bufferization could create the `%subview` before the `linalg.add` and use it as the output instead of the new `%alloc`. Can this be achieved with the existing passes?
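
One possible rewrite of the input, as an untested sketch: put the `linalg.add` in destination-passing style by taking its `outs` operand from a `tensor.extract_slice` of `%res_t` with the same offset, size, and stride as the later `tensor.insert_slice`. The name `%dest` is introduced here only for illustration, and this assumes One-Shot Bufferize matches the extract/insert pair and bufferizes the add in place; I have not verified the resulting output.
```mlir
// Sketch only: replaces the linalg.add / tensor.insert_slice lines of the input above.
// %dest is a hypothetical name; it is a slice of the writable destination tensor.
%dest = tensor.extract_slice %res_t[%offset] [8] [1] : tensor<16xf32> to tensor<8xf32>
// Write the sum into the slice of %res_t instead of into the read-only %0.
%1 = linalg.add ins(%0, %cst : tensor<8xf32>, tensor<8xf32>) outs(%dest : tensor<8xf32>) -> tensor<8xf32>
// Same offset/size/stride as the extract_slice, so the pair can be matched.
%2 = tensor.insert_slice %1 into %res_t[%offset] [8] [1] : tensor<8xf32> into tensor<16xf32>
```
In the original input, the `outs(%0 ...)` operand comes from a `bufferization.to_tensor` without `writable`, which is presumably why the pass has to allocate a fresh buffer for the result.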
</pre>