<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/112435">112435</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[mlir] Op needs producer extract_slice op to bufferize in place
</td>
</tr>
<tr>
<th>Labels</th>
<td>
mlir
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
matthias-springer
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Max191
</td>
</tr>
</table>
<pre>
The following two func ops produce different results when running OneShotBufferize:
```mlir
// RUN: mlir-opt test.mlir --pass-pipeline="builtin.module(one-shot-bufferize{test-analysis-only=true print-conflicts=true}, canonicalize, cse, canonicalize)"
#map = affine_map<(d0) -> (d0 mod 256)>
module {
  func.func @slice_bufferize_inplace(%2: tensor<2xf32>) -> tensor<2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %3 = scf.forall (%arg0) in (1) shared_outs(%arg2 = %2) -> (tensor<2xf32>) {
      %extracted_slice = tensor.extract_slice %arg2[%arg0] [2] [1] : tensor<2xf32> to tensor<2xf32>
      %fill = linalg.fill ins(%cst : f32) outs(%extracted_slice : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %fill into %arg2[%arg0] [2] [1] : tensor<2xf32> into tensor<2xf32>
      }
    } {mapping = [#gpu.thread<linear_dim_0>]}
    return %3 : tensor<2xf32>
  }
  func.func @no_slice_bufferize_outplace(%2: tensor<2xf32>) -> tensor<2xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %3 = scf.forall (%arg0) in (1) shared_outs(%arg2 = %2) -> (tensor<2xf32>) {
      %fill = linalg.fill ins(%cst : f32) outs(%arg2 : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %fill into %arg2[%arg0] [2] [1] : tensor<2xf32> into tensor<2xf32>
      }
    } {mapping = [#gpu.thread<linear_dim_0>]}
    return %3 : tensor<2xf32>
  }
}
```
Running `mlir-opt test.mlir --pass-pipeline="builtin.module(one-shot-bufferize{test-analysis-only=true print-conflicts=true}, canonicalize, cse, canonicalize)"` produces the following bufferization analysis results for these func ops:
```mlir
module {
  func.func @slice_bufferize_inplace(%arg0: tensor<2xf32>) -> tensor<2xf32> attributes {"W_1[NOT-WRITABLE: bbArg 0]"} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = scf.forall (%arg1) in (1) shared_outs(%arg2 = %arg0) -> (tensor<2xf32>) {
      %extracted_slice = tensor.extract_slice %arg2[%arg1] [2] [1] {__inplace_operands_attr__ = ["true", "none"]} : tensor<2xf32> to tensor<2xf32>
      %1 = linalg.fill {__inplace_operands_attr__ = ["none", "true"]} ins(%cst : f32) outs(%extracted_slice : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %1 into %arg2[%arg1] [2] [1] {__inplace_operands_attr__ = ["true", "true", "none"]} : tensor<2xf32> into tensor<2xf32>
      }
    } {__inplace_operands_attr__ = ["false"], mapping = [#gpu.thread<linear_dim_0>]}
    return {__inplace_operands_attr__ = ["true"]} %0 : tensor<2xf32>
  }
  func.func @no_slice_bufferize_outplace(%arg0: tensor<2xf32>) -> tensor<2xf32> attributes {"W_0[NOT-WRITABLE: bbArg 0]"} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = scf.forall (%arg1) in (1) shared_outs(%arg2 = %arg0) -> (tensor<2xf32>) {
      %1 = linalg.fill {"C_0[CONFL-WRITE: 1]", __inplace_operands_attr__ = ["none", "false"]} ins(%cst : f32) outs(%arg2 : tensor<2xf32>) -> tensor<2xf32>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %1 into %arg2[%arg1] [2] [1] {"C_0[READ: 1]", __inplace_operands_attr__ = ["true", "true", "none"]} : tensor<2xf32> into tensor<2xf32>
      }
    } {"C_0[DEF: bbArg 1]", __inplace_operands_attr__ = ["false"], mapping = [#gpu.thread<linear_dim_0>]}
    return {__inplace_operands_attr__ = ["true"]} %0 : tensor<2xf32>
  }
}
```
The key difference is that the `linalg.fill` op bufferizes in place when its init operand is produced by a `tensor.extract_slice`, but out of place when the `shared_outs` block argument is used directly as the init operand. The `tensor.extract_slice` is a full slice (the `scf.forall` has a single iteration, so the offset `%arg0` is always 0, and the size 2 and stride 1 cover the whole `tensor<2xf32>`), so the two IRs are equivalent and should bufferize the same way. This seems like a bug somewhere in the bufferization analysis.
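For reference, a minimal sketch of the equivalence claim (assuming the single-iteration `scf.forall` is folded so that `%arg0 == 0`): dropping the full `tensor.extract_slice` from the loop body of `@slice_bufferize_inplace` leaves exactly the loop body of `@no_slice_bufferize_outplace`:
```mlir
// Hypothetical folded loop body of @slice_bufferize_inplace: with one forall
// iteration, %arg0 == 0 and the [%arg0] [2] [1] slice spans the whole
// tensor<2xf32>, so the extract_slice can be dropped and the fill uses the
// shared_outs argument directly, just like in @no_slice_bufferize_outplace.
%fill = linalg.fill ins(%cst : f32) outs(%arg2 : tensor<2xf32>) -> tensor<2xf32>
scf.forall.in_parallel {
  tensor.parallel_insert_slice %fill into %arg2[%arg0] [2] [1] : tensor<2xf32> into tensor<2xf32>
}
```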
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzsWEFv27gS_jX0ZSBBoqw4OvgQWwlQoK8B8vrQo0BJI4uvFKklqaTZX78gJdlO4-7GabqLomsEkUlxZr75OJwZkxnDdxJxTdINobRj1racmcD0mssdakIpSfMFG2yr9Po_7EucxYtS1Y_rjy1Co4RQD1zuwD4oaAZZgeoNtOweoeZNgxqlBY1mENbAQ4sS9CClE7iV-N9W2c3gVvHfkSRXJMpJdEUuovGvE1xPU_SG0Bu4-98HklwBuBeB6i1YNDZ0IwiCnhkT9LxHwSWSJCeUlgMXlsuwU_UgkNBLJTEwrbJBube62jglAZNMPBpuAiXFI0lyqwcER4ENKiUbwStrpmmyygndQsWkkrxiwmlxY4PPpzNH3-RC0rEeSJIDaxousehYT5ItoZd1RGgGAUmuwY-gUzXQ9MKJJ9ej-OgCkNVmHIMnO_SMk2VkBK-w2HtVcNkLVjmXCU2pI82iNEqTZEu_NAl1emebz94cWwEgNK2MHYFrbtuwUtJYJi1EYeQ_SOgmisBZcQqOJRMvZ6ombJRmQsCIiOmd95lLNxG7r6ZlGutCDdbs11Av7T04IuikJ08ge9P4xWpWWawLT45XNYqG06v5xWjLx_-ILM2BpBs6PWP_PEUhWHWC16cwGu68TnIQXDKxC_2Yy8nJkdmRN5rBwfvn6M_awgnEgfiQy6Jn7huKr8mCmZd5QcGlQX3EzwTaqu8hy8v_OV2r_Ch6VrkD2rG-d-nCR4Kzmuz6IbStRlaTZOsOO9NFzbsictrS_IkSjXbQco7EUxSOS_dCJ0-XVMXXB0wN9pc9Ya-P6snia0LZf_4ynv8N5H0gP43ofVE9nnQVfKzNoBqYg5tZriTMBREapcG2aPBQ3rlxM4fifyjd_r8eJEzl-yeq0ntqnrcf31N8fWidmR2YtZqXg0XjbVL6qYhJuvlw-zH4dPfu49Xm_bVTWZZXegcucB3-McreKKtE38wq8TlZZc5Cf1fpjk8e4tWmmPekUD1qJmtTOI6L4nAaqQ8aSl10EEqlkji2vp7Y11b_-FmSfCGa2f6IZsY2ovnZuof4dMZ9u806f-tel8Jfgqxhwszm6RbeJuufR8rk-HiKv10qjl08r915o5wW_Uo57WQmIJRuPQ3b2w837z0RnoV48p9u4fxkcRyBL8oWP7wr-zF5Yk_e3fVV_grW_pkcskedX98cAv5M6D9jknlpQ_oZH_dXRxWOvSazvuEkF9HRASIXEah-37aicUfYwx8vmmyL2sszCU_bhl6reqhQO-7KwQK3x1rU4Nvhp5qgFKr6DEzvhg6lnVtgLrl1k07At8RciBCcG08tOhTQDEKAHzvDRkGpbAvv7gwwjYC_DfyeCZTWvWWydgofwbRqEPUBnzdjWIfwwB6dKW7AIHYGBP-MwKAcdo4Ij_l0R29Uhw-OnHBRr5M6SzK2wHW8ollEl1mWLNp1dFFGGEeMlssyS1ZNtLrAZnWZxasqpVGyXPA1jegyjuKURsskTsL4cllnLLvMLqNlw9JLsoywY1yEQtx3odK7BTdmwHUc02WSLgQrUZj50tF12uM9o1679UE57AxZRoIbaw4aLLfCX1R6gTSH2x4kYm32O_oV7ap3TdqBuzlAFoMW69ba3rifLv5yccdtO5RhpTpCb5zF6RH0Wv0fK0vojXfAEHoz-XC_pn8EAAD__ylBP5Y">