<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/58293>58293</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            Bufferization Over Aggressive Copy Removal

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            new issue

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          squablyScientist

      </td>

    </tr>

</table>

<pre>

    Hello. While attempting to bufferize some mhlo code, I noticed an unexpected semantic change in how `iter_args` are lowered. It appears that if the tensor passed as the initial `iter_args` value is not used after the `scf.for` loop, bufferization elides the copy and the loop operates on that tensor's buffer in place. However, this can produce side effects that the original program does not have. For example, consider the following two IR snippets:

### IR A

```mlir

func.func @loop_example(%arg0: tensor<16xi32>) -> (tensor<16xi32>) attributes {tf.entry_function = {control_outputs = "", inputs = "args_0", outputs = "Identity"}}{

    %c0 = arith.constant 0 : index 

    %c1 = arith.constant 1 : index 

    %c16 = arith.constant 16 : index 

    %c1_i32 = arith.constant 1 : i32

    %0 = scf.for %arg1 = %c0 to %c16 step %c1 iter_args(%arg2 = %arg0) -> (tensor<16xi32>) {

        %1 = tensor.extract %arg2[%arg1] : tensor<16xi32>

        %2 = arith.addi %1, %c1_i32 : i32

        %3 = tensor.insert %2 into %arg2[%arg1] : tensor<16xi32>

        scf.yield %3 : tensor<16xi32>

    }

    return %arg0 : tensor<16xi32>

}

```

### IR B

```mlir

func.func @loop_example(%arg0: tensor<16xi32>) -> (tensor<16xi32>) attributes {tf.entry_function = {control_outputs = "", inputs = "args_0", outputs = "Identity"}}{

    %c0 = arith.constant 0 : index

    %c1 = arith.constant 1 : index

    %c16 = arith.constant 16 : index

    %c1_i32 = arith.constant 1 : i32

    %0 = scf.for %arg1 = %c0 to %c16 step %c1 iter_args(%arg2 = %arg0) -> (tensor<16xi32>) {

        %1 = tensor.extract %arg2[%arg1] : tensor<16xi32>

        %2 = arith.addi %1, %c1_i32 : i32

        %3 = tensor.insert %2 into %arg2[%arg1] : tensor<16xi32>

        scf.yield %3 : tensor<16xi32>

    }

    return %0 : tensor<16xi32>

}

```

These two IRs are identical except for the returned value: `IR A` returns `%arg0`, so it uses `%arg0` after the loop, while `IR B` returns the loop result `%0`. They should therefore return different things: `IR A` discards the loop's modifications and `IR B` returns them. However, when lowering with the following `mlir-opt` command:

```sh

mlir-opt --one-shot-bufferize="bufferize-function-boundaries allow-return-allocs" --buffer-results-to-out-params --linalg-init-tensor-to-alloc-tensor --convert-tensor-to-linalg --finalizing-bufferize --buffer-deallocation -convert-bufferization-to-memref

```

something unexpected happens: 

### Bufferized IR A

```mlir

module {

  func.func @loop_example(%arg0: memref<16xi32, #map>, %arg1: memref<16xi32, #map>) attributes {tf.entry_function = {control_outputs = "", inputs = "args_0", outputs = "Identity"}} {

    %c1_i32 = arith.constant 1 : i32

    %c16 = arith.constant 16 : index

    %c1 = arith.constant 1 : index

    %c0 = arith.constant 0 : index

    %0 = memref.alloc() {alignment = 128 : i64} : memref<16xi32>

    memref.copy %arg0, %0 : memref<16xi32, #map> to memref<16xi32>

    scf.for %arg2 = %c0 to %c16 step %c1 {

      %1 = memref.load %0[%arg2] : memref<16xi32>

      %2 = arith.addi %1, %c1_i32 : i32

      memref.store %2, %0[%arg2] : memref<16xi32>

    }

    memref.dealloc %0 : memref<16xi32>

    memref.copy %arg0, %arg1 : memref<16xi32, #map> to memref<16xi32, #map>

    return

  }

}

```

### Bufferized IR B

```mlir

module {

  func.func @loop_example(%arg0: memref<16xi32, #map>, %arg1: memref<16xi32, #map>) attributes {tf.entry_function = {control_outputs = "", inputs = "args_0", outputs = "Identity"}} {

    %c1_i32 = arith.constant 1 : i32

    %c16 = arith.constant 16 : index

    %c1 = arith.constant 1 : index

    %c0 = arith.constant 0 : index

    scf.for %arg2 = %c0 to %c16 step %c1 {

      %0 = memref.load %arg0[%arg2] : memref<16xi32, #map>

      %1 = arith.addi %0, %c1_i32 : i32

      memref.store %1, %arg0[%arg2] : memref<16xi32, #map>

    }

    memref.copy %arg0, %arg1 : memref<16xi32, #map> to memref<16xi32, #map>

    return

  }

}

```

As seen above, `IR B` is bufferized without the copy from `%arg0` into a local allocation (the `%0` buffer that appears in bufferized `IR A`). The loop therefore writes directly to `%arg0`, modifying the caller's input buffer, which the original tensor IR did not do. I believe this is a bufferization bug, since the pass alters the semantics of the program. Is there a way to disable this behavior, or is this truly a bug?
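To make the observable difference concrete, here is a toy Python model of the two bufferized functions. It is only a sketch under stated assumptions: tensors are modeled as immutable tuples, memrefs as mutable lists, and the function names (`tensor_ir_b`, `bufferized_ir_a`, `bufferized_ir_b`) are hypothetical. It does not reflect how MLIR implements bufferization; it only mirrors the loads, stores, and copies visible in the IR above.

```python
# Toy model (not MLIR): tensors are immutable tuples, memrefs are mutable lists.

def tensor_ir_b(arg0):
    """Value semantics of tensor-level IR B: returns the loop result."""
    acc = arg0
    for i in range(16):
        # tensor.insert yields a new tensor value; arg0 itself is never changed.
        acc = acc[:i] + (acc[i] + 1,) + acc[i + 1:]
    return acc

def bufferized_ir_a(arg0, out):
    """Bufferized IR A: the loop works on a private copy, out receives arg0."""
    tmp = list(arg0)           # memref.alloc + memref.copy %arg0, %0
    for i in range(16):
        tmp[i] = tmp[i] + 1    # memref.load / arith.addi / memref.store on %0
    out[:] = arg0              # memref.copy %arg0, %arg1

def bufferized_ir_b(arg0, out):
    """Bufferized IR B: the loop stores straight into arg0, then copies to out."""
    for i in range(16):
        arg0[i] = arg0[i] + 1  # memref.store into %arg0 (the caller's buffer)
    out[:] = arg0              # memref.copy %arg0, %arg1

original = list(range(16))

in_a, out_a = list(original), [0] * 16
bufferized_ir_a(in_a, out_a)

in_b, out_b = list(original), [0] * 16
bufferized_ir_b(in_b, out_b)

expected_b = list(tensor_ir_b(tuple(original)))

print(in_a == original)     # True: IR A leaves the input buffer untouched
print(out_b == expected_b)  # True: IR B's returned value is what the tensor IR computes
print(in_b == original)     # False: IR B's input buffer was overwritten
```

In this model the returned value of `IR B` is correct; the difference is visible only to a caller that reads its input buffer after the call, which is the side effect described above.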

---

### LLVM SHA

`54d179116e7a79eb1fdf7819aad62b4d76bc0e15e8567871cae9b675f7dec5c1`

</pre>
