<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/130045>130045</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [mlir] Inconsistent output when executing MLIR program with `--test-loop-fusion="test-loop-fusion-transformation"`

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            mlir

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          Emilyaxe

      </td>

    </tr>

</table>

<pre>

    git version: 953838dceaff

system: `Ubuntu 18.04.6 LTS`

## Description:

The same MLIR program produces different output depending on whether the pipeline includes `--test-loop-fusion="test-loop-fusion-transformation"`.

## Steps to Reproduce:

### 1. **MLIR Program (a.mlir)**:

a.mlir: 

``` 

module {

  func.func private @printMemrefI32(tensor<*xi32>)

  func.func private @printMemrefF32(tensor<*xf32>)

  func.func @entry(%arg0: index) -> tensor<1x2xi32> {

    %11 = "tosa.const"() <{value = dense<0> : tensor<1x2x2xi32>}> : () -> tensor<1x2x2xi32>

    %12 = tosa.while_loop (%arg1 = %11) : (tensor<1x2x2xi32>) -> tensor<1x2x2xi32> {

      %51 = "tosa.const"() <{value = dense<6> : tensor<1x2x2xi32>}> : () -> tensor<1x2x2xi32>

      %52 = tosa.greater %51, %arg1 : (tensor<1x2x2xi32>, tensor<1x2x2xi32>) -> tensor<1x2x2xi1>

      %extracted = tensor.extract %52[%arg0, %arg0, %arg0] : tensor<1x2x2xi1>

      %from_elements = tensor.from_elements %extracted : tensor<i1>

      tosa.yield %from_elements : tensor<i1>

    } do {

    ^bb0(%arg1: tensor<1x2x2xi32>):

      %51 = "tosa.const"() <{value = dense<1> : tensor<1x2x2xi32>}> : () -> tensor<1x2x2xi32>

      %52 = tosa.add %arg1, %51 : (tensor<1x2x2xi32>, tensor<1x2x2xi32>) -> tensor<1x2x2xi32>

      tosa.yield %52 : tensor<1x2x2xi32>

    }

    %50 = tosa.argmax %12 {axis = 1 : i32} : (tensor<1x2x2xi32>) -> tensor<1x2xi32>

    return %50 : tensor<1x2xi32>

  }

  func.func @main() {

    %idx0 = index.constant 0

    %0 = call @entry(%idx0) : (index) -> tensor<1x2xi32>

    %cast = tensor.cast %0 : tensor<1x2xi32> to tensor<*xi32>

    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()

    return

  }

}

``` 

### 2. **Command to run without `--test-loop-fusion`:**

``` 

/data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt /data/szy/workspace/mlir-inconsistent/a.mlir -tosa-to-scf \

| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-linalg))"  \

| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-arith  -convert-scf-to-cf   -convert-arith-to-llvm   \

-convert-linalg-to-loops   -convert-linalg-to-parallel-loops  -convert-linalg-to-loops  -one-shot-bufferize="bufferize-function-boundaries" \

--expand-strided-metadata   -convert-linalg-to-affine-loops   -finalize-memref-to-llvm   -lower-affine  -convert-scf-to-cf   \

-convert-cf-to-llvm   -finalize-memref-to-llvm  -convert-func-to-llvm   -convert-index-to-llvm  -reconcile-unrealized-casts \

| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \

--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \

--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \

--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

``` 

### 3. **Output without `--test-loop-fusion`:**

``` 

[[0,   0]]

``` 

### 4. **Command to run with `--test-loop-fusion`:**

``` 

/data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt /data/szy/workspace/mlir-inconsistent/a.mlir -tosa-to-scf \

| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-linalg))"  \

| /data/szy/MLIR/llvm-release/llvm-project/install/mlir-opt -tosa-to-arith  -convert-scf-to-cf   -convert-arith-to-llvm   \

-convert-linalg-to-loops   -convert-linalg-to-parallel-loops  -convert-linalg-to-loops  -one-shot-bufferize="bufferize-function-boundaries" \

--expand-strided-metadata   -convert-linalg-to-affine-loops   -finalize-memref-to-llvm  --test-loop-fusion="test-loop-fusion-transformation" \

-lower-affine  -convert-scf-to-cf   -convert-cf-to-llvm   -finalize-memref-to-llvm  -convert-func-to-llvm   -convert-index-to-llvm  \

-reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void \

--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \

--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \

--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

``` 

### 5. **Output with `--test-loop-fusion`:**

``` 

[[1,   1]]

``` 

### 6. **Analysis of this case:**

As some developers discussed in issue https://github.com/llvm/llvm-project/issues/118631, test passes should also preserve the semantics of the MLIR program.

This MLIR program is expected to output `[[0, 0]]` for `%50 = tosa.argmax %12`, because all elements of `%12` are equal. Instead, it outputs `[[1, 1]]`, which corresponds to the last index of `%12` along the argmax axis.
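The expected value can be checked by hand with a small model of `@entry`. This is only a plain-Python sketch of the program's semantics, not the lowered IR. The helper names `run_while_loop` and `argmax_axis1_first_match` are made up for illustration, and the sketch assumes that ties in `tosa.argmax` resolve to the first index along the axis, which is what the expected `[[0, 0]]` relies on.

```python
def run_while_loop():
    """Model of the tosa.while_loop: add 1 to every element while 6 > t[0][0][0]."""
    t = [[[0, 0], [0, 0]]]  # %11: dense 0 of shape 1x2x2
    while 6 > t[0][0][0]:   # condition extracts element [%arg0, %arg0, %arg0] with %arg0 = 0
        t = [[[v + 1 for v in row] for row in plane] for plane in t]
    return t


def argmax_axis1_first_match(t):
    """Argmax over axis 1 of a 1x2x2 nested list; Python's max keeps the first maximal index."""
    out = []
    for plane in t:
        n_rows, n_cols = len(plane), len(plane[0])
        out.append([max(range(n_rows), key=lambda j: plane[j][k]) for k in range(n_cols)])
    return out


t12 = run_while_loop()
print(t12)                            # [[[6, 6], [6, 6]]]
print(argmax_axis1_first_match(t12))  # [[0, 0]]
```

Every element of `%12` ends up as 6 in this model, so any index other than 0 can only come from how ties are handled in the lowered loops.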

To debug this issue, I printed the IR after each pass. The input IR ([input.txt](https://github.com/user-attachments/files/19105934/input.txt)) is correct before the `--test-loop-fusion="test-loop-fusion-transformation"` pass is applied: as shown in the first image, `%173` is initialized before the three-level loop nest (lines 221–237). After running `--test-loop-fusion="test-loop-fusion-transformation"`, `%173` is instead initialized inside the innermost loop, which causes it to overwrite the assigned value (line 253 in the second image, [output.txt](https://github.com/user-attachments/files/19105936/output.txt)).

I also tried the `affine-loop-fusion` pass on the same input IR. As shown in the third image, `%173` is then initialized in the second loop, which leads to the correct result.
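The effect of moving an initialization into the innermost loop can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not a transcription of the attached IR: it assumes the lowered argmax keeps a running maximum per output element, and that the misplaced initialization resets that running maximum. The function `argmax_axis1` and its `reinit_in_inner_loop` flag are hypothetical names.

```python
INT_MIN = -(2**31)


def argmax_axis1(data, reinit_in_inner_loop):
    """Loop-nest argmax over axis 1 with a running maximum per output element.

    When reinit_in_inner_loop is True, the running maximum is reset on every
    innermost iteration, mimicking an initialization that was moved into the
    fused loop body (an assumed model of the reported behavior).
    """
    n0, n1, n2 = len(data), len(data[0]), len(data[0][0])
    best_val = [[INT_MIN] * n2 for _ in range(n0)]  # initialized before the loop nest
    best_idx = [[0] * n2 for _ in range(n0)]
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                if reinit_in_inner_loop:
                    best_val[i][k] = INT_MIN  # misplaced init discards the earlier maximum
                if data[i][j][k] > best_val[i][k]:
                    best_val[i][k] = data[i][j][k]
                    best_idx[i][k] = j
    return best_idx


data = [[[6, 6], [6, 6]]]
print(argmax_axis1(data, reinit_in_inner_loop=False))  # [[0, 0]]
print(argmax_axis1(data, reinit_in_inner_loop=True))   # [[1, 1]]
```

Under these assumptions the reset makes every comparison succeed, so the last index along the axis always wins, which matches the `[[1, 1]]` observed with `--test-loop-fusion`.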

![Image](https://github.com/user-attachments/assets/c51e876d-1dba-4a4f-a879-f7b1c6ece89d)

![Image](https://github.com/user-attachments/assets/5146deeb-e21c-4444-b0e1-b817dc255dcf)

![Image](https://github.com/user-attachments/assets/80d2538e-fa51-4e99-bf10-3175f1aab002)

</pre>

s0f60siqiSG5zmgRjpbRpMJH5yBG1NRTRCNNhgoo-11Me0gD2num4WEZmWqAPn-lvPZgKfCqoSEfuBepsQfE0HGW1Mji81cEF77P_MEHfKt36qF3qM3e86oJxBG2raWKtbDO0mKdz0MnHe2xNSZyYISWUBsLlHedOo7n-Z89rA8BzugmFN2LxqpGs7W0SGXLdzAcQGXLHLFILb2MWj7iCbXbWIBE4c1CK9N2vQ8-KGEr3JY4ylhGvjCySsk6Z_mSsPVFbUTGUb0wsM8GdQ051t0UutROChgyqcG2xnmKZif1xXsHDmvSG2r2YF-s9HHJ8BpI0Hg2yD3FICkr8pE_h71PRAKxRGIpf6caWRC2nRhkayzvjRASGeBKHWNRe3scS2Sy-5joa6hto8_vhcaifl0PvpFWvKqHQadKuOJ2SsFIqgIuQtHGW3Ws6Xjzz0a1zkjx8By8fIAm7hyEL1WRwWq5EEkmSp7M-bxO-Gq5TuplmVULqGC1FvF06zs6LbL5QgCUCbCsSubz-TwpU8iScpUtRcWKQlT1d3e6SgUr8hUkNS-yZA7rdVLWWZrk2bKoM87LNGVnp2_93Yn7XKzzNb-D-2w5z1hR5Flx19yvSkgXdVmleZ5XbM3XWcVguYQ5F_mirpd38p6lrEjzdJGlaVYUM16Vy3TF03JRz5fr5ZrMU2i5VDPsfzNjd3dBcu-zPE3nxZ3iJSgXXqUyFl_xMVI83dn70C_LfufIPFXSeXe24KVX4fVrWFA80efpe8-huV2993z9uvOzAnPXW3X_ib4fCdjfs38HAAD__ymvc3U">