<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/144025>144025</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [MLIR] Inconsistent output when executing MLIR program with and without `-affine-loop-coalescing`

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            mlir

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          Lambor24

      </td>

    </tr>

</table>

<pre>

    My git version is [4903c11](https://github.com/llvm/llvm-project/commit/4903c11a7e144d63635b115d97936a7aecf7a2f6).

## Description:

I get inconsistent results when executing the same MLIR program with and without the `-affine-loop-coalescing` pass.

## Steps to Reproduce:

### 1. **MLIR Program (test.mlir)**:


```

module {

  func.func private @printMemrefI32(tensor<*xi32>)

  func.func @main() {

    %0 = "tosa.const"() <{values = dense<2> : tensor<2x2x3xi32>}> : () -> tensor<2x2x3xi32>

    %1 = "tosa.const"() <{values = dense<5> : tensor<2x1x2xi32>}> : () -> tensor<2x1x2xi32>

    %2 = "tosa.const"() <{values = dense<0> : tensor<1xi32>}> : () -> tensor<1xi32>

    %3 = "tosa.const"() <{values = dense<0> : tensor<1xi32>}> : () -> tensor<1xi32>

    %4 = tosa.matmul %1, %0, %2, %3 : (tensor<2x1x2xi32>, tensor<2x2x3xi32>, tensor<1xi32>, tensor<1xi32>) -> tensor<2x1x3xi32>

    %cast = tensor.cast %4 : tensor<2x1x3xi32> to tensor<*xi32>

    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()

    return

  }

}

```

### 2. **Command to Run Without `-affine-loop-coalescing`:**

```

/path/llvm-project/build/bin/mlir-opt test.mlir -pass-pipeline='builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))' | \

/path/llvm-project/build/bin/mlir-opt -tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops | \

/path/llvm-project/build/bin/mlir-opt -pass-pipeline="builtin.module(func.func(convert-affine-for-to-gpu{gpu-block-dims=1 gpu-thread-dims=0}))" | \

/path/llvm-project/build/bin/mlir-opt -lower-affine -gpu-lower-to-nvvm-pipeline | \

/path/llvm-project/build/bin/mlir-runner -e main -entry-point-result=void \

-shared-libs=/path/llvm-project/build/lib/libmlir_runner_utils.so \

-shared-libs=/path/llvm-project/build/lib/libmlir_c_runner_utils.so \

-shared-libs=/path/llvm-project/build/lib/libmlir_cuda_runtime.so

```

### 3. **Output Without `-affine-loop-coalescing`:**

```

[[[20,    20,    20]],

 [[20,    20,    20]]]

```

### 4. **Command to Run With `-affine-loop-coalescing`:**

```

/path/llvm-project/build/bin/mlir-opt test.mlir -pass-pipeline='builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))' | \

/path/llvm-project/build/bin/mlir-opt -tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops -affine-loop-coalescing | \

/path/llvm-project/build/bin/mlir-opt -pass-pipeline="builtin.module(func.func(convert-affine-for-to-gpu{gpu-block-dims=1 gpu-thread-dims=0}))" | \

/path/llvm-project/build/bin/mlir-opt -lower-affine -gpu-lower-to-nvvm-pipeline | \

/path/llvm-project/build/bin/mlir-runner -e main -entry-point-result=void \

-shared-libs=/path/llvm-project/build/lib/libmlir_runner_utils.so \

-shared-libs=/path/llvm-project/build/lib/libmlir_c_runner_utils.so \

-shared-libs=/path/llvm-project/build/lib/libmlir_cuda_runtime.so

```

### 5. **Output With `-affine-loop-coalescing`:**

```

[[[10,    10,    10]], 

 [[10,    10,    10]]]

```
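
As a sanity check on which of the two outputs is the expected one, the matmul can be recomputed outside MLIR. The following is only a minimal sketch in plain Python, not part of the pipeline above: the helper `batched_matmul` is a hypothetical name, and the shapes and fill values are copied from the `tosa.const` ops in `test.mlir` (both zero points are 0, so they are ignored here).

```python
# Reference computation for the tosa.matmul in test.mlir, in plain Python.
# Assumption: zero points are 0 (as in the test), so they are not modelled.

def batched_matmul(a, b):
    """Multiply a[n][h][c] by b[n][c][w] batch by batch, returning out[n][h][w]."""
    out = []
    for a_mat, b_mat in zip(a, b):
        rows = []
        for a_row in a_mat:
            rows.append([
                # Reduction over the shared dimension c (size 2 in this test).
                sum(a_row[k] * b_mat[k][j] for k in range(len(b_mat)))
                for j in range(len(b_mat[0]))
            ])
        out.append(rows)
    return out

lhs = [[[5] * 2 for _ in range(1)] for _ in range(2)]  # %1: tensor<2x1x2xi32>, all 5
rhs = [[[2] * 3 for _ in range(2)] for _ in range(2)]  # %0: tensor<2x2x3xi32>, all 2

result = batched_matmul(lhs, rhs)
print(result)  # [[[20, 20, 20]], [[20, 20, 20]]]
```

Each element is a sum of two products, 5 * 2 + 5 * 2 = 20, which matches the output without `-affine-loop-coalescing`. The value 10 equals a single product 5 * 2; that is only an observation about the arithmetic, not a diagnosis of the pass.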

I'm not sure whether there is a bug in my program or whether incorrect usage of the passes above caused this result.

</pre>
