<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/140543>140543</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [MLIR] Inconsistent output when executing MLIR program with and without `-test-loop-permutation="permutation-map=1,2,0"`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            mlir
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Lambor24
      </td>
    </tr>
</table>

<pre>
    My git version is [992458d](https://github.com/llvm/llvm-project/commit/992458d26bbd2b8744408dbb4ab8d6b6058301d6).

## Description:
I get different results when executing the same MLIR program with and without `-test-loop-permutation="permutation-map=1,2,0"`.

## Steps to Reproduce:

### 1. **MLIR Program (test.mlir)**:

```
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %0 = "tosa.const"() <{values = dense<-6.978000e+01> : tensor<4x3x6x7xf32>}> : () -> tensor<4x3x6x7xf32>
    %1 = "tosa.const"() <{values = dense<9.72999954> : tensor<2x3x1x7xf32>}> : () -> tensor<2x3x1x7xf32>
    %2 = "tosa.const"() <{values = dense<7.650000e+01> : tensor<2xf32>}> : () -> tensor<2xf32>
    %3 = "tosa.const"() <{values = dense<5.610000e+00> : tensor<3x5x2x2xf32>}> : () -> tensor<3x5x2x2xf32>
    %4 = "tosa.const"() <{values = dense<-1.892300e+02> : tensor<4xf32>}> : () -> tensor<4xf32>
    %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
    %6 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
    %7 = tosa.conv2d %0, %1, %2, %5, %6 {acc_type = f32, dilation = array<i64: 2, 1>, pad = array<i64: 2, 2, 2, 2>, stride = array<i64: 1, 1>} : (tensor<4x3x6x7xf32>, tensor<2x3x1x7xf32>, tensor<2xf32>, tensor<1xf32>, tensor<1xf32>) -> tensor<4x3x10x2xf32>
    %8 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
    %9 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
    %10 = tosa.depthwise_conv2d %7, %3, %4, %8, %9 {acc_type = f32, dilation = array<i64: 1, 2>, pad = array<i64: 1, 2, 1, 2>, stride = array<i64: 1, 2>} : (tensor<4x3x10x2xf32>, tensor<3x5x2x2xf32>, tensor<4xf32>, tensor<1xf32>, tensor<1xf32>) -> tensor<4x4x3x4xf32>
    %11 = tosa.abs %10 : (tensor<4x4x3x4xf32>) -> tensor<4x4x3x4xf32>
    %12 = tosa.argmax %11 {axis = 2 : i32} : (tensor<4x4x3x4xf32>) -> tensor<4x4x4xi32>
    %cast = tensor.cast %12 : tensor<4x4x4xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
```
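
As a sanity check on the program itself, the result types declared above can be re-derived from the TOSA convolution output-shape rule. This is only a standalone sketch, not part of the reproduction: the helper names are made up for illustration, and it assumes NHWC inputs, `[OC, KH, KW, IC]` weights for `tosa.conv2d`, `[KH, KW, C, M]` weights for `tosa.depthwise_conv2d`, and `pad = [top, bottom, left, right]`.

```python
def conv_out_dim(in_size, kernel, pad_before, pad_after, dilation, stride):
    """Output extent along one spatial axis for a TOSA-style convolution."""
    numerator = in_size - 1 + pad_before + pad_after - (kernel - 1) * dilation
    # The stride is expected to divide the padded extent exactly.
    assert numerator % stride == 0, "stride does not divide the padded extent"
    return numerator // stride + 1


def conv2d_shape(inp, weight, pad, stride, dilation):
    n, ih, iw, ic = inp            # assumed NHWC
    oc, kh, kw, wic = weight       # assumed [OC, KH, KW, IC]
    assert ic == wic, "input and weight channel counts differ"
    oh = conv_out_dim(ih, kh, pad[0], pad[1], dilation[0], stride[0])
    ow = conv_out_dim(iw, kw, pad[2], pad[3], dilation[1], stride[1])
    return (n, oh, ow, oc)


def depthwise_conv2d_shape(inp, weight, pad, stride, dilation):
    n, ih, iw, c = inp             # assumed NHWC
    kh, kw, wc, m = weight         # assumed [KH, KW, C, M]
    assert c == wc, "input and weight channel counts differ"
    oh = conv_out_dim(ih, kh, pad[0], pad[1], dilation[0], stride[0])
    ow = conv_out_dim(iw, kw, pad[2], pad[3], dilation[1], stride[1])
    return (n, oh, ow, c * m)


def argmax_shape(inp, axis):
    # argmax removes the reduced axis from the shape.
    return tuple(d for i, d in enumerate(inp) if i != axis)


# Shapes and attributes copied from test.mlir.
conv = conv2d_shape((4, 3, 6, 7), (2, 3, 1, 7),
                    pad=(2, 2, 2, 2), stride=(1, 1), dilation=(2, 1))
dw = depthwise_conv2d_shape(conv, (3, 5, 2, 2),
                            pad=(1, 2, 1, 2), stride=(1, 2), dilation=(1, 2))
am = argmax_shape(dw, axis=2)

assert conv == (4, 3, 10, 2)   # declared result type of %7
assert dw == (4, 4, 3, 4)      # declared result type of %10
assert am == (4, 4, 4)         # declared result type of %12
print(conv, dw, am)
```

Under those layout assumptions the declared result types are consistent with the operand shapes and attributes, so the input program appears to be well-formed at least as far as shapes go.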

### 2. **Command to Run Without `-test-loop-permutation="permutation-map=1,2,0"`:**

```
/path/llvm-project/build/bin/mlir-opt test.mlir -pass-pipeline='builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))' | \
/path/llvm-project/build/bin/mlir-opt -tosa-to-tensor -tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops -lower-affine -convert-scf-to-cf -expand-strided-metadata -convert-cf-to-llvm -convert-arith-to-llvm -convert-math-to-libm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts | \
/path/llvm-project/build/bin/mlir-runner -e main -entry-point-result=void \
-shared-libs=/path/llvm-project/build/lib/libmlir_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_c_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_async_runtime.so
```

### 3. **Output Without `-test-loop-permutation="permutation-map=1,2,0"`:**

```
[[[0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0]],
 [[0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0]],
 [[0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0]],
 [[0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0],
  [0,    0,    0,    0]]]
```

### 4. **Command to Run With `-test-loop-permutation="permutation-map=1,2,0"`:**

```
/path/llvm-project/build/bin/mlir-opt test.mlir -pass-pipeline='builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))' | \
/path/llvm-project/build/bin/mlir-opt -tosa-to-tensor -tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops -test-loop-permutation="permutation-map=1,2,0" -lower-affine -convert-scf-to-cf -expand-strided-metadata -convert-cf-to-llvm -convert-arith-to-llvm -convert-math-to-libm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts | \
/path/llvm-project/build/bin/mlir-runner -e main -entry-point-result=void \
-shared-libs=/path/llvm-project/build/lib/libmlir_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_c_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_async_runtime.so
```

### 5. **Output With `-test-loop-permutation="permutation-map=1,2,0"`:**

```
[[[0,    0,    0,    0],
  [2,    2,    2,    2],
  [0,    0,    0,    0],
  [0,    0,    0,    0]],
 [[0,    0,    0,    0],
  [2,    2,    2,    2],
  [0,    0,    0,    0],
  [0,    0,    0,    0]],
 [[0,    0,    0,    0],
  [2,    2,    2,    2],
  [0,    0,    0,    0],
  [0,    0,    0,    0]],
 [[0,    0,    0,    0],
  [2,    2,    2,    2],
  [0,    0,    0,    0],
  [0,    0,    0,    0]]]
```
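
To pin down exactly where the two runs disagree, the printed data can be compared element by element. The following is a small illustrative sketch rather than part of the reproduction: `parse` and `diff` are hypothetical helper names, and it assumes only the bracketed data portion of the `printMemrefI32` output is compared, as pasted in this report.

```python
import ast

# Data portions of the two outputs from this report, whitespace normalized.
BASELINE = """
[[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
 [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
 [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
 [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]]
"""

PERMUTED = """
[[[0, 0, 0, 0], [2, 2, 2, 2], [0, 0, 0, 0], [0, 0, 0, 0]],
 [[0, 0, 0, 0], [2, 2, 2, 2], [0, 0, 0, 0], [0, 0, 0, 0]],
 [[0, 0, 0, 0], [2, 2, 2, 2], [0, 0, 0, 0], [0, 0, 0, 0]],
 [[0, 0, 0, 0], [2, 2, 2, 2], [0, 0, 0, 0], [0, 0, 0, 0]]]
"""


def parse(printed):
    # The bracketed data is also a valid Python nested-list literal.
    return ast.literal_eval(printed.strip())


def diff(a, b, prefix=()):
    """Yield (index, a_value, b_value) wherever two nested lists differ."""
    if isinstance(a, list):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff(x, y, prefix + (i,))
    elif a != b:
        yield prefix, a, b


mismatches = list(diff(parse(BASELINE), parse(PERMUTED)))
for index, without_pass, with_pass in mismatches:
    print(index, without_pass, "->", with_pass)
```

For the outputs above, every mismatch has index 1 along the second dimension of the `4x4x4` result, where the run without the pass reports argmax index 0 and the run with the pass reports 2.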

I'm not sure whether this result is caused by a bug in my program or by incorrect usage of the passes above.
</pre>