<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/119999>119999</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [mlir] Inconsistent output when executing MLIR program with `affine-parallelize` and `--affine-super-vectorize`

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            mlir

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          Emilyaxe

      </td>

    </tr>

</table>

<pre>

git version: `ff939b06a5`

system: `Ubuntu 18.04.6 LTS`

## Description:

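The same MLIR program produces different results depending on whether `--affine-parallelize` and `--affine-super-vectorize` are added to the lowering pipeline. Apart from these two flags, the pipelines in steps 2 and 4 are identical.

The output is correct when either of the two flags is removed, so the wrong result only appears when both are enabled. Because of this, I'm unsure which of the two optimizations contains the bug.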

## Steps to Reproduce:

### 1. **MLIR Program (tosa.mlir)**:


```
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %0 = "tosa.const"() <{value = dense<[0, 2, 1]> : tensor<3xi32>}> : () -> tensor<3xi32>
    %1 = "tosa.const"() <{value = dense<-12> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %2 = "tosa.const"() <{value = dense<1676> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %3 = "tosa.const"() <{value = dense<-10> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %4 = tosa.abs %2 : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %5 = tosa.clamp %4 {max_fp = 1.600000e+01 : f32, max_int = 16 : i64, min_fp = 0.000000e+00 : f32, min_int = 0 : i64} : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %6 = tosa.arithmetic_right_shift %2, %5 {round = true} : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %7 = tosa.minimum %6, %1 : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %8 = tosa.transpose %3, %0 : (tensor<1x4x21xi32>, tensor<3xi32>) -> tensor<1x21x4xi32>
    %9 = tosa.matmul %7, %8 : (tensor<1x4x21xi32>, tensor<1x21x4xi32>) -> tensor<1x4x4xi32>
    %cast = tensor.cast %9 : tensor<1x4x4xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
```

### 2. **Command to run without `--affine-parallelize` and `--affine-super-vectorize`:**

```
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir \
  -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" |
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
  --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm \
  --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" \
  -one-shot-bufferize="bufferize-function-boundaries" \
  -convert-arith-to-llvm -convert-linalg-to-affine-loops -convert-vector-to-scf \
  -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf \
  -convert-vector-to-llvm -convert-math-to-llvm -convert-arith-to-llvm \
  -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm \
  -reconcile-unrealized-casts |
timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner \
  -e main -entry-point-result=void \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```

### 3. **Output without `--affine-parallelize` and `--affine-super-vectorize`:**

```
[[[2520,    2520,    2520,    2520],
  [2520,    2520,    2520,    2520],
  [2520,    2520,    2520,    2520],
  [2520,    2520,    2520,    2520]]]
```

### 4. **Command to run with `--affine-parallelize` and `--affine-super-vectorize`:**

```
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir \
  -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" |
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
  --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm \
  --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" \
  -one-shot-bufferize="bufferize-function-boundaries" \
  -convert-arith-to-llvm -convert-linalg-to-affine-loops --affine-parallelize \
  -convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf \
  --affine-super-vectorize="virtual-vector-size=128 test-fastest-varying=0 vectorize-reductions=true" \
  -convert-vector-to-llvm -convert-math-to-llvm -convert-arith-to-llvm \
  -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm \
  -reconcile-unrealized-casts |
timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner \
  -e main -entry-point-result=void \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so \
  --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```

### 5. **Output with `--affine-parallelize` and `--affine-super-vectorize`:**

```
[[[120,    120,    120,    120],
  [120,    120,    120,    120],
  [120,    120,    120,    120],
  [120,    120,    120,    120]]]
```
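To double-check which of the two outputs is correct, the program can also be evaluated by hand. The sketch below is a hypothetical plain-Python reference model, not part of MLIR. It follows my reading of the TOSA semantics for each op, and in particular it assumes that `tosa.arithmetic_right_shift` with `round = true` adds back the last bit shifted out. Under those assumptions, %5 = 16, %6 = 1676 >> 16 = 0, and %7 = min(0, -12) = -12. Each matmul element is then a sum of 21 products (-12) * (-10) = 120, which gives 2520 and matches step 3.

```python
# Hypothetical reference model for tosa.mlir in plain Python (no MLIR involved).
# Assumes the TOSA op semantics described above; not an official implementation.


def arithmetic_right_shift(x: int, shift: int, round_: bool) -> int:
    """TOSA-style arithmetic right shift; with rounding, add the last bit shifted out."""
    result = x >> shift
    if round_ and shift > 0 and (x >> (shift - 1)) & 1:
        result += 1
    return result


def clamp(x: int, lo: int, hi: int) -> int:
    """Clamp x into [lo, hi], as tosa.clamp does with min_int/max_int."""
    return max(lo, min(hi, x))


# Tensor shapes from the program: %7 is 1x4x21, %8 is 1x21x4, %9 is 1x4x4.
N, H, C, W = 1, 4, 21, 4

# Splat constants %1, %2, %3 (every element has the same value).
c1, c2, c3 = -12, 1676, -10

v5 = clamp(abs(c2), 0, 16)                 # %5 = clamp(abs(%2)) -> 16
v6 = arithmetic_right_shift(c2, v5, True)  # %6 = 1676 >> 16 with rounding -> 0
v7 = min(v6, c1)                           # %7 = minimum(%6, %1) -> -12

# %8 = transpose(%3): still a splat of -10, now with shape 1x21x4.
a = [[[v7] * C for _ in range(H)] for _ in range(N)]
b = [[[c3] * W for _ in range(C)] for _ in range(N)]

# %9 = tosa.matmul: out[n][h][w] = sum over c of a[n][h][c] * b[n][c][w].
out = [
    [[sum(a[n][h][c] * b[n][c][w] for c in range(C)) for w in range(W)] for h in range(H)]
    for n in range(N)
]

single_term = v7 * c3  # one product of the 21-term reduction
print(out)
print(single_term)
```

Every element printed in step 5 equals 120. That is exactly one term of the 21-term reduction, which suggests (but does not prove) that the reduction over the size-21 dimension of the matmul is not being accumulated in the parallelized and vectorized pipeline.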

</pre>
