<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/94520>94520</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [mlir] Incorrect region simplification in `applyPatternsAndFoldGreedily`

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            mlir

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          bmhowe23

      </td>

    </tr>

</table>

<pre>

    ## Summary

Region simplification appears to be performing incorrect simplification under some nested loop cases, causing erroneous code generation. I have seen this incorrect behavior in 16.0.6 and 19.0.0 (on Ubuntu 22.04, x86_64). I have not checked any other versions.

## Details

The following MLIR causes incorrect region simplification in any pass that calls `applyPatternsAndFoldGreedily()` with `config.enableRegionSimplification` set to true (which is the default parameter value).

**nested_loop_bug.mlir**:

```mlir

module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.triple = "x86_64-unknown-linux-gnu"} {

  func.func @nested_loop() -> i32 attributes {no_this} {

    %c3_i64 = arith.constant 3 : i64

 %c2_i64 = arith.constant 2 : i64

    %c0_i64 = arith.constant 0 : i64

 %c1_i64 = arith.constant 1 : i64

    %c1_i32 = arith.constant 1 : i32

    %c0_i32 = arith.constant 0 : i32

    cf.br ^bb1(%c0_i32, %c0_i64 : i32, i64)

  ^bb1(%0: i32, %1: i64):  // 2 preds: ^bb0, ^bb8

 %2 = arith.cmpi ult, %1, %c2_i64 : i64

    cf.cond_br %2, ^bb2(%0, %1 : i32, i64), ^bb9(%0, %1 : i32, i64)

  ^bb2(%3: i32, %4: i64):  // pred: ^bb1

    %5 = arith.addi %4, %c1_i64 : i64

    cf.br ^bb3(%3, %5 : i32, i64)

  ^bb3(%6: i32, %7: i64):  // 2 preds: ^bb2, ^bb5

    %8 = arith.cmpi ult, %7, %c3_i64 : i64

    cf.cond_br %8, ^bb4(%6, %7 : i32, i64), ^bb6(%6, %7 : i32, i64)

  ^bb4(%9: i32, %10: i64):  // pred: ^bb3

    %11 = arith.addi %9, %c1_i32 : i32

    cf.br ^bb5(%11, %10 : i32, i64)

  ^bb5(%12: i32, %13: i64):  // pred: ^bb4

    %14 = arith.addi %13, %c1_i64 : i64

    cf.br ^bb3(%12, %14 : i32, i64)

 ^bb6(%15: i32, %16: i64):  // pred: ^bb3

    cf.br ^bb7

  ^bb7:  // pred: ^bb6

    cf.br ^bb8(%15, %4 : i32, i64)

  ^bb8(%17: i32, %18: i64):  // pred: ^bb7

    %19 = arith.addi %18, %c1_i64 : i64

    cf.br ^bb1(%17, %19 : i32, i64)

  ^bb9(%20: i32, %21: i64):  // pred: ^bb1

 cf.br ^bb10

  ^bb10:  // pred: ^bb9

    return %20 : i32

 }

}

```

The above MLIR roughly corresponds to this C++ code (after running through some of our custom passes which are not relevant for the purposes of this issue:

```cpp

int nested_loop() {

  int nEle = 0;

  for (std::size_t r = 0; r < 2; r++)          // r = 0,1

    for (std::size_t s = r + 1; s < 3; s++)    // s = 1,2 then 2

      nEle += 1; // should be incremented 3 times

  return nEle;

}

```

The MLIR code can be executed without any passes using small test driver program, and it produces the correct result (3).

**nested_loop_driver.c**:

```cpp

#include <stdio.h>

int nested_loop();

int main(int argc, char *argv[]) {

 printf("nested_loop: %d\n", nested_loop());

  return 0;

}

```

### Results when using no additional passes

```bash

$ cat nested_loop_bug.mlir | mlir-opt-19 --convert-func-to-llvm --convert-cf-to-llvm | mlir-translate-19 --mlir-to-llvmir | llc-19 --relocation-model=pic --filetype=obj -o nested_loop_good.o

$ clang-19 nested_loop_driver.c nested_loop_good.o -o nested_loop_good.exe

$ ./nested_loop_good.exe

nested_loop returns: 3

```

In order to run `applyPatternsAndFoldGreedily` from the command line, we can simply use `--test-math-algebraic-simplification`. The problem is not unique to that pass, nor even really related to that pass because this MLIR will not match any patterns used by that pass. I'm merely using it to show how `applyPatternsAndFoldGreedily` is behaving.

### Results when using `applyPatternsAndFoldGreedily`

```bash

$ cat nested_loop_bug.mlir | mlir-opt-19 --test-math-algebraic-simplification --convert-func-to-llvm --convert-cf-to-llvm | mlir-translate-19 --mlir-to-llvmir | llc-19 --relocation-model=pic --filetype=obj -o nested_loop_bad.o

$ clang-19 nested_loop_driver.c nested_loop_bad.o -o nested_loop_bad.exe

$ ./nested_loop_bad.exe

nested_loop returns: 2

```

**The fact that is now returning 2 is incorrect operation.**

The following MLIR snippet shows the output of applyPatternsAndFoldGreedily (via --test-math-algebraic-simplification). 

```mlir

$ cat nested_loop_bug.mlir | mlir-opt-19 --test-math-algebraic-simplification

module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.triple = "x86_64-unknown-linux-gnu"} {

  func.func @nested_loop() -> i32 attributes {no_this} {

    %c3_i64 = arith.constant 3 : i64

 %c2_i64 = arith.constant 2 : i64

    %c0_i64 = arith.constant 0 : i64

 %c1_i64 = arith.constant 1 : i64

    %c1_i32 = arith.constant 1 : i32

    %c0_i32 = arith.constant 0 : i32

    cf.br ^bb1(%c0_i32, %c0_i64 : i32, i64)

  ^bb1(%0: i32, %1: i64):  // 2 preds: ^bb0, ^bb7

 %2 = arith.cmpi ult, %1, %c2_i64 : i64

    cf.cond_br %2, ^bb2(%0, %1 : i32, i64), ^bb8(%0 : i32)

  ^bb2(%3: i32, %4: i64):  // 2 preds: ^bb1, ^bb4

    %5 = arith.addi %4, %c1_i64 : i64

    cf.br ^bb3(%3, %5 : i32, i64)

  ^bb3(%6: i32, %7: i64):  // pred: ^bb2

    %8 = arith.cmpi ult, %7, %c3_i64 : i64

    cf.cond_br %8, ^bb4(%6, %7 : i32, i64), ^bb5(%6 : i32)

  ^bb4(%9: i32, %10: i64):  // pred: ^bb3

    %11 = arith.addi %9, %c1_i32 : i32

    cf.br ^bb2(%11, %10 : i32, i64)

  ^bb5(%12: i32):  // pred: ^bb3

    cf.br ^bb6

  ^bb6:  // pred: ^bb5

    cf.br ^bb7(%12, %4 : i32, i64)

  ^bb7(%13: i32, %14: i64):  // pred: ^bb6

    %15 = arith.addi %14, %c1_i64 : i64

    cf.br ^bb1(%13, %15 : i32, i64)

 ^bb8(%16: i32):  // pred: ^bb1

    cf.br ^bb9

  ^bb9:  // pred: ^bb8

    return %16 : i32

  }

}

```

**Diff** (original on the left, applyPatternsAndFoldGreedily on the right)

![image](https://github.com/llvm/llvm-project/assets/141149032/70cbec3b-4c65-4d86-b0a1-062ffe0f8940)

It appears that `applyPatternsAndFoldGreedily` incorrectly merged `^bb5` and `^bb6` and correspondingly eliminated the `%14 = arith.addi %13, %c1_i64 : i64`. As demonstrated with the test program, this results in incorrect behavior.

</pre>

<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzsGl1v47jx1zAvhAyJsmT5IQ_52BQLtEBx2z4HlDSyeEeRKkkl6_76YkjJlrx27PTuYVvcweelqZnhfHNmFG6t2CmAe5I9kuz5jg-u1ea-7Fr9Diy9K3W9vycsJSyl34au42ZP4mcSP4TvX2AntKJWdL0Ujai4w5-874EbS52mJdAeTKNNJ9SOClVpY6BypxiDqsFQqzugCqyDmkqte1pxC5awJ1rxwSIBMEYr0IOlla6B7kCB8RRW9Ctt-RtQC6Coa4WdHVZCy9-ENlQomuSreJVTrmqabFfxKqaEFVrRf5aDcgNlbBWv8cTvRf6arwnbHigr7WjVQvUb1JSrPdWuBUPfwFihlV3N1TIq7BkcF9LOn_yjBdpoKfU7ivO3v379xcsGc3bNWaUK5U_tubXUtdzRiktpKclj3vdy_3fuHBhlH1T9omX9FwNQC7knrCBsS_KYvgvXInSlVSN2K1C8lBDs921xEgJbcGg9ZwZA_by3omqpwIOB1tDwQTrac8M7cKgDLgdAVS11gJ9gzVe05ms57FadFCY8IekEmMfh45_5rU7XgwTKnTOiHBxYSjaPUr51q5o7_ir5Xg-OkvSZEsYg6kj6AFHPNjFJH1Lmv_BnsvyJq3ztvyIxLZoCkRJWRKrART7h5OvoW4LaY-gO_nBnRC9hOjd4SDSo35R-V5EUavge7dSAGJtn5DgIQ2kzqGqFX5Ss45lGgnFoRNIvVKTsRF6lX9GPT2hRSlhWpa8iX3tGuBGuXVVaWceVoykl6QNF6QI8ArNLwOwEeCQeX4KPzxBPLgEn54knryjpZfiU_cDMefj4R_iqWZWGkuxLWSZetyM6GnAuWEBjT547tp0IzBHjGRRhWTLJwra4ooS9EPZCGe0N1Ba3PHLswXFVHFW04L7rBR2kO5AdOWNHzuYKqxqUt35FqVjGDtTZxORI5oxME-j2Ouhc_JFyuhR_fV58FP4ge7KwWzYTmte1CFRGaZOL0k7mSyc2AkZ2hesRPF9yvbnNaEe1ZgsRist220ySpLfYrTgcsJ74HMl8YLf8OuhcAyPl7YnbxjcYLl1InSRnLLedWS5lHwZeFhhJkgMLVzifENgJ6-kNrC9zS7I-w3qSftrrkgMPl5PF3EZJdsJ6_jmtzzjYLFSzuYidn8UuDvyMYXtF9RP85oT_4gb-N0vVb8-pvrhd9cmBlZGH7RXex7zGThI1u5Cpz6aq-fnx8haILyJvj_wbcINR_tjT24hsnsfq5rCYypzTcpCX-g1CKWj0sGvlnvpK0PZa1b6I9uXsE2GPhD2GwpewgjdYeplBKawkXetxQwmtG6oHQ6vBOt35ihEsDTUcN6GQNSDhDa_RRhtf1fWD6TXC6WYsn60d4Mcirer7sCOUoz_WMrNKxQN8GeulmKTHckhjWiysQ6WS9MGKf8Oro-YA6JdPlPllEBtpH_4bzTIhsKfZ3XOBuPWw-OiRJkjX-iNSv1wcMRIPCJjFGOpH0Vm6o6Nc7DHApI8HrFYPssauR6jKQAcKe5mUOtGBnQiMboM0Dlq57iahV0DjV1zhCfAdqgHJY2WP9fDUHoCloVuyHZeSOrCO1ka8gaG90TvDOwwW7ICEw516qCCU9scOxGKFT1iRXi_rA-lVdamsP3gMYalQlRzQf9Mn62qhVy1Jv3zgTwf94NOOC0VYgUtudpXvC1uONn3gZvcWOtilE_ZGKNd4WmxOPPWVWU2yJzUW-D8ePTv9YLP4RoOFBhB7wF-8KjH8QI1mUZpiisRei8vRYie0Sm7bidKaVnyhmkMjRcnmieIi0r2Lki2NokqrNzAuwm4jcjrCvmW2XTWHzQOuM1xZyR0ECmEvAI1HSFmFZwakDk1i1OkaJEmfe1HRKGqEBLfvgaTPuvyVRnrB707reqVn8kiudkjxnBudwTxLD77DkeKKsJfLELMnoyF99Zd-YMCvimpTg8H0awZ1tc3OY9oY3Y1R1HUYXVIoQNd6DyHr-_k9HSwgtSjCuIw67tqIyx2Uhosqsqed-Ipi7PdGlxI67MAxeQ9K_GuAcDNw5z3Iu7A2FN5AUQNcyj3meI75YQ5HS_DzhpDjfU55F1J6sh13eEP4LBKkRGZrWu6P-Cv6lbBNRzsw4IXxYx0_LLCtfqf4_w26EnYcyqjdmcHJhbi5TvePiqLrpvlZQ63k_2WkecRz1D6MswXAhTBjH-ZJ_PihGK9ccDPv5O8jBbQ7o4tpnu6nmd9443w8XrNK9D04757hltOD6weHhc5H7oS335vgN3mDHxNenGb94c7355DszyHZ_8OQbPOzDMnGRng2qvgdU7FTeZPZ-Od_ZES2aHjZTzQVG4dF-SVD_TRjMPb7x2CfnR3lC3r5RezswuRpOfu6lgIm-JOYSG4ZFedLVZ8LhOQTkZAcWBl5uBwLi7FXfoOqk7Mnno6iLmEX52ZFSX7qOTcOi0K58yyaJqz8m0sjdgLbSK18bSOh8eH4YWUzwhqxa91BN4QlJHsUHd-Bb6OL1rkes1gQbCdcO5SrSneEveBFP_4T9Ub_CpUj7AUbWWcJe0nWSbLexqjZl01clVClZbSu8ixa10UelTFPojhnTQNxU2zX8ZGH0Hu54xtkLAhv6Sem6lDusTXZQY1Yo7vnsR92TBv5tHGcsgm1k3sKUnRChZap9U3aZ4fK2K89WFpDhxew4dN0xtPzc5jZAMZ3YGZsc4Q687560Rrd1fdpvU23_A7uk02yyZO02CR37f0mqZOSbZuqTqo0Ljebqo6TnNdpXGbrOmN34p7FbB3ncZYU2SZLVtDAOi-yutwU1bpscrKOoeNCrnwBp83uzg__7rfrjMV3kpcgrf8DAcbC-1tGsuc7c-_tXw47S9axFNbZIwEnnPR_VOARsmf69frr7et2vhuMvP-0X3ph0C-DPG_37D8BAAD__89iN6w">