<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/61790>61790</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[mlir] debug-only=greedy-rewriter is quadratic
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
silvasean
</td>
</tr>
</table>
<pre>
Running [this script](https://gist.github.com/silvasean/78e1498880a874049bf974d047d101d7) with varying values of N produces IR like this:
```
module {
func.func @foo(%arg0: i64) -> i64 {
%c0 = arith.constant 0 : i64
%0 = arith.addi %arg0, %arg0 : i64
%1 = arith.addi %0, %c0 : i64
%2 = arith.addi %1, %1 : i64
%3 = arith.addi %2, %c0 : i64
%4 = arith.addi %3, %3 : i64
return %4 : i64
}
}
```
Every second addi is eliminated by the canonicalizer.
Running a command line like the following reveals a quadratic behavior:
```
time -- python /tmp/quadratic.py | mlir-opt -canonicalize -debug-only=greedy-rewriter |& cat >/dev/null
```
```
N= 2000 - 0.57s
N= 4000 - 1.76s -- ~4x higher
N= 8000 - 6.17s -- ~4x higher
N=16000 - 24.63s -- ~4x higher
N=32000 - 98.52s -- ~4x higher
```
If I comment out [this line](https://github.com/llvm/llvm-project/blob/cbfeec668bb338cbc46107f568e867d53e242bd9/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp#L181) then the time is:
```
N= 32000 - 3.19
N= 64000 - 6.42 -- ~2x higher
N=128000 - 12.87 -- ~2x higher
```
I think one possible solution here would be to not print the `%foo =` since I suspect the quadratic behavior comes from numbering the SSA values -- if we want a stable identifier to refer to, we can print source locations or something. But just not printing the `%foo =` would be a good first step.
I ran into this during a test case reduction of pattern rewrites not converging. My test case had complex control flow and it was not easy to reduce manually, so as a first step I wanted to see what pattern was involved and which op. To do that I was going to grep the output of debug-only=greedy-rewriter, but the output was being produced so slowly that I could not even get a full dump out even after 10 minutes. After commenting out the line I mentioned above, the entire dump is produced in about 1 second. The module has about 50k ops.
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJyUVt2O27gOfhrlhohhy_8XuZhpmoMBzikO2u4DyBZtq5Ulr36SZi_22RdSnEymkxbYwcCx6Y_kR1L-JGatGBXijpTPpNxvmHeTNjsr5JFZZGrTaX7effZKCTUCKZ_dJCzY3ojFkXJPaDM5t1iSPxF6IPQwCuuSUbjJd0mvZ0IPt1CEHuoGs6JtmiZlTV2kRdsNbV3wtKh5lma8JrSFk3ATHJk5h4RHJj1a0AN8gsVo7nu08PIZpPiOEKiExOmepNdrla7_8XHW3EsEUj9fnmHwqk_CBUiRDloT2hBaMjOmJH8CURWBwZbkH8P9nd_6R2jZp0DyPTAj3JT0WlnHlINgvPj_jL-HM84FXPPRD9fbt77rlZbZA8-rW_-zEy3pA3i2wrN36PwBmv46ePEAnq_w_C3aoPNGwer0rjAg9X59vt28Hdrl-vGI5gwWe604xJTCAkoxC8UccujO4CaEnimtRM-k-AtNch_gumYZ9HqemeIghcLrykEYtJT6FBAGj8ikBQZ_esYNc6KHDid2FNq8rq-3JJ2YEbZbWM5u0qHag5sXQg-3CMlyBlJ_gFkKs9WLg-09Vdhy7Py41UqeSb4fDSI_bw2ejHBogiOhFfTMAck_EnrgeCT0oLyUv2nZT8ZPYWQ0TVPYQpqUtb0zFxdzltSVDWX8XfyASYwTmjtQcwFVSVb_EpRVFxAtkir_JSpfabRNUtKHqIcFvQzwEqeHyoH27qY_YZKP1edOeKQ8Xn-2i9HfsHeEHjqpO0IPfTcg9lXVdF2eN33XF1WW1kNZNdhUNS9zpAXteEvoIUwwxBHB76thyg7azJbQwx9OyPD7nzi-_zPn0KjPlyHujTiiSfplITT_b9ZkQVnchCouvrh87tXrweiuTcuTrL0zV8V1LAWNnaQPpkLX2WU0aepHqMf9DpKqvoNWCIu2VnQSwWrpndAKJjQIJ-0lhw7BaVDawWKEcrGkEIuWg9ZBKkiVghWqR3gB6-2C_QX0_gML80ULg9EzKD93aMInGbBfvjxd9X-7BTHACeEUxJaBdSxQExyVE4NAE-gYHOJNkKVTVIaVndXe9AhS9ywUYkEbsHrGUOyYwLN38M1b91rPlcG7km7VMxi15jAIYx1Yh0vyto-GKRDK6bhHAffmokQOrYOeWQSD3PexrXqA5bJyYP3-bWTSa3VEM0aK_zvfuU6Mh6YtEn8EkDNawiD1CYLGCQcndgmAzJ4vfQnbJsxMeSblObTHamBB7175w0vsLfLgYRHhNDF3IxZCCnXU8og8pjlNop9ALwl81cBDmczFEBZGHdunYTS4xDZq7xbvQqG_E73Aq_Pu3iOE6zCEW_d-HphbqU_yfE3Zx5HEeo-oYMSwPgYvJXA_L1E14gs2BGHNUpiF8g5tAk_RsspLSKLX7HGfeIFo1ipU3OkjBn7hbbAavEQX9pWZUAHnHWTrrpXA1wlhPYJMod_xdZl-B73YZMN3OW_zlm1wl1VNmrZFTbPNtBvyfOBNWmBe8rwc8qbOWNmypm_btK2yeiN2NKV5mtM2o2VTZElVp1lRpaxum2wo-poUKc5MyCRoX6LNuBHWetxVWd2mG8k6lDYe-ChVeIL4klAazn9mF_Wy86MlRSqFdfY1ihNOxpNilMRy_9t5hubcPveNN3L3r7U6EgsCG4n_EwAA__8MP0YN">