<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/138595">138595</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[MLIR] convert-linalg-to-parallel-loops doesn't seem to parallelize reductions
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
brian-kelley
</td>
</tr>
</table>
<pre>
I have a project that uses MLIR to lower linalg/tensor code to scf/memref. Ideally the resulting code exposes all possible parallelism from the input operations. One of the passes I'm using is ``--convert-linalg-to-parallel-loops``, but it falls back to a sequential ``scf.for`` loop for reduction dimensions, instead of ``scf.parallel/scf.reduce``.
Below are a couple of simple examples (dot and matmul) that turn the reduction dimension into ``scf.for`` loops. To replicate, run them through ``mlir-opt --convert-linalg-to-parallel-loops``.
Is there another way to generate parallel loops from linalg ops that keeps reductions parallel? One alternative I tried is ``--convert-linalg-to-affine-loops --affine-parallelize``, but this also parallelizes only the non-reduction dimensions (and the affine dialect doesn't seem able to express a parallel reduction anyway).
dot.mlir:
```
module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<f64>
    linalg.dot ins(%arg0, %arg1 : memref<?xf64>, memref<?xf64>) outs(%alloc : memref<f64>)
    %0 = memref.load %alloc[] : memref<f64>
    return %0 : f64
  }
}
```
result:
```
module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<f64>
    %dim = memref.dim %arg0, %c0 : memref<?xf64>
    scf.for %arg2 = %c0 to %dim step %c1 {
      %1 = memref.load %arg0[%arg2] : memref<?xf64>
      %2 = memref.load %arg1[%arg2] : memref<?xf64>
      %3 = memref.load %alloc[] : memref<f64>
      %4 = arith.mulf %1, %2 : f64
      %5 = arith.addf %3, %4 : f64
      memref.store %5, %alloc[] : memref<f64>
    }
    %0 = memref.load %alloc[] : memref<f64>
    return %0 : f64
  }
}
```
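For reference, this hand-written sketch is roughly what I was hoping the pass would produce for dot: a single ``scf.parallel`` over the reduction dimension that carries the running sum as an ``scf.reduce`` value instead of storing partial sums through ``%alloc`` (untested, syntax following the scf dialect documentation):
```
module {
  func.func @dot(%arg0: memref<?xf64>, %arg1: memref<?xf64>) -> f64 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %zero = arith.constant 0.0 : f64
    %dim = memref.dim %arg0, %c0 : memref<?xf64>
    // Parallel loop with an f64 reduction value; scf.reduce combines
    // the per-iteration products with a floating-point add.
    %sum = scf.parallel (%i) = (%c0) to (%dim) step (%c1) init (%zero) -> f64 {
      %a = memref.load %arg0[%i] : memref<?xf64>
      %b = memref.load %arg1[%i] : memref<?xf64>
      %p = arith.mulf %a, %b : f64
      scf.reduce(%p : f64) {
      ^bb0(%lhs: f64, %rhs: f64):
        %r = arith.addf %lhs, %rhs : f64
        scf.reduce.return %r : f64
      }
    }
    return %sum : f64
  }
}
```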
matmul.mlir:
```
module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    linalg.matmul ins(%arg0, %arg1 : memref<64x64xf64>, memref<64x64xf64>) outs(%alloc : memref<64x64xf64>)
    return %alloc : memref<64x64xf64>
  }
}
```
result:
```
module {
  func.func @matmul(%arg0: memref<64x64xf64>, %arg1: memref<64x64xf64>) -> memref<64x64xf64> {
    %c1 = arith.constant 1 : index
    %c64 = arith.constant 64 : index
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<64x64xf64>
    scf.parallel (%arg2, %arg3) = (%c0, %c0) to (%c64, %c64) step (%c1, %c1) {
      scf.for %arg4 = %c0 to %c64 step %c1 {
        %0 = memref.load %arg0[%arg2, %arg4] : memref<64x64xf64>
        %1 = memref.load %arg1[%arg4, %arg3] : memref<64x64xf64>
        %2 = memref.load %alloc[%arg2, %arg3] : memref<64x64xf64>
        %3 = arith.mulf %0, %1 : f64
        %4 = arith.addf %2, %3 : f64
        memref.store %4, %alloc[%arg2, %arg3] : memref<64x64xf64>
      }
      scf.reduce
    }
    return %alloc : memref<64x64xf64>
  }
}
```
</pre>