[Mlir-commits] [mlir] [mlir][scf] Relax requirements for loops fusion (PR #79187)

Wed Jan 24 02:04:46 PST 2024

fabrizio-indirli wrote:

This patch attempts to enable the fusion of parallel loops in more cases, by relaxing some of the requirements:

### Allowing fusion when only one function argument is being written
Before this patch, the following loops would not be fused:
```
func.func @fuse_two(%A: memref<2x2xf32>, %B: memref<2x2xf32>,
                    %C: memref<2x2xf32>, %result: memref<2x2xf32>) {
  ...  // constants definitions
  %sum = memref.alloc()  : memref<2x2xf32>
  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
    %B_elem = memref.load %B[%i, %j] : memref<2x2xf32>
    %C_elem = memref.load %C[%i, %j] : memref<2x2xf32>
    %sum_elem = arith.addf %B_elem, %C_elem : f32
    memref.store %sum_elem, %sum[%i, %j] : memref<2x2xf32>
    scf.reduce
  }
  scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
    %sum_elem = memref.load %sum[%i, %j] : memref<2x2xf32>
    %A_elem = memref.load %A[%i, %j] : memref<2x2xf32>
    %product_elem = arith.mulf %sum_elem, %A_elem : f32
    memref.store %product_elem, %result[%i, %j] : memref<2x2xf32>
    scf.reduce
  }
  memref.dealloc %sum : memref<2x2xf32>
  return
}
```
... because the pass conservatively treated the function arguments as potentially aliasing, even though only one of them (`%result`) is written.
[This test case is actually taken from the LIT suite that was already in place](https://github.com/llvm/llvm-project/blob/543cf08636f3a3bb55dddba2e8cad787601647ba/mlir/test/Dialect/SCF/parallel-loop-fusion.mlir#L27), but a missing check in the test hid the failure: the loops weren't being fused, yet the test passed anyway.
In my patch, I check that at most one of the function's argument buffers is written across the two loops: if so, I assume there is no risk of aliasing. **Is this a correct assumption?**
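
To make the rule concrete, here is a minimal Python sketch of the check as described above. It is a toy model, not the C++ code from the patch: the `Access` record and the two helper functions are hypothetical names introduced only for illustration, and buffers are identified by their SSA names as plain strings. It encodes the rule but does not show that the rule is sound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    kind: str       # "load" or "store"
    buffer: str     # SSA name of the memref, e.g. "%sum"
    indices: tuple  # SSA names of the index operands


def written_function_args(loops, func_args):
    """Return the function-argument buffers that any of the loops stores to."""
    return {
        acc.buffer
        for loop in loops
        for acc in loop
        if acc.kind == "store" and acc.buffer in func_args
    }


def passes_relaxed_alias_check(loops, func_args):
    # Relaxed rule from the description: at most one function argument
    # is written across the loops being considered for fusion.
    return len(written_function_args(loops, func_args)) <= 1


# Model of the @fuse_two example: %sum is a local allocation, not an argument.
func_args = {"%A", "%B", "%C", "%result"}
idx = ("%i", "%j")
first_loop = [
    Access("load", "%B", idx),
    Access("load", "%C", idx),
    Access("store", "%sum", idx),
]
second_loop = [
    Access("load", "%sum", idx),
    Access("load", "%A", idx),
    Access("store", "%result", idx),
]

print(passes_relaxed_alias_check([first_loop, second_loop], func_args))  # True
```

Only `%result` is counted here, because `%sum` comes from `memref.alloc` rather than from the function signature.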

### Allowing fusion when the first loop writes multiple times to a buffer that is then read, provided the indices are always the same
Before this patch, the following loops would not be fused:
```
scf.parallel (%arg0) = (%c0) to (%c5) step (%c1) {
  ...
  memref.store %c2, %alloc[%arg0]
  ...
  memref.store %c2, %alloc[%arg0]
  scf.reduce
}
scf.parallel (%arg0) = (%c0) to (%c5) step (%c1) {
  ...
  %2 = memref.load %alloc[%arg0]
}
```
... because the first loop contains multiple stores to the same buffer, which is then read by the second loop.
However, I believe there is no problem in fusing these loops when the `store`s all access exactly the same indices, as in the example above.
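
As an illustration of that condition, the following Python sketch checks whether every store to a given buffer in a loop body uses the same index operands. It is an assumption-laden toy model, not the patch's implementation: accesses are plain `(kind, buffer, indices)` tuples, the function name `stores_share_indices` is hypothetical, and two index operands are treated as equal only when their SSA names match.

```python
def stores_share_indices(loop, buffer):
    """Return True if all stores to `buffer` in `loop` use one index tuple."""
    index_tuples = {
        indices
        for kind, buf, indices in loop
        if kind == "store" and buf == buffer
    }
    # Zero or one distinct index tuple means the stores never diverge.
    return len(index_tuples) <= 1


# Model of the first loop above: both stores write %alloc[%arg0].
same_indices_loop = [
    ("store", "%alloc", ("%arg0",)),
    ("store", "%alloc", ("%arg0",)),
]

# Hypothetical counter-example: the second store uses a different index.
mixed_indices_loop = [
    ("store", "%alloc", ("%arg0",)),
    ("store", "%alloc", ("%c0",)),
]

print(stores_share_indices(same_indices_loop, "%alloc"))   # True
print(stores_share_indices(mixed_indices_loop, "%alloc"))  # False
```

Comparing SSA names is deliberately conservative: two different values that happen to be equal at runtime are treated as different indices, so the sketch can only reject fusions, never wrongly accept one on that basis.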

https://github.com/llvm/llvm-project/pull/79187