[Mlir-commits] [mlir] [mlir][scf] Extend consumer fuse to nested loop structure (PR #94190)
llvmlistbot at llvm.org
Thu Jun 6 10:03:19 PDT 2024
MaheshRavishankar wrote:
Let's start with your example above. I think your input is:
```
%0 = linalg.fill .. outs(%empty)
%1 = linalg.matmul ... outs(%0)
%2 = linalg.add (..., %1)
```
You can first tile the `linalg.matmul` with `scf.forall`:
```
%0 = linalg.fill ... outs(%empty)
%1 = scf.forall ... shared_outs(%arg0 = %0) {
  %2 = tensor.extract_slice %arg0[...]
  %3 = linalg.matmul ... outs(%2)
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %3 into %arg0
  }
}
%2 = linalg.add (.., %1)
```
You can then fuse the `fill` (the producer) and the `add` (the consumer) into the loop to get:
```
%0 = scf.forall ... shared_outs(%arg0 = %empty) {
  %1 = tensor.extract_slice %arg0
  %2 = linalg.fill ... outs(%1)
  %3 = linalg.matmul ... outs(%2)
  %4 = linalg.add (..., %3)
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %4 into %arg0
  }
}
```
Now you can apply the same two steps again (tile the `linalg.matmul`, then fuse the `fill` and the `add` into the new loop) for the second level of tiling, using `scf.for` instead of `scf.forall`. Doesn't that give you what you are looking for?
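A rough sketch of what that second level could look like, in the same elided pseudo-IR as above. It assumes the inner `scf.for` tiles only a parallel dimension of the matmul; if the reduction dimension were tiled, the `fill` and the `add` could not simply be moved into the loop body like this. The SSA names and the elided offsets and sizes are illustrative only:

```
%0 = scf.forall ... shared_outs(%arg0 = %empty) {
  %1 = tensor.extract_slice %arg0[...]
  // Second level: tile the matmul again, this time with scf.for.
  %2 = scf.for ... iter_args(%arg1 = %1) {
    %3 = tensor.extract_slice %arg1[...]
    // Producer (fill) and consumer (add) fused into the inner loop.
    %4 = linalg.fill ... outs(%3)
    %5 = linalg.matmul ... outs(%4)
    %6 = linalg.add (..., %5)
    // scf.for has no in_parallel region: insert the tile and yield it.
    %7 = tensor.insert_slice %6 into %arg1[...]
    scf.yield %7
  }
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %2 into %arg0[...]
  }
}
```

Unlike `scf.forall`, `scf.for` carries its result through `iter_args`, so the fused tile is written back with `tensor.insert_slice` and `scf.yield` rather than inside an `scf.forall.in_parallel` terminator.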
https://github.com/llvm/llvm-project/pull/94190