[Mlir-commits] [mlir] [mlir][scf] Extend consumer fuse to nested loop structure (PR #94190)
donald chen
llvmlistbot at llvm.org
Sun Jul 14 22:15:43 PDT 2024
cxy-1993 wrote:
> > Thanks for iterating on this patch. Before I look into the details further, I would like to understand some background: is this patch intended to fuse nested region ops? Why does the code seem to handle only scf.for?
>
> As the title describes, this patch fuses a consumer within a nested loop structure with (or without) multiple candidates, covering both `scf.for` and `scf.forall`. Here, "multiple candidates" means that several candidate `sliceOps` for consumer fusion exist at different loop levels.
>
> Thus, nested loop structures fall into two cases:
>
> 1. A nested loop structure without multiple candidates (a.k.a. perfectly nested loops), e.g.:
>
> ```
> scf.for() {
>   scf.for() {
>     scf.for() {
>       tensor.insert_slice
>     }
>   }
> }
> ```
>
> This PR extends the existing `tileAndFuseConsumerOfSlice` API (renamed to `tileAndFuseConsumerOfSliceImpl` in this patch) to support the above scenario.
>
> 2. A nested loop structure with multiple candidates, e.g.:
>
> ```
> scf.for() {
>   scf.for() {
>     tensor.insert_slice
>   }
>   tensor.insert_slice
> }
> ```
>
> or
>
> ```
> scf.forall() {
>   scf.for() {
>     tensor.insert_slice
>   }
>   scf.forall.in_parallel {
>     tensor.parallel_insert_slice
>   }
> }
> ```
>
> or
>
> ```
> scf.forall() {
>   scf.forall() {
>     scf.forall.in_parallel {
>       tensor.parallel_insert_slice
>     }
>   }
>   scf.forall.in_parallel {
>     tensor.parallel_insert_slice
>   }
> }
> ```
>
> This PR handles this scenario by iteratively applying `tileAndFuseConsumerOfSliceImpl`.
>
> > Why does it only seem to handle scf.for from the code?
>
> As the examples above show, a nested `scf.forall` always contains multiple candidates, because each `scf.forall` level has its own `tensor.parallel_insert_slice` in its `scf.forall.in_parallel` terminator. It therefore falls into the second case. Unlike perfectly nested `scf.for`, that case needs no additional changes. It only requires iteratively applying the existing API, as other reviewers suggested earlier in this [thread](https://github.com/llvm/llvm-project/pull/94190#issuecomment-2153001202).
Thanks for the explanation. I don't currently have an opinion on how to select candidate ops, but I may not have expressed my question clearly. My question is: why was scf.for chosen as the anchor point?
Suppose we have the following input:
```
%0 = def_op()
scf.forall ... {
  scf.if() {
    %1 = use(%0)
  }
}
```
Should we fuse this case too, and what is the fundamental difference between fusing through scf.if and fusing through scf.for? How should we handle other ops with regions, or multiple blocks under scf.execute_region? I think the fusion process should handle all of these cases in a similar way. Perhaps we can take these issues into account as we start working on scf.for.
https://github.com/llvm/llvm-project/pull/94190