[Mlir-commits] [mlir] [mlir][scf] Extend consumer fuse to nested loop structure (PR #94190)

donald chen llvmlistbot at llvm.org
Sun Jul 14 22:15:43 PDT 2024


cxy-1993 wrote:

> > Thanks for iterating on this patch. Before I look into the details further, I would like to understand some background: Is this patch intended to fuse ops in nested regions? Why does it only seem to handle `scf.for`, judging from the code?
> 
> As the title describes, this patch intends to fuse a consumer within a nested loop structure with (or without) multiple candidates, covering both `scf.for` and `scf.forall`. Here, "multiple candidates" means that multiple candidate `sliceOps` exist at different loop levels for fusing the consumer.
> 
> Thus, the nested loop structure covers two cases:
> 
> 1. A nested loop structure without multiple candidates (a.k.a. perfectly nested loops), such as:
> 
> ```
> scf.for() {
>   scf.for() {
>     scf.for() {
>       tensor.insert_slice
>     }
>   }
> }
> ```
> 
> This PR enhances the current `tileAndFuseConsumerOfSlice` API (renamed to `tileAndFuseConsumerOfSliceImpl` in this patch) to support the above scenario.
> 
> 2. A nested loop structure with multiple candidates, such as:
> 
> ```
> scf.for() {
>   scf.for() {
>     tensor.insert_slice
>   }
>   tensor.insert_slice
> }
> ```
> 
> or
> 
> ```
> scf.forall() {
>   scf.for() {
>     tensor.insert_slice
>   }
>   scf.forall.in_parallel {
>     tensor.parallel_insert_slice
>   }
> }
> ```
> 
> or
> 
> ```
> scf.forall() {
>   scf.forall() {
>     scf.forall.in_parallel {
>       tensor.parallel_insert_slice 
>     }
>   }
>   scf.forall.in_parallel {
>     tensor.parallel_insert_slice
>   }
> }
> ```
> 
> This PR deals with this scenario by iteratively applying `tileAndFuseConsumerOfSliceImpl`.
> 
> > Why does it only seem to handle scf.for from the code?
> 
> As you can see above, nested `scf.forall` loops must contain multiple candidates. They therefore fall under the second case, which needs no additional changes of the kind required for perfectly nested `scf.for` loops, only iterative application of the existing API, as other reviewers suggested in an earlier [thread](https://github.com/llvm/llvm-project/pull/94190#issuecomment-2153001202).
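
For readers following along, the iterative application described in the quoted reply can be pictured with a small toy model. This is only a sketch under stated assumptions: `Op`, `collect_candidates`, and `fuse_iteratively` are hypothetical names that do not exist in MLIR, the placeholder step stands in for `tileAndFuseConsumerOfSliceImpl`, and the deepest-first visiting order is an assumption of the sketch, not something confirmed by the patch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Toy stand-in for an operation; NOT the MLIR API.
# `body` models the ops held in a single nested region.
@dataclass
class Op:
    name: str
    body: List["Op"] = field(default_factory=list)


SLICE_OPS = {"tensor.insert_slice", "tensor.parallel_insert_slice"}


def collect_candidates(op: Op, depth: int = 0) -> List[Tuple[int, str]]:
    """Return (nesting depth, op name) for each slice op under `op`, deepest first."""
    found = []
    for child in op.body:
        if child.name in SLICE_OPS:
            found.append((depth + 1, child.name))
        found.extend(collect_candidates(child, depth + 1))
    # Assumption of this sketch: visit the most deeply nested candidate first.
    return sorted(found, key=lambda c: -c[0])


def fuse_iteratively(root: Op) -> List[str]:
    """Run a placeholder single-candidate fusion step once per candidate."""
    log = []
    for depth, name in collect_candidates(root):
        # A real implementation would invoke the single-candidate routine here.
        log.append(f"fuse consumer at depth {depth} via {name}")
    return log


# The `scf.for` example with two candidates at different loop levels.
nested = Op("scf.for", [Op("scf.for", [Op("tensor.insert_slice")]),
                        Op("tensor.insert_slice")])
steps = fuse_iteratively(nested)
```

In this model the two `tensor.insert_slice` ops sit at depths 2 and 1, so the placeholder step runs twice, once per candidate.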

Thanks for the explanation. I currently have no opinion on how to select candidate ops. I may not have expressed my question clearly: why was `scf.for` chosen as the anchor point?

If we have the following input:

```
%0 = def_op()
scf.forall ... {
  scf.if() {
    %1 = use(%0)
  }
}
```

Should we fuse it too, and what is the fundamental difference between the `scf.if` case and the `scf.for` case? How should we handle other ops with regions, or multiple blocks under `scf.execute_region`? I think the fusion process should handle these cases in similar ways. Perhaps we can consider these issues when we start working on `scf.for`.
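
To make the question concrete, here is a minimal toy sketch, not MLIR code. `Node`, `enclosing_ops`, and `only_loops_between` are hypothetical names introduced for illustration. It models only the parent chain of the use: a generic upward walk sees `scf.if` and `scf.for` the same way, and only an explicit loop-only check (the assumed behaviour of a loop-anchored approach) tells them apart.

```python
from dataclasses import dataclass
from typing import List, Optional


# Toy stand-in for an op with a parent pointer; NOT the MLIR API.
@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None


LOOP_OPS = {"scf.for", "scf.forall"}


def enclosing_ops(node: Node) -> List[str]:
    """Names of the ops enclosing `node`, innermost first."""
    chain = []
    cur = node.parent
    while cur is not None:
        chain.append(cur.name)
        cur = cur.parent
    return chain


def only_loops_between(node: Node) -> bool:
    """True if every op enclosing `node` is a loop (assumed loop-anchored check)."""
    return all(name in LOOP_OPS for name in enclosing_ops(node))


# The input above: the use sits under scf.if inside scf.forall.
forall = Node("scf.forall")
use_in_if = Node("use", parent=Node("scf.if", parent=forall))

# Same shape, with scf.for in place of scf.if.
use_in_for = Node("use", parent=Node("scf.for", parent=forall))
```

The upward walk itself is identical for both inputs; the difference comes solely from the `LOOP_OPS` membership test, which is the kind of choice the question above is asking about.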

https://github.com/llvm/llvm-project/pull/94190

