[Mlir-commits] [mlir] [mlir][scf] Extend consumer fuse to nested loop structure (PR #94190)
llvmlistbot at llvm.org
Wed Jun 5 17:36:11 PDT 2024
Yun-Fly wrote:
Hi @nicolasvasilache @MaheshRavishankar, I'll try to reply to both of you in one thread.
> this should be done by multiple application of existing transformations
Could you explain in more detail, with an example, how to apply multiple existing transformations?
> First tile the consumer
> ....
> then you fuse %0 within the scf.for nest that is created during tiling of consumer to get
1. The difference is the fusion direction: consumer-to-producer or producer-to-consumer. IMO, these are two different but equally feasible approaches to the fusion transform. In general, both should be functionally enabled so that users can choose between them case by case. I guess what you mean here is `tileConsumerAndFuseProducersUsingSCF`, which uses `tileAndFuseProducerOfSlice`. As its counterpart, this patch targets the other technical path, `tileAndFuseConsumerOfSlice`, just like the previously merged [PR](https://github.com/llvm/llvm-project/pull/88712), which currently does not support nested loop structures.
2. From a tiling perspective, the major difference between consumer-to-producer and producer-to-consumer is which op takes priority in deciding how the iteration domain is partitioned into tiles. For instance, if we tile the consumer first and then fuse the producer, as you illustrated:
a. the tile sizes of the producer are derived from the tiled consumer by tiling propagation based on `AffineMap`.
b. the producer is forced to fit the iteration domain already generated by the consumer, which may introduce redundant iteration loops.
3. Based on `2`, a typical use case where producer-to-consumer may be more suitable than consumer-to-producer is `matmul` + post-op fusion. As you know, `matmul` is compute-intensive, and many developers have a strong demand for hand-written, user-defined templates with nested, complex loops that handle multi-level tile sizes for peak performance, on both GPU and CPU. If we start fusion by tiling the post-op (like relu), the computation of `matmul` has to follow the tiling dictated by an elementwise operation.
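
To make points 2 and 3 concrete, here is a toy sketch in plain Python (not MLIR, and not what this patch implements). It assumes a hand-tiled `matmul` whose loop nest and tile sizes are chosen by the producer, and it moves a relu consumer into that existing nest once each output tile has finished its reduction. The names `matmul_relu_fused` and `matmul_relu_reference`, and the `tile_m`/`tile_n`/`tile_k` parameters, are hypothetical and only for illustration.

```python
def matmul_relu_reference(a, b):
    """Unfused reference: full matmul followed by an elementwise relu."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [max(sum((a[i][kk] * b[kk][j] for kk in range(k)), 0.0), 0.0)
         for j in range(n)]
        for i in range(m)
    ]


def matmul_relu_fused(a, b, tile_m, tile_n, tile_k):
    """Toy producer-to-consumer fusion.

    The matmul (producer) keeps its own nested, tiled loop structure;
    the relu (consumer) is placed inside that nest and reuses its tiles.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):          # outer loops chosen by the matmul
        for j0 in range(0, n, tile_n):
            i1, j1 = min(i0 + tile_m, m), min(j0 + tile_n, n)
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k0 in range(0, k, tile_k):  # tiled reduction loop
                k1 = min(k0 + tile_k, k)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        for kk in range(k0, k1):
                            acc[i - i0][j - j0] += a[i][kk] * b[kk][j]
            # Fused consumer: relu runs on the finished tile, after the
            # reduction over k is complete for this (i0, j0) tile.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    out[i][j] = max(acc[i - i0][j - j0], 0.0)
    return out
```

In this sketch the relu never gets a say in the tile sizes: it simply reuses whatever `(tile_m, tile_n)` partition the `matmul` template picked. In the opposite direction the relu would be tiled first and the `matmul` would have to be rebuilt around that partition.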
Again, this patch is an extension of the already merged [PR](https://github.com/llvm/llvm-project/pull/88712), which involves producer-to-consumer fusion as well.
CC: @ZhennanQin.
https://github.com/llvm/llvm-project/pull/94190