[Mlir-commits] [mlir] [MLIR] Add continuous tiling to TileUsingForOp (PR #82792)

Thu Mar 28 18:32:44 PDT 2024

MaheshRavishankar wrote:

> > High-level comment: why can't this just be a peeling transformation applied after tile-and-fuse? What is happening here seems to be peeling applied repeatedly.
> 
> An issue here is that the static tile size is created using AffineMin at the time of creating the loop nests. This makes the issue more complicated than just having to create new tail-loops and clone regions from the original loop into it.

Sorry for the delay. I am still not convinced this is the easiest way to do this. Looking at the description, you are trying to split full tiles from partial tiles (and then apply that recursively). An easier approach, to me, is the following. Given

```
%result = <linalg_op>(%operand0, %operand1, ...)
```

you can first split it into two:
```
%operand0_slice = tensor.extract_slice %operand0
%operand1_slice = tensor.extract_slice %operand1
....
%result_slice = <linalg_op>(%operand0_slice, %operand1_slice,...)
...
%operand0_remaining = tensor.extract_slice %operand0
%operand1_remaining = tensor.extract_slice %operand1
....
%result_remaining = <linalg_op>(%operand0_remaining, %operand1_remaining)
....
%result = tensor.insert_slice %result_slice into ....
%result1 = tensor.insert_slice %result_remaining into ...
```

Now you can tile the individual ops. This does not need any changes to the tiling implementation itself, and it is also modular. You can take the second linalg op and apply the same procedure repeatedly to get all the different loops. Doing that would be up to the caller (or it could be wrapped in a helper function), but the core logic is just this one step.
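
To make the repeated split concrete, here is a minimal sketch of the index arithmetic only, along a single dimension. It is not MLIR code and does not use any MLIR API; the function names `split_full_and_partial` and `continuous_split`, and the decreasing list of tile sizes, are assumptions made for illustration. Each returned region corresponds to one `tensor.extract_slice` / linalg op pair that could then be tiled on its own.

```python
def split_full_and_partial(offset, size, tile_size):
    """Split [offset, offset + size) into a full-tile part and a remainder.

    Returns ((full_offset, full_size), (rem_offset, rem_size)), where
    full_size is the largest multiple of tile_size that fits in size.
    """
    full_size = (size // tile_size) * tile_size
    return (offset, full_size), (offset + full_size, size - full_size)


def continuous_split(size, tile_sizes):
    """Apply the one-step split repeatedly to the remainder.

    tile_sizes is a hypothetical list of tile sizes to try in order
    (for example 8, 4, 2, 1). Returns a list of
    (offset, size, tile_size) regions that together cover [0, size).
    """
    regions = []
    offset, remaining = 0, size
    for tile_size in tile_sizes:
        (full_off, full_size), (offset, remaining) = split_full_and_partial(
            offset, remaining, tile_size
        )
        if full_size > 0:
            regions.append((full_off, full_size, tile_size))
    # Anything still left over becomes one final region with a single tile.
    if remaining > 0:
        regions.append((offset, remaining, remaining))
    return regions


if __name__ == "__main__":
    for region in continuous_split(31, [8, 4, 2, 1]):
        print(region)
```

For a dimension of size 31 and tile sizes 8, 4, 2, 1, this yields one region per loop: 24 elements tiled by 8, then 4 by 4, 2 by 2, and 1 by 1. The core logic stays in `split_full_and_partial`; the loop around it is the part that would be left to the caller or a helper.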

https://github.com/llvm/llvm-project/pull/82792