[Mlir-commits] [mlir] [MLIR] Add continuous tiling to TileUsingForOp (PR #82792)

Thu Mar 21 03:23:45 PDT 2024

rolfmorel wrote:

Hi @ftynse, @nicolasvasilache, @MaheshRavishankar

Thank you for pointing out how multisize tiling is handled and suggesting that the continuous tiling transform can be modelled after it. 

@muneebkhan85 and I discussed how continuous tiling might be expressed in the transform dialect. The main issue we ran into is that, rather than generating a statically known number of loops as multisize tiling does, continuous tiling can generate an arbitrary number of tiling loops. That is, whereas `transform.structured.multitile_sizes` returns two tile sizes and one split point, something like `transform.structured.continuous_tile_sizes` would output `n` tile sizes and `n-1` split points (based on the target size and the size of the dimension). As such, we need one or more ops to encapsulate iterating over `n`, the number of splits/tiling loops.

By modelling continuous tiling after multisize tiling, we came up with the following: 

```
%linalg = transform.structured.match ops{["linalg.generic"]} in %payload
%tile_sizes, %split_points = transform.structured.continuous_tile_sizes %linalg { dimension = 0, target_size = 9 } : (!transform.any_op) -> (!transform.param<i64>, !transform.param<i64>)
%head_linalg, %tail_linalgs = transform.structured.split %linalg after %split_points { dimension = 0 }
%linalg_splits = transform.merge_handles %head_linalg, %tail_linalgs
%tiled_splits = transform.foreach %linalg_splits, %tile_sizes {
  ^bb1(%linalg_split: !transform.any_op, %tile_size: !transform.any_param):
  %tiled_linalg_split = transform.structured.tile_using_for %linalg_split [%tile_size]
  transform.yield %tiled_linalg_split
}
```

The above departs from the current transform dialect in three ways: 
 - `transform.structured.continuous_tile_sizes` is a new op with two results:
    - The first result (`%tile_sizes`) is a list containing the target size followed by all smaller powers of 2 (9, 8, 4, 2, 1 in this example)
    - The second result (`%split_points`) is a list of split points of the iteration space of the given dimension, specifying where each of the tile sizes should be applied (à la the split point of `multitile_sizes`). 
 - `transform.structured.split` is changed such that it performs a multiway split in a single step. The above assumes that `split` would still produce only two results, with the second handle (`%tail_linalgs`) collecting all parts except the first one split off.
 - `transform.foreach` is changed to take multiple handles, by making `target` variadic, and to iterate over all handles at once by "zipping" the lists associated with the handles. The enclosed block would take as many arguments as there are targets.
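
To make the two results of the proposed `continuous_tile_sizes` op concrete, here is a small Python sketch of the arithmetic. It is a model only, not the implementation: the function name mirrors the proposed op, the dimension size of 25 is made up for illustration, and the greedy rule for placing split points (each tile size covers as many whole tiles as fit in what remains) is our assumption about how the split points would be derived.

```python
def continuous_tile_sizes(dim_size, target_size):
    """Model of the proposed op: returns (tile_sizes, split_points).

    Assumption: split points are chosen greedily, so each tile size covers
    as many whole tiles as fit in the remaining part of the dimension.
    """
    if dim_size < 0 or target_size < 1:
        raise ValueError("expected dim_size >= 0 and target_size >= 1")

    # Tile sizes: the target size followed by all smaller powers of 2.
    tile_sizes = [target_size]
    power = 1
    while power * 2 < target_size:
        power *= 2
    while target_size > 1 and power >= 1:
        tile_sizes.append(power)
        power //= 2

    # Split points: the end offset of the region handled by each tile size,
    # except the last one, giving n - 1 split points for n tile sizes.
    split_points = []
    offset = 0
    for size in tile_sizes[:-1]:
        remaining = dim_size - offset
        offset += (remaining // size) * size
        split_points.append(offset)
    return tile_sizes, split_points


sizes, points = continuous_tile_sizes(dim_size=25, target_size=9)
print(sizes)   # [9, 8, 4, 2, 1]
print(points)  # [18, 18, 22, 24]
```

Note that under this greedy rule a split point can repeat (18 appears twice above), which corresponds to an empty part: no tile of size 8 fits in the 7 remaining iterations.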

For the generalisation of `foreach` we have come across other use cases as well (in particular for fusing loops coming from co-indexed handles). 
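
The zipped iteration we have in mind for `foreach` can be pictured with the following Python model. Everything here is illustrative: `split_range` and `foreach_zipped` are hypothetical helpers, the dimension size (25) and split points are example values, and rejecting handles of different lengths is an assumption on our part rather than settled behaviour.

```python
def split_range(dim_size, split_points):
    """Model of a multiway split: cut [0, dim_size) at every split point."""
    bounds = [0, *split_points, dim_size]
    return list(zip(bounds[:-1], bounds[1:]))


def foreach_zipped(*handles):
    """Model of a variadic foreach: visit co-indexed payloads together.

    Assumption: handles associated with lists of different lengths are an error.
    """
    if len({len(handle) for handle in handles}) > 1:
        raise ValueError("all handles must map to the same number of payloads")
    return list(zip(*handles))


# Example values: a dimension of size 25 with target tile size 9.
tile_sizes = [9, 8, 4, 2, 1]
parts = split_range(25, [18, 18, 22, 24])

# Each part is tiled with the tile size at the same index.
tiles = []
for (lo, hi), size in foreach_zipped(parts, tile_sizes):
    tiles.extend((start, start + size) for start in range(lo, hi, size))

print(parts)  # [(0, 18), (18, 18), (18, 22), (22, 24), (24, 25)]
print(tiles)  # [(0, 9), (9, 18), (18, 22), (22, 24), (24, 25)]
```

The block argument pair `(lo, hi), size` plays the role of `%linalg_split` and `%tile_size` in the snippet above: one block argument per target handle.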

Could you let us know if the above is similar to what you had in mind? We would be happy to work on landing the above three changes.

If there are other, preferable paths to supporting continuous tiling, do let us know. @ftynse, you mentioned an approach whereby the transform dialect would itself be subject to rewrites. Do you have a reference we could look at?

https://github.com/llvm/llvm-project/pull/82792