[Mlir-commits] [mlir] [mlir][Linalg] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (PR #91878)

Thu Jun 6 12:23:56 PDT 2024

MaheshRavishankar wrote:

> After deeper investigation, I see, the tests that change are not the ones with "num_threads", all changed tests are only related to the classical "tile_sizes" case...
> 
> Before this change, when tiling with tile sizes, we used to call:
> 
> ```
> return tileToForallOpImpl(b, op, numThreads,
>                              /*nominalTileSizes=*/tileSizes, mapping,
>                              /*omitTileOffsetBoundsCheck=*/true);
> ```
> 
> I.e. we explicitly omitted the affine.max because we statically know that tiling with tile sizes does not require it. This is the same reasoning for tiling with scf.for: we do not emit affine.max on the lower bound.
> 
> This change introduces affine.max in "regular tiling" with tile sizes and I believe this is redundant and unnecessary.
> 
> @MaheshRavishankar coming back to you point about:
> 
> ```
> Re 3 : I dont think there is any extra divisions. Most of the lit tests stay the same. The only ones that changed are the dynamic shape tests which I think were originally wrong (You cant really avoid generating the max operation).
> ```
> 
> Can you explain why you think they were originally wrong? My position is that you can avoid generating the max operation when tiling scf.forall with tile sizes for the same reason that you don't generate affine.max when tiling scf.for ..

Thanks for the explanation. I missed this in the code (and not sure it was documented). Let me see how to preserve this information, but it doesnt seem very easy to me. I would ideally like to do this
1) Earlier the `linalg::tileToForallOp` generated the num_threads version of `scf.forall`  and `linalg::tileToForallOpUsingTileSizes`  also generated the num_threads version of the operation by computing the num_threads from the tile sizes. I'd like to switch it. I'd like to make  `linalg::tileToForallOp`s replacement generate the `num_threads` version of the `scf.forall` and make `linalg::tileToForallOpUsingTileSizes` generate the `tile_sizes` version of the `scf.forall`. The former will generate the `affine.max` the latter wont. I actually started with this approach cause it essentially then becomes more consistent with what the caller asks for. If the caller wants the `num_threads` version, you get that, if you want `tile_sizes` version you get that. There is no "implicitly compute the number of threads from tile_sizes".
2) You can write a conversion from the `tile_sizes` version to the `num_threads` version as a simple transformation. This will automatically avoid the `affine.max` generation.
3) I can keep the behavior of `transform.structured.tile_to_forall` same as before, i.e. when `tile_sizes` are specified, it will generate the `tile_sizes` version of the `scf.forall` and immediately convert the `scf.forall` to the `num_threads` version.

I'll think of alternative approaches, but for some of the next steps I have locally this would actually make sense.

https://github.com/llvm/llvm-project/pull/91878