[Mlir-commits] [mlir] [MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (PR #88712)

Fri May 24 06:01:52 PDT 2024

================
@@ -160,6 +215,21 @@ struct LinalgOpTilingInterface
     return success();
   }
 
+  FailureOr<TilingResult> getTiledImplementationFromOperandTile(
+      Operation *op, OpBuilder &b, unsigned operandNumber,
+      ArrayRef<OpFoldResult> offsets, ArrayRef<OpFoldResult> sizes) const {
+    SmallVector<OpFoldResult> mappedOffsets, mappedSizes;
+    auto tilingInterfaceOp = cast<TilingInterface>(op);
+    if (failed(tilingInterfaceOp.getIterationDomainTileFromOperandTile(
+            b, operandNumber, offsets, sizes, mappedOffsets, mappedSizes))) {
+      return emitError(
+          op->getLoc(),
+          "unable to obtain the iter domain position of the operation.");
+    }
+    return tilingInterfaceOp.getTiledImplementation(b, mappedOffsets,
+                                                    mappedSizes);
----------------
ftynse wrote:

In general, I am not very happy to introduce more methods to an interface, especially if they are quite similar but subtly different. This may lead to copy-pasted implementations, divergent implementations and other things that are better avoided. The way I think about interfaces is a contract that another person will have to implement to opt into the transformation. The more complex the contract is, the less beneficial it is for that person to use it instead of just duplicating the transformation, which defies the purpose of sharing the transformation logic through interfaces.

I asked a question elsewhere in the patch driven by similar logic: why isn't `getTiledImplementationFromOperandTile` is a separate interface method and not a simple composition of `getIterationDomainTileFromOperandTile` and `getTiledImplementation` (from the iteration domain tile). IMO, we should orthogonalize the interface better. The tilable ops are fundamentally an iteration space + mappings from that space to indexing spaces of operands and results. In case of tilling, if we want to reason about the whole computation, the iteration space is a product of the loop iteration space and the iteration space of the tiled op.

Now, IIUC the problem is that `getTiledImplementation` currently assumes operating on the complete, untiled op which accepts full operands, and uses that to produce tiles of the operands. This assumption doesn't hold in the case when the consumer is forcibly moved in the loop. We seem to all agree that this assumptions should be lifted.

My thinking here is that `getTiledImplementation` does too much. It should not be emitting IR that computes a slice of each operand. Instead, it should be _given_ such slices. These slices in turn can be obtained from a separate `getOperandTileForIterationDomainTile` method (though I'm not sure whether this should be a method on the operation interface, or we should introduce a "TileableData" type interface that is implemented by tensors and potentially other types). This method would be called by "pure" tiling and producer fusion, and would not be called by consumer fusion _for the operand that comes from the producer_ (it will still be called for _other_ operands). Instead consumer fusion will directly supply `getTiledImplementation` with the slice of the operand that is just the result of the tiled producer (before `insert_slice`). This would also work if we were to introduce, e.g., nested linalg generics that folks have been discussing for years, where the data tiling of operands would be part of the surrounding generic that acts as a loop; as well as for the cases of non-hyperrectangular data structures.

This generally goes in a similar direction that what's proposed above, so I hope we are all converging. This will inflict some pain on downstreams, but looks like the more future-proof direction for upstream, and a better orthogonalization. (It is also unclear to me that having to implement similar-but-different methods for downstream other than yours is a significantly lesser pain than adapting to this.)  Specifically, we are decoupling the tiling of the _iteration space_ (operation) from the that of _data spaces_ (operands), and define the interface in terms that are related to tiling itself (iteration space tile, operand tile, result tile) rather than fusion (implementation as consumer, as producer), it's a tiling interface after all!

My preferred way to move this forward, if we all agree on the proposed decomposition of methods, is to make a separate PR that splits out `getOperandTileForIterationDomainTile` and uses it in the existing case, and rebase this PR on top of that. Though I could also accept these changes directly in this PR if rebasing is too involved.

There are also some observations for future generalizations that are non-blocking but appeared to me as I was thinking about this problem:
 -  What fusion needs to achieve is to "align" the iteration spaces of the producer and the consumer tilable ops. How do we check if this iteration space tile can be obtained from the original iteration space of the consumer op by tiling, e.g., if the consumer shifts the operand indexing as `i+1` or otherwise reuses it, this may not be trivially valid to just move it into the loop. Currently, we are likely restricting this to trivial permutations...
 - How (and where) do we generalize tiling of the result rather than always doing `insert_slice`?

https://github.com/llvm/llvm-project/pull/88712