[Mlir-commits] [mlir] [MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (PR #88712)

Thu May 30 13:56:32 PDT 2024

================
@@ -160,6 +215,21 @@ struct LinalgOpTilingInterface
     return success();
   }
 
+  FailureOr<TilingResult> getTiledImplementationFromOperandTile(
+      Operation *op, OpBuilder &b, unsigned operandNumber,
+      ArrayRef<OpFoldResult> offsets, ArrayRef<OpFoldResult> sizes) const {
+    SmallVector<OpFoldResult> mappedOffsets, mappedSizes;
+    auto tilingInterfaceOp = cast<TilingInterface>(op);
+    if (failed(tilingInterfaceOp.getIterationDomainTileFromOperandTile(
+            b, operandNumber, offsets, sizes, mappedOffsets, mappedSizes))) {
+      return emitError(
+          op->getLoc(),
+          "unable to obtain the iter domain position of the operation.");
+    }
+    return tilingInterfaceOp.getTiledImplementation(b, mappedOffsets,
+                                                    mappedSizes);
----------------
MaheshRavishankar wrote:

> 
> Thanks for this write-up! Would you mind putting it where it belongs, that is, in the [interface documentation](https://github.com/llvm/llvm-project/blob/3ce9b86cfd2d88162bc4a551dd7910b8cff3097b/mlir/include/mlir/Interfaces/TilingInterface.td#L19-L20).

Yes, Ill do that. I realize this is probably useful to document more than it is today.

> 
> Indeed, there has been quite a lot of design discussion in this PR, though most of the initial reviews were enforcing code compatibility with the rest of MLIR codebase as well as correctness of the results. Maybe it should have been preceded by an RFC where such a discussion could have taken place.

Maybe. From my view this was a strict extension to the interface it didn't cross my mind that it needed an RFC. I didn't anticipate that the change would be controversial.


> This asymmetry is baffling. Why does operand coordinate space slicing is delegated to the interface method, but result space recomposition is not? (I know the technical issue is due to `insert_slice`/`parallel_insert_slice` being dependent on the loop type.)

Good question! Comes down to how each loop handles the tiles. While the interface method handles the "generation of a tile of the result, and by extension taking the appropriate slices of the operands (or whatever else it needs to do to generate the tile of the result), how these tile results are put back is actually upto the loop type. It doesnt seem as the right break up to push that to the implementation itself. Then each implementation will have to account for each loop type which breaks the purpose of the interface.

> 
> > This method might or might not reuse the implementation in getTiledImplementation. That depends on op semantics. That is why there isnt a default implementation of this method in the current interface definition.
> 
> It is unclear to me when it cannot reuse `getTiledImplementation`? Do you have a specific operation in mind?

I think Quinn pointed out the `tensor.pad` implementation. Honestly I dont have all of the implementations indexed in my head to think of an example. The goal is to build the `TilingInterface` to expose the minimum amount of information that an operation needs to expose in order to be usable to write a tiling implementation in an op-agnostic way. Its a bit of a path finding in some cases, and making as less assumptions as possible. 


> 
> > I think things get muddled when we try to have a default implementation of the two fusion methods. Default implementation now need to take an opinion on how the tiling of each operation works.
> 
> It is muddled even before. Why does the tiling interface should know anything about fusion? It is sufficient to provide the way to compute mappings between slices of the iteration space and slices of the operands / results.
> 

One of the aims of `TilingInterface` was to "generalize" what Linalg does and make it more widely applicable. We havent made that much progress on this in LLVM itself (more progress has been made in IREE). So the tiling interface was done not only to generalize tiling but also generalize tile + fuse. So the `getTiledImplementation` only deals with tiling. If an op implements more methods then it can be tiled and fused as well. So they are inherently linked. I have also stayed away from saying slice of iteration space -> slice of operands/results is the "requirement" for tiling. Instead, my base assumption is not too rely on any relationship between iteration space and operand/result slices. But it is also true, that if this relationship exists, then a default implementation could cover the use case for a lot of operations. At the outset, I wasnt sure that this is actually the case, but as more of these have been added, it does seem to be so.

> > They are just convenience methods that reduce implementation burden on operations that implement the interface.
> 
> Why shouldn't we reduce the implementation burden? This is probably my main concern here.
> 
> > I have strong concerns with this. getTiledImplementation is necessarily black boxed. I would not want to assume that you can produce an implementation of tile from the slice. <...>
> 
> Maybe I'm insufficiently clear above. I am _not_ suggesting that `getTiledImplementation` should take slices of operands _instead_ of the iteration space slice. I am suggesting that it takes them _in addition_ to that. Same as @qedawkins in his comment. This is merely splitting current implementations of the `getTiledImplementation` method into two parts (1) one that produces operand slices (or just returns the operand if it for some reason doesn't need to do anything) and (2) one that produces the computational body of the loop given the result of (1). Could you elaborate on why this would be more opinionated?
> 

(1) is fundamentally assuming that you can compute the operands slices and that you can use it to create the tiled implemetnation. It is indeed true for Linalg and other operations, but I find it hard to say that is always possible to split this way, and I would say the split would be justified if it were needed by the implementation of tiling that is using the interface. If the tiling implementation does not need to have this split, then there is no reason for the interface to have these methods.

> > I suggest we first try to land this PR with something that is equivalent of generateOperandTileValue and then come back to maybe having default implementation of the different methods.
> 
> Does this solve the issue @qedawkins pointed out? (By triplication of logic in each implementation I suspect.)

Follow up would be to have the default implementation just use the mapping from iteration space <-> data space and reuse `getTiledImplementation`. That will reduce the duplication for cases it is indeed a duplication and preserve the ability of operations to override it when it doesnt match the op semantics.


https://github.com/llvm/llvm-project/pull/88712