[Mlir-commits] [mlir] [mlir][linalg] Produce canonical linalg.generic for im2col (PR #134675)

Thu Apr 10 00:55:57 PDT 2025

fabrizio-indirli wrote:

Thanks for taking a look at this.
@qedawkins "Canonical" may be the wrong term on my part, since in both cases the input indexing maps are not invertible.
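Here is a small Python sketch of what "not invertible" means for this op. It is not MLIR code, and the helper names are hypothetical. It transcribes the input indexing map of the im2col op from the example IR below and counts how many loop iterations read each input element. Several `(d1, d2)` points land on the same input pixel, so the loop indices cannot be recovered from an input coordinate.

```python
from collections import Counter

# Constants read off the im2col map in the example IR: 52x52 output
# positions, a 3x3 kernel and 64 channels (192 = 3 * 64).
OW, KWC, C = 52, 192, 64

def im2col_input_map(d0, d1, d2):
    """Python transcription of the im2col input indexing map:
    (d0, d1, d2) -> (d0, d1 floordiv 52 + d2 floordiv 192,
                     d1 mod 52 + (d2 mod 192) floordiv 64, d2 mod 64)"""
    return (d0, d1 // OW + d2 // KWC, d1 % OW + (d2 % KWC) // C, d2 % C)

def reads_per_input_element(channel=0):
    """Count how many (d0, d1, d2) iterations read each input element of a
    single channel. Restricting to one channel keeps the loop small."""
    reads = Counter()
    for d1 in range(52 * 52):               # 2704 output positions
        for d2 in range(channel, 576, C):   # the 9 kernel taps of that channel
            reads[im2col_input_map(0, d1, d2)] += 1
    return reads

reads = reads_per_input_element()
print(max(reads.values()))  # 9: interior pixels are read by 9 different iterations
```

For a 3x3 kernel with stride 1, the map is many-to-one. That is why neither version of the op has an invertible input map.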
I just meant that the new format clearly shows all the accessed tensors (including the input one) and their access patterns (including the input indexing map) in the interface of the `linalg.generic` op, as one would normally expect. This allows other passes to analyze the op (e.g. retrieve the uses of a value) and to manipulate it without having to inspect its body.
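The following toy Python model roughly illustrates why operand visibility matters. It is not MLIR's C++ API, and `Value`, `Op` and `fusable_producers` are hypothetical names. An analysis can only discover a producer through the operands an op declares, so a tensor that is read only from inside the body stays invisible to it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Value:
    name: str
    producer: Optional["Op"] = None   # op that defines this value, if any

@dataclass(eq=False)
class Op:
    """Hypothetical stand-in for a linalg.generic: `inputs` models the ins(...)
    operands; `captured` models values read only from inside the body."""
    name: str
    inputs: list = field(default_factory=list)
    captured: list = field(default_factory=list)

    def result(self):
        return Value(f"%{self.name}", producer=self)

def fusable_producers(consumer):
    """Follow def-use edges the way an operand-driven pattern would:
    look only at the declared inputs, never inside the body."""
    return [v.producer for v in consumer.inputs if v.producer is not None]

x = Value("%expanded")
clamp = Op("clamp", inputs=[x])
clamped = clamp.result()

old_im2col = Op("im2col_old", captured=[clamped])  # input read via the body
new_im2col = Op("im2col_new", inputs=[clamped])    # input listed in ins(...)

print([p.name for p in fusable_producers(old_im2col)])  # []
print([p.name for p in fusable_producers(new_im2col)])  # ['clamp']
```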
For example, a typical _linalg_ fusion pattern such as [_FuseElementwiseOps_](https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp#L463) checks only the operands in the `linalg.generic` interface when traversing the def-use chains of a value to be fused. With the current implementation of im2col, the input tensor is not listed among the linalg op's inputs, so the im2col op is never fused with its producer. By contrast, the fusion is applied to the "more canonical" linalg op I am proposing:
```mlir
// INPUT IR
// Elementwise clamp: the producer we would like to fuse.
%6 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>,
                     affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>],
    iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
    ins(%expanded : tensor<1x54x54x64xf32>)
    outs(%5 : tensor<1x54x54x64xf32>) {
  ^bb0(%in: f32, %out: f32):
    %14 = arith.minimumf %in, %cst_1 : f32
    %15 = arith.maximumf %14, %cst_2 : f32
    linalg.yield %15 : f32
  } -> tensor<1x54x54x64xf32>
%9 = tensor.empty() : tensor<1x2704x576xf32>
// im2col: %6 is an explicit input, read through the (non-invertible) gather map.
%10 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1 floordiv 52 + d2 floordiv 192, d1 mod 52 + (d2 mod 192) floordiv 64, d2 mod 64)>,
                     affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
    iterator_types = ["parallel", "parallel", "parallel"]}
    ins(%6 : tensor<1x54x54x64xf32>)
    outs(%9 : tensor<1x2704x576xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<1x2704x576xf32>

// AFTER FUSION
// The clamp now runs inside the im2col op, which reads %expanded directly.
%7 = tensor.empty() : tensor<1x2704x576xf32>
%8 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1 floordiv 52 + d2 floordiv 192, d1 mod 52 + (d2 mod 192) floordiv 64, d2 mod 64)>,
                     affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
    iterator_types = ["parallel", "parallel", "parallel"]}
    ins(%expanded : tensor<1x54x54x64xf32>)
    outs(%7 : tensor<1x2704x576xf32>) {
  ^bb0(%in: f32, %out: f32):
    %12 = arith.minimumf %in, %cst_1 : f32
    %13 = arith.maximumf %12, %cst_2 : f32
    linalg.yield %13 : f32
  } -> tensor<1x2704x576xf32>
```
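A small Python sketch can check why this fusion is valid. The clamp is elementwise, so two orderings give the same result:

- clamp every input element, then gather through the im2col map (the unfused IR above);
- gather first, then clamp inside the consumer's body (the fused op).

The shapes and the clamp bounds that stand in for `%cst_1`/`%cst_2` are assumptions chosen for illustration. The helper names are hypothetical.

```python
import random

# Small hypothetical shapes (the example IR uses H = W = 54, C = 64).
N, H, W, C = 1, 5, 5, 2
KH = KW = 3
OH, OW = H - KH + 1, W - KW + 1      # stride 1, no padding
# Hypothetical clamp bounds standing in for %cst_1 / %cst_2.
UPPER, LOWER = 6.0, 0.0

def clamp(v):
    # minimumf then maximumf, as in the producer body (finite inputs only)
    return max(min(v, UPPER), LOWER)

def im2col_map(d0, d1, d2):
    # Same form as the map in the IR, parameterised: 52 -> OW, 192 -> KW*C, 64 -> C
    return (d0, d1 // OW + d2 // (KW * C),
            d1 % OW + (d2 % (KW * C)) // C, d2 % C)

def make_input(seed=0):
    rng = random.Random(seed)
    return {(n, h, w, c): rng.uniform(-10.0, 10.0)
            for n in range(N) for h in range(H) for w in range(W) for c in range(C)}

def output_indices():
    return [(d0, d1, d2) for d0 in range(N)
            for d1 in range(OH * OW) for d2 in range(KH * KW * C)]

def unfused(x):
    # Materialise clamp(x) (like %6), then gather it through the map (like %10)
    clamped = {k: clamp(v) for k, v in x.items()}
    return {idx: clamped[im2col_map(*idx)] for idx in output_indices()}

def fused(x):
    # Gather from x through the map and clamp in the body (like %8)
    return {idx: clamp(x[im2col_map(*idx)]) for idx in output_indices()}

x = make_input()
assert fused(x) == unfused(x)
```

The fused form gives up some compute to avoid materializing the intermediate tensor. It evaluates the clamp once per gathered element: up to 9 times per input pixel for a 3x3 kernel. The unfused form evaluates it once per input element.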

https://github.com/llvm/llvm-project/pull/134675