[Mlir-commits] [mlir] [MLIR][Linalg][Docs] Add forms to Linalg rationale docs (PR #156859)
Andrzej Warzyński
llvmlistbot at llvm.org
Tue Sep 9 07:09:00 PDT 2025
================
@@ -506,6 +506,65 @@ potential by introducing lower-level IR ops and *smaller* Linalg ops.
This gradually reduces the potential, all the way to Loops + VectorOps
and LLVMIR.
+### Interchangeability of Forms<a name="forms"></a>
+
+Linalg's various forms (named, generic) also carry information, and that
+information should be preserved as much as possible during the progressive
+lowering. A `matmul` operation is a special case of a `contract` operation,
+which in turn is a special case of a `generic` operation. Operations in the
+more specialized forms should not be converted to the more generic ones
+unnecessarily, in the same way that they should not be broken down into
+loops + arithmetic if they can still be represented as a Linalg op.
+
+#### Generic, Category, Named<a name="generic_category_named"></a>
+
+The core Linalg operation tree has three forms:
+* **Generic:** Represented by `linalg.generic` and can encode all perfectly-nested
+loop operations.
+* **Category:** Represented by `linalg.contract` and `linalg.elementwise`,
+which are special (einsum) forms of the `generic` operation.
+* **Named:** All _named_ forms that can lower to either _category_ or
+_generic_ forms. For example, `linalg.matmul`, `linalg.add`, etc.
+
+Unlike lowering to loops, the different Linalg forms that are derived from
+`linalg.generic` are *equivalent*. It should always be possible to convert
+a named operation into a generic form and back to a named form, provided the
+semantics are preserved. The various forms in the Linalg dialect are meant to
+facilitate pattern matching (on single operations or DAGs) and to allow
+different forms to be considered *canonical* for different transforms.
+
+#### Special Operations<a name="special_ops"></a>
+
+Not all Linalg operations represent perfectly nested loops; those that do not
+cannot be represented as a `linalg.generic`. There are two kinds of Linalg
+operations that fall into this category:
+* **Composite:** Operations that compose multiple Linalg operations, for
+example `linalg.softmax`. These can be converted to a DAG of Linalg operations
+(in any form).
+* **Special Named:** Operations that are usually matched against library calls
+or special lowering, but can only be lowered to a combination of Linalg and
+non-Linalg operations, for example `linalg.*conv*`, `linalg.winograd*`,
+`linalg.pooling*`, etc.
----------------
banach-space wrote:
> Is that also true for winograd and pooling?
I've not used pooling much and haven't touched winograd, so please double check.
**Pooling**
IIUC, pooling is just a convolution with a different reduction operation, so the answer would be "yes".
```bash
$ cat pooling.mlir
func.func @pooling_ncw_sum_memref_2_3_2_1(%input: memref<4x2x5xf32>, %filter: memref<2xf32>, %output: memref<4x2x3xf32>) {
linalg.pooling_ncw_sum
{dilations = dense<2> : tensor<1xi64>, strides = dense<1> : tensor<1xi64>}
ins(%input, %filter : memref<4x2x5xf32>, memref<2xf32>)
outs(%output : memref<4x2x3xf32>)
return
}
$ bin/mlir-opt --linalg-generalize-named-ops pooling.mlir
#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2 + d3 * 2)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d3)>
#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
module {
func.func @pooling_ncw_sum_memref_2_3_2_1(%arg0: memref<4x2x5xf32>, %arg1: memref<2xf32>, %arg2: memref<4x2x3xf32>) {
linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "reduction"]} ins(%arg0, %arg1 : memref<4x2x5xf32>, memref<2xf32>) outs(%arg2 : memref<4x2x3xf32>) {
^bb0(%in: f32, %in_0: f32, %out: f32):
%0 = arith.addf %out, %in : f32
linalg.yield %0 : f32
}
return
}
}
```
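To illustrate the "different reduction operation" point: a max-pooling variant should generalize to an almost identical `linalg.generic`, with only the combiner in the region changing from `arith.addf` to `arith.maximumf`. A sketch (same maps and shapes as above; I haven't re-run this, so please double-check):

```mlir
// Hypothetical generalized form of linalg.pooling_ncw_max on the same
// shapes as the sum example; only the region body differs.
linalg.generic {indexing_maps = [#map, #map1, #map2],
                iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
    ins(%arg0, %arg1 : memref<4x2x5xf32>, memref<2xf32>)
    outs(%arg2 : memref<4x2x3xf32>) {
^bb0(%in: f32, %in_0: f32, %out: f32):
  %0 = arith.maximumf %out, %in : f32
  linalg.yield %0 : f32
}
```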
**Winograd**
These are indeed composite Ops, see:
```bash
$ cat winograd.mlir
module {
func.func @winograd_filter_dyn(%arg0: tensor<?x3x3x?xf32>, %arg1: tensor<6x6x?x?xf32>) -> tensor<6x6x?x?xf32> {
%0 = linalg.winograd_filter_transform fmr(F_4_3) ins(%arg0 : tensor<?x3x3x?xf32>) outs(%arg1 : tensor<6x6x?x?xf32>) -> tensor<6x6x?x?xf32>
return %0 : tensor<6x6x?x?xf32>
}
}
$ bin/mlir-opt winograd.mlir -test-linalg-transform-patterns=test-decompose-winograd-ops
module {
func.func @winograd_filter_dyn(%arg0: tensor<?x3x3x?xf32>, %arg1: tensor<6x6x?x?xf32>) -> tensor<6x6x?x?xf32> {
%cst = arith.constant dense<[[1.000000e+00, -0.333333343, -0.333333343, 0.0833333358, 0.0833333358, 0.000000e+00], [0.000000e+00, 0.333333343, -0.333333343, -0.166666672, 0.166666672, 0.000000e+00], [0.000000e+00, -0.333333343, -0.333333343, 0.333333343, 0.333333343, 1.000000e+00]]> : tensor<3x6xf32>
%cst_0 = arith.constant dense<[[1.000000e+00, 0.000000e+00, 0.000000e+00], [-0.333333343, 0.333333343, -0.333333343], [-0.333333343, -0.333333343, -0.333333343], [0.0833333358, -0.166666672, 0.333333343], [0.0833333358, 0.166666672, 0.333333343], [0.000000e+00, 0.000000e+00, 1.000000e+00]]> : tensor<6x3xf32>
%cst_1 = arith.constant 0.000000e+00 : f32
%c0 = arith.constant 0 : index
%c-9223372036854775808 = arith.constant -9223372036854775808 : index
%c1 = arith.constant 1 : index
%0 = scf.for %arg2 = %c0 to %c-9223372036854775808 step %c1 iter_args(%arg3 = %arg1) -> (tensor<6x6x?x?xf32>) {
%1 = scf.for %arg4 = %c0 to %c-9223372036854775808 step %c1 iter_args(%arg5 = %arg3) -> (tensor<6x6x?x?xf32>) {
%extracted_slice = tensor.extract_slice %arg0[%arg2, %c0, %c0, %arg4] [1, 3, 3, 1] [1, 1, 1, 1] : tensor<?x3x3x?xf32> to tensor<3x3xf32>
%2 = tensor.empty() : tensor<6x3xf32>
%3 = linalg.fill ins(%cst_1 : f32) outs(%2 : tensor<6x3xf32>) -> tensor<6x3xf32>
%4 = linalg.matmul ins(%cst_0, %extracted_slice : tensor<6x3xf32>, tensor<3x3xf32>) outs(%3 : tensor<6x3xf32>) -> tensor<6x3xf32>
%5 = tensor.empty() : tensor<6x6xf32>
%6 = linalg.fill ins(%cst_1 : f32) outs(%5 : tensor<6x6xf32>) -> tensor<6x6xf32>
%7 = linalg.matmul ins(%4, %cst : tensor<6x3xf32>, tensor<3x6xf32>) outs(%6 : tensor<6x6xf32>) -> tensor<6x6xf32>
%inserted_slice = tensor.insert_slice %7 into %arg5[%c0, %c0, %arg4, %arg2] [6, 6, 1, 1] [1, 1, 1, 1] : tensor<6x6xf32> into tensor<6x6x?x?xf32>
scf.yield %inserted_slice : tensor<6x6x?x?xf32>
}
scf.yield %1 : tensor<6x6x?x?xf32>
}
return %0 : tensor<6x6x?x?xf32>
}
}
```
So the answer is "no".
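As for the named ↔ generic round trip that the new docs section describes, something like the following should demonstrate it (a sketch; I haven't re-run this, and the exact pass names are worth double-checking against upstream `mlir-opt`):

```bash
$ cat matmul.mlir
func.func @matmul(%a: tensor<4x8xf32>, %b: tensor<8x16xf32>,
                  %c: tensor<4x16xf32>) -> tensor<4x16xf32> {
  %0 = linalg.matmul ins(%a, %b : tensor<4x8xf32>, tensor<8x16xf32>)
                     outs(%c : tensor<4x16xf32>) -> tensor<4x16xf32>
  return %0 : tensor<4x16xf32>
}
# Named -> generic:
$ bin/mlir-opt --linalg-generalize-named-ops matmul.mlir
# Generic -> named; this should recover linalg.matmul:
$ bin/mlir-opt --linalg-generalize-named-ops --linalg-specialize-generic-ops matmul.mlir
```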
https://github.com/llvm/llvm-project/pull/156859