[Mlir-commits] [mlir] [mlir][linalg] Vectorize directly to a named contraction (PR #147296)
Adam Siemieniuk
llvmlistbot at llvm.org
Wed Jul 9 03:28:23 PDT 2025
adam-smnk wrote:
Thanks a lot for the feedback!
Since I have all your attention, I'm happy to iterate on the design and work toward making it the default path.
This new lowering tries to preserve as much information as possible to avoid any need for reconstruction (unlike the current `multi_reduction` path plus raising to `contract`).
Thanks to the narrow and well-defined semantics of the existing linalg contraction ops (`matmul`, `contract`, etc.), the lowering is pretty straightforward for all their default representations and in the presence of any transposes.
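For a plain `matmul`, the direct lowering essentially boils down to reads plus a `vector.contract` that keeps the original indexing maps. A rough sketch (shapes and value names are made up for illustration, and the read indices/padding are elided as in the snippets below):
```mlir
// Assumed input (static shapes picked for illustration):
//   %0 = linalg.matmul ins(%A, %B : tensor<8x16xf32>, tensor<16x8xf32>)
//                      outs(%C : tensor<8x8xf32>) -> tensor<8x8xf32>
// Sketch of the direct lowering; the (m, k) x (k, n) -> (m, n) maps carry over.
%a = vector.transfer_read %A : tensor<8x16xf32>, vector<8x16xf32>
%b = vector.transfer_read %B : tensor<16x8xf32>, vector<16x8xf32>
%c = vector.transfer_read %C : tensor<8x8xf32>, vector<8x8xf32>
%res = vector.contract {
         indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                          affine_map<(m, n, k) -> (k, n)>,
                          affine_map<(m, n, k) -> (m, n)>],
         iterator_types = ["parallel", "parallel", "reduction"],
         kind = #vector.kind<add>}
       %a, %b, %c : vector<8x16xf32>, vector<16x8xf32> into vector<8x8xf32>
```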
### Generics
One point I want to address first is why we should ignore `linalg.generic` here.
Simpler and cheaper matching aside, both linalg named contraction ops and `vector.contract` have narrower semantics. They cannot represent all possible contractions as defined by `linalg::detail::isContractionBody` - for example, a unary operation between a binary `elemwise` and a binary `reduce` operation forms a valid contraction body which currently cannot be represented by the other ops.
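As a hypothetical illustration of that last point (the concrete shapes and the choice of `negf` as the unary op are mine), a body like this should satisfy `isContractionBody` but has no equivalent named contraction op or `vector.contract` form:
```mlir
%0 = linalg.generic {
       indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                        affine_map<(m, n, k) -> (k, n)>,
                        affine_map<(m, n, k) -> (m, n)>],
       iterator_types = ["parallel", "parallel", "reduction"]}
     ins(%arg0, %arg1 : tensor<24x12xf32>, tensor<12x25xf32>)
     outs(%arg2 : tensor<24x25xf32>) {
  ^bb0(%a: f32, %b: f32, %acc: f32):
    %mul = arith.mulf %a, %b : f32
    // Unary op sitting between the elementwise multiply and the reduction add.
    %neg = arith.negf %mul : f32
    %add = arith.addf %acc, %neg : f32
    linalg.yield %add : f32
} -> tensor<24x25xf32>
```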
I think generics should be treated as such (well, generic) and be vectorized as they are today. One can always specialize a generic to capture narrower behavior. Equally, linalg ops can be generalized before vectorization to retain the current vectorizer behavior. Thus, there is no obstacle here to enabling specialized vectorization by default.
### Mixed precision
Both linalg and vector support the same mixed-precision semantics, where inputs are converted into the output precision before computation.
Today, the casts are externalized by default and folding them back into `vector.contract` is optional.
The new path keeps the mixed-precision semantics within the op. It is a minor difference, but it might be better to give users a heads-up before making it the default.
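To spell out the difference (shapes and maps below are assumptions for illustration), an f16 x f16 -> f32 matmul roughly looks like this on the two paths:
```mlir
// Current default: the casts are externalized before the contraction.
%lhs_f32 = arith.extf %lhs : vector<8x16xf16> to vector<8x16xf32>
%rhs_f32 = arith.extf %rhs : vector<16x8xf16> to vector<16x8xf32>
%0 = vector.contract {
       indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                        affine_map<(m, n, k) -> (k, n)>,
                        affine_map<(m, n, k) -> (m, n)>],
       iterator_types = ["parallel", "parallel", "reduction"],
       kind = #vector.kind<add>}
     %lhs_f32, %rhs_f32, %acc
     : vector<8x16xf32>, vector<16x8xf32> into vector<8x8xf32>

// New path: inputs stay in f16 and the op itself carries the mixed-precision semantics.
%1 = vector.contract {
       indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                        affine_map<(m, n, k) -> (k, n)>,
                        affine_map<(m, n, k) -> (m, n)>],
       iterator_types = ["parallel", "parallel", "reduction"],
       kind = #vector.kind<add>}
     %lhs, %rhs, %acc
     : vector<8x16xf16>, vector<16x8xf16> into vector<8x8xf32>
```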
### Broadcasts
Now, let's look into broadcast semantics present in linalg named ops.
Many general named contraction ops (like `matmul`, `batch_matmul`, `contract`) can express arbitrary input broadcasts through their `indexing_maps`. As an example, consider the following matmul:
```mlir
%0 = linalg.matmul
indexing_maps = [affine_map<(m, n, k) -> (k)>, // broadcast LHS
affine_map<(m, n, k) -> (k)>, // broadcast RHS
affine_map<(m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : tensor<12xf32>, tensor<12xf32>)
outs(%arg2: tensor<24x25xf32>) -> tensor<24x25xf32>
```
The `matmul` broadcasts both of its inputs. Considering the named op semantics, one would expect `LHS` to expand into an `(m, k)` or `(k, m)` shape and `RHS` into `(k, n)` or `(n, k)`. While all of these combinations are valid, the choice might have implications further down the line. Currently, there is no way to encode the desired broadcast variant within the op.
`vector.contract` doesn't support broadcast semantics in its indexing maps. Therefore, the missing dimensions have to be materialized before or during vectorization. Currently, vectorization through multi reduction and reconstruction (`transform.structured.vectorize_children_and_apply_patterns`) results in the following contraction:
```mlir
%1 = vector.transfer_read %LHS : tensor<12xf32>, vector<12xf32>
%2 = vector.broadcast %1 : vector<12xf32> to vector<24x25x12xf32>
%3 = vector.transfer_read %RHS : tensor<12xf32>, vector<12xf32>
%4 = vector.transfer_read %ACC : tensor<24x25xf32>, vector<24x25xf32>
%5 = vector.contract {..., kind = #vector.kind<add>} %2, %3, %4
: vector<24x25x12xf32>, vector<12xf32> into vector<24x25xf32>
```
which is a valid realization. However, the 2D matmul semantics are lost and the computation has turned into an arbitrary contraction.
At the moment, I went with the option to bail (fall back to generic behavior) in the presence of broadcasts. This keeps the overall vectorization consistent between the specialized lowering and generic+reconstruction, so it makes for a fine default path as well.
Now a user has the option to create separate broadcasts before vectorization for a cleaner lowering, or to keep it as is.
As a compromise, a somewhat better broadcasting scheme for contractions could be added to this specialized path.
I'd say a less surprising scheme would be to always broadcast LHS into `(batch, m, k)` and RHS into `(batch, n, k)`. The dimension order might not be optimal, but it resembles a more typical contraction.
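A rough sketch of that scheme applied to the earlier broadcast example (no batch dimension here; the exact op sequence is an assumption, with read indices elided as above):
```mlir
%1 = vector.transfer_read %LHS : tensor<12xf32>, vector<12xf32>
%2 = vector.broadcast %1 : vector<12xf32> to vector<24x12xf32>   // LHS -> (m, k)
%3 = vector.transfer_read %RHS : tensor<12xf32>, vector<12xf32>
%4 = vector.broadcast %3 : vector<12xf32> to vector<25x12xf32>   // RHS -> (n, k)
%5 = vector.transfer_read %ACC : tensor<24x25xf32>, vector<24x25xf32>
%6 = vector.contract {
       indexing_maps = [affine_map<(m, n, k) -> (m, k)>,
                        affine_map<(m, n, k) -> (n, k)>,
                        affine_map<(m, n, k) -> (m, n)>],
       iterator_types = ["parallel", "parallel", "reduction"],
       kind = #vector.kind<add>}
     %2, %4, %5 : vector<24x12xf32>, vector<25x12xf32> into vector<24x25xf32>
```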
This makes specialized vs. generic vectorization diverge a bit - which is fine, but it might be better to give users time to adjust before making it the default.
https://github.com/llvm/llvm-project/pull/147296