<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/85691>85691</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
`castAwayContractionLeadingOneDim` introduces unnecessary transposes on outer unit dims
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
KoolJBlack
</td>
</tr>
</table>
<pre>
During IREE's mmt4d lowering, a vector-matrix product is represented by an intermediate `vector.contract` of the following form:
```mlir
%result = vector.contract {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"], kind = #vector.kind<add>} %lhs, %rhs, %acc : vector<1x1x8xi32>, vector<1x8x8xi32> into vector<1x8xi32>
```
Passing this through the `castAwayContractionLeadingOneDim` pattern produces the following:
```mlir
%0 = vector.transpose %arg0, [1, 0, 2] : vector<1x1x8xi32> to vector<1x1x8xi32>
%1 = vector.extract %0[0] : vector<1x8xi32> from vector<1x1x8xi32>
%2 = vector.extract %arg2[0] : vector<8xi32> from vector<1x8xi32>
%3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %1, %arg1, %2 : vector<1x8xi32>, vector<1x8x8xi32> into vector<8xi32>
%4 = vector.broadcast %3 : vector<8xi32> to vector<1x8xi32>
```
The introduced `vector.transpose` adds extra leading dimensionality to the overall flow and cannot be trivially eliminated further down. This makes it harder for subsequent patterns to process vector-times-matrix contracts properly.
The cause is that the [pattern](https://github.com/llvm/llvm-project/blob/cf835b96b13bec3b5df1962bae609934edda6d55/mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp#L333) requires the cast-away dimensions to be outermost for all operands, while the choice of dimension to drop is driven by the accumulator of the contract.
### Thoughts
In practice, transposing outer unit dimensions in this instance does not affect the underlying data layout, so the transpose could be omitted from the pattern's final output. Alternatively, a transpose canonicalizer that folds transposes of leading unit dimensions could also apply here.
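A minimal sketch of what the output could look like with the transpose omitted, reusing the argument names from the output above. This is hand-written to illustrate the idea and is not produced by any existing pattern. It assumes both leading dimensions of the LHS are unit dimensions (as they are here), so extracting index 0 directly from `%arg0` yields the same 8 elements as extracting from the transposed value:
```mlir
// Hypothetical output (hand-written): the LHS is extracted directly, without a vector.transpose.
%0 = vector.extract %arg0[0] : vector<1x8xi32> from vector<1x1x8xi32>
%1 = vector.extract %arg2[0] : vector<8xi32> from vector<1x8xi32>
%2 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %0, %arg1, %1 : vector<1x8xi32>, vector<1x8x8xi32> into vector<8xi32>
%3 = vector.broadcast %2 : vector<8xi32> to vector<1x8xi32>
```
The contract itself is unchanged from the current output; only the preparation of the LHS operand differs.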
</pre>