<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/90416>90416</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[mlir][Aarch64] Improve i8mm instruction sequence for `vector.contract`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
mlir,
mlir:neon
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
KoolJBlack,
banach-space
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dcaballe
</td>
</tr>
</table>
<pre>
The i8mm lowering for some `vector.contract` ops is currently functionally correct. However, performance-wise there is some room for improvement. Looking at the generated asm for an mmt4d with 2x2x8 innermost tile sizes, we get:
```
1470: 6e180483 mov v3.d[1], v4.d[0]
1474: 4e006204 tbl v4.16b, { v16.16b, v17.16b, v18.16b, v19.16b }, v0.16b
1478: 4e84a462 smmla v2.4s, v3.16b, v4.16b
147c: 6e024041 ext v1.16b, v2.16b, v2.16b, #0x8
```
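For reference, the `smmla` instruction in the listing above multiplies a 2x8 int8 LHS tile by an 8x2 int8 RHS tile and accumulates into a 2x2 int32 tile, which is exactly one 2x2x8 mmt4d inner tile. The following is a minimal reference model of that arithmetic, written only to illustrate the data layout; it assumes the RHS is held transposed (one RHS column per 8-byte half of the register) and ignores int32 wrap-around. It is not LLVM or MLIR code.

```python
def smmla(acc, a, b):
    """Illustrative reference model of AArch64 SMMLA (i8mm).

    acc: 4 int32 values, a 2x2 row-major accumulator tile.
    a:   16 int8 values, a 2x8 row-major LHS tile.
    b:   16 int8 values, the 8x2 RHS tile stored transposed as 2x8 row-major
         (row j of b is column j of the RHS), matching mmt4d's transposed RHS.
    Returns the updated 2x2 accumulator; int32 overflow is not modeled.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    assert all(v in range(-128, 128) for v in list(a) + list(b))  # int8 inputs
    out = list(acc)
    for i in range(2):          # LHS row
        for j in range(2):      # RHS column (row of the transposed tile)
            out[i * 2 + j] += sum(a[i * 8 + k] * b[j * 8 + k] for k in range(8))
    return out


# One 2x2x8 tile: rows of `a` are 0..7 and 8..15, RHS columns are all 1s and all -1s.
print(smmla([0, 0, 0, 0], list(range(16)), [1] * 8 + [-1] * 8))
```

Running the usage line prints `[28, -28, 92, -92]`. A single instruction covers the whole 2x2x8 tile, so ideally the loop body should not need much beyond the loads and the `smmla` itself.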
What catches my attention are the `mov` instruction (especially the indexing from `1` to `0`) and the `tbl` and `ext` instructions. This may not seem like a big deal, but the problem gets much worse with larger tile sizes: we observed long sequences of `mov` and `ext` instructions all over the place.
We should investigate what is going on and try to fix the problem. My suspicion is that this [zero initialization and insertion](https://github.com/llvm/llvm-project/blob/aafed3408e7269c42f974189198a47eb6dd2fc84/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp#L178-L185) for `vecmat` cases might be behind some of these instructions. We should check whether using `llvm.undef` instead fixes part of the problem.
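A minimal sketch of why the value used to fill the padding should not affect the result, assuming the `vecmat` lowering pads the single 1x8 LHS row (and the 1x2 accumulator) up to the 2x2x8 shape `smmla` expects and then keeps only the first result row. The helper names and the `pad_value` parameter are hypothetical; this models the arithmetic only, not the MLIR pattern. Row 0 of the product depends only on row 0 of the LHS, so zero-initializing the padding row is not required for correctness, which is what makes `llvm.undef` a plausible alternative.

```python
def smmla(acc, a, b):
    # Illustrative 2x2x8 SMMLA model: out[i][j] = acc[i][j] + dot(row i of a, row j of b).
    return [acc[i * 2 + j] + sum(a[i * 8 + k] * b[j * 8 + k] for k in range(8))
            for i in range(2) for j in range(2)]


def vecmat_via_smmla(lhs_row, b, acc_row, pad_value=0):
    """Hypothetical sketch: a 1x8 by 8x2 vecmat computed with a 2x2x8 SMMLA.

    pad_value stands in for whatever fills the unused LHS row: 0 models the
    current zero initialization, any other value models an undefined one.
    """
    a = list(lhs_row) + [pad_value] * 8   # pad the 1x8 LHS row to a 2x8 tile
    acc = list(acc_row) + [0, 0]          # pad the 1x2 accumulator to 2x2
    out = smmla(acc, a, b)
    return out[:2]                        # keep only result row 0


lhs = [1, 2, 3, 4, 5, 6, 7, 8]
rhs = [1] * 8 + [2] * 8                   # RHS columns stored transposed
print(vecmat_via_smmla(lhs, rhs, [0, 0], pad_value=0))
print(vecmat_via_smmla(lhs, rhs, [0, 0], pad_value=-7))
```

Both calls print `[36, 72]`: only the discarded second row changes with the padding value. Whether the backend actually drops the extra `mov`/`ext` instructions once the zero splat is gone still needs to be checked in the generated asm.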
</pre>