<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/55170">55170</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[x86] improve cost model for oversized shuffles
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
rotateright
</td>
</tr>
</table>
<pre>
I was trying some examples with https://reviews.llvm.org/D123494 and noticed that the AArch64 cost model seems smarter about decomposing the cost of an oversized shuffle based on its mask:
```
define void @cross_talk(<8 x i32> %a, <8 x i32> %b) {
%s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8>
ret void
}
```
If we don't care about element order, that can be turned into a much simpler shuffle (especially for a target with 128-bit vectors):
```
define void @identity_and_splat(<8 x i32> %a, <8 x i32> %b) {
%s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8, i32 8>
ret void
}
```
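To make "much simpler" concrete, here is a rough sketch of how each mask splits into 128-bit chunks. This is not LLVM's cost-model code: the helper name `classify_chunks` is made up for illustration, and it assumes 4 x i32 lanes per register and no undef/poison mask elements.
```python
def classify_chunks(mask, lanes_per_reg=4):
    """Split a shuffle mask into register-sized chunks and classify each one.

    Mask indices address the concatenation of both sources (%a, then %b),
    so idx // lanes_per_reg names the legal-width source register an
    element is read from. Assumes every mask element is non-negative.
    """
    result = []
    for start in range(0, len(mask), lanes_per_reg):
        sub = mask[start:start + lanes_per_reg]
        # Distinct source registers this chunk reads from.
        sources = sorted({idx // lanes_per_reg for idx in sub})
        # Identity: every lane is taken in place from one source register.
        is_identity = len(sources) == 1 and all(
            idx % lanes_per_reg == lane for lane, idx in enumerate(sub)
        )
        # Splat: every lane reads the same source element.
        is_splat = len(set(sub)) == 1
        result.append((sources, is_identity, is_splat))
    return result


cross_talk = classify_chunks([8, 0, 1, 2, 3, 8, 8, 8])
identity_and_splat = classify_chunks([0, 1, 2, 3, 8, 8, 8, 8])
```
Under those assumptions, both chunks of `cross_talk` read from two different source registers, while `identity_and_splat` splits into one identity chunk (the low half of %a, untouched) and one splat chunk (element 0 of %b).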
That transform happens with AArch64, but it doesn't happen with x86 because the x86 cost model reports the same cost for both shuffles:
```
% opt -mtriple=x86_64 -passes="print<cost-model>" -disable-output shufcost.ll
Printing analysis 'Cost Model Analysis' for function 'cross_talk':
Cost Model: Found an estimated cost of 12 for instruction: %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 8, i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8>
Printing analysis 'Cost Model Analysis' for function 'identity_and_splat':
Cost Model: Found an estimated cost of 12 for instruction: %s = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 8, i32 8, i32 8>
```
</pre>