<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/121823">121823</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] Deoptimization of shuffle intrinsics
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
SEt-t
</td>
</tr>
</table>
<pre>
Godbolt: https://godbolt.org/z/Knofd3WoM
A combination of vpshufb (with partial zeroing) and vpermd is deoptimized into vpshufb + vpermd + vpblendd, which adds an unnecessary instruction and wastes one ymm register to hold zero.
The loop instructions produced here are also worse than those generated by GCC and MSVC.
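
For readers without the Godbolt link at hand, here is a minimal scalar sketch of what the two instructions compute. It is not the reporter's reproducer. The function names model_vpshufb and model_vpermd are hypothetical, and the code only models the documented semantics of _mm256_shuffle_epi8 and _mm256_permutevar8x32_epi32 on plain byte arrays.

```c
/* Scalar model of AVX2 vpshufb (_mm256_shuffle_epi8) on a 32-byte vector.
   Each 128-bit lane is shuffled independently; a control byte with its
   top bit set writes zero instead of selecting a source byte.
   Hypothetical helper for illustration; out must not alias src. */
void model_vpshufb(const unsigned char src[32], const unsigned char ctrl[32],
                   unsigned char out[32]) {
    for (int i = 0; i != 32; ++i) {
        int lane_base = i & 16;  /* 0 for the low lane, 16 for the high lane */
        out[i] = (ctrl[i] & 0x80) ? 0 : src[lane_base + (ctrl[i] & 0x0F)];
    }
}

/* Scalar model of AVX2 vpermd (_mm256_permutevar8x32_epi32), viewed as bytes.
   Each of the 8 output dwords copies the source dword chosen by the low
   3 bits of the matching index; selection can cross the 128-bit lanes.
   Hypothetical helper for illustration; out must not alias src. */
void model_vpermd(const unsigned char src[32], const unsigned int idx[8],
                  unsigned char out[32]) {
    for (int i = 0; i != 8; ++i) {
        unsigned int s = idx[i] & 7u;
        for (int b = 0; b != 4; ++b)
            out[4 * i + b] = src[4 * s + b];
    }
}
```

Calling model_vpshufb and then model_vpermd on its result models the sequence described above: bytes zeroed by the shuffle stay zero after the dword permute, because the permute only moves whole dwords.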
</pre>