<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/56574>56574</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal codegen in vector shuffle
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
tellowkrinkle
</td>
</tr>
</table>
<pre>
In this Compiler Explorer test: https://gcc.godbolt.org/z/5756crha1
Clang generates the following:
```
vpunpcklwd xmm1, xmm0, xmm0 # xmm1 = xmm0[0,0,1,1,2,2,3,3]
vpshuflw xmm2, xmm0, 80 # xmm2 = xmm0[0,0,1,1,4,5,6,7]
vpunpckhwd xmm0, xmm0, xmm0 # xmm0 = xmm0[4,4,5,5,6,6,7,7]
vpshufd xmm2, xmm2, 64 # xmm2 = xmm2[0,0,0,1]
```
The `vpshuflw` is redundant: the `vpshufd` only reads the low two dwords of its source (words 0-3), and the `vpunpcklwd` above already produces the same values there, so `xmm1` could be used instead of `xmm2`.
Interestingly, this codegen only occurs when the result of the `vpshufd` is first `or`'d with another value. Writing the result out directly, without the `or`, avoids the redundant instruction, as the `without_or` function shows.
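The redundancy can be checked with a minimal sketch that models the three instructions on lists of eight 16-bit lane indices. The helper names are hypothetical and only mirror the instruction mnemonics; the immediates 80 and 64 are the ones from the listing above.

```python
# Model an xmm register as a list of eight 16-bit lanes (lane 0 is the lowest).
# These helpers are illustrative only, not part of any library.

def unpcklwd(a, b):
    """Interleave the low four words of a and b."""
    out = []
    for i in range(4):
        out += [a[i], b[i]]
    return out

def pshuflw(a, imm):
    """Shuffle the low four words by the 2-bit fields of imm; keep the high four."""
    low = [a[(imm // 4**i) % 4] for i in range(4)]
    return low + a[4:]

def pshufd(a, imm):
    """Shuffle the four dwords (pairs of words) by the 2-bit fields of imm."""
    dwords = [a[2*i:2*i + 2] for i in range(4)]
    out = []
    for i in range(4):
        out += dwords[(imm // 4**i) % 4]
    return out

x = list(range(8))                       # lane i holds the index i
via_unpck = pshufd(unpcklwd(x, x), 64)   # reuse the vpunpcklwd result
via_shuflw = pshufd(pshuflw(x, 80), 64)  # what Clang currently emits
print(via_unpck, via_shuflw, via_unpck == via_shuflw)
```

Both paths select the same lanes, because `vpshufd` with immediate 64 only picks dwords 0 and 1, where the two intermediate values agree.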
</pre>