<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/55646>55646</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
_mm256_permute4x64_epi64 generates vpermpd instead of vpermq
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
moon-chilled
</td>
</tr>
</table>
<pre>
The following function:
```c
__m256i f(__m256i x) { return _mm256_permute4x64_epi64(x,0x4e); }
```
compiles to:
```asm
vpermpd ymm0,ymm0,0x4e
```
It should generate vpermq instead. The __m256i type is a hint that 'x' is being used for integer operations, so a floating-point instruction such as vpermpd is more likely to incur a penalty for switching domains.
GCC generates vpermq for this snippet, and generates vpermpd only when __m256d and _mm256_permute4x64_pd are used instead; this is the ideal behaviour.
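
For reference, vpermq and vpermpd move the same 64-bit lanes and differ only in execution domain, so switching the instruction should not change the result. Below is a minimal scalar sketch of the lane selection; `permute4x64_model` is a hypothetical helper name used for illustration, not part of any intrinsics header:
```c
/* Scalar model of the lane selection shared by vpermq and vpermpd
   (hypothetical helper, for illustration only): destination lane i
   takes the source lane named by bits [2*i+1 : 2*i] of imm8. */
void permute4x64_model(const unsigned long long src[4],
                       unsigned long long dst[4], unsigned imm8)
{
    for (int i = 0; i != 4; i++) {
        dst[i] = src[(imm8 >> (2 * i)) & 3u];
    }
}
```
With imm8 = 0x4e the four selectors are 2, 3, 0, 1, so the two 128-bit halves of the vector are swapped.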
</pre>