<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/62280">62280</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
x86: Inefficient code generated for `__builtin_shufflevector` on 512-bit input.
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
He3lixxx
</td>
</tr>
</table>
<pre>
This code snippet:
```c++
#include <cstddef>
#include <array>
using ByteVecT __attribute__((vector_size(64))) = unsigned char;
void foo(const std::byte* input, std::byte* output) {
    alignas(64) const std::array<unsigned char, 64> SHUFFLE_MASK = {1, 2, 3, 4, 5, 6, 7};
    const auto shuffle_mask = *reinterpret_cast<const ByteVecT*>(SHUFFLE_MASK.data());
    const auto input_vec = *reinterpret_cast<const ByteVecT*>(input);
    *reinterpret_cast<ByteVecT*>(output) = __builtin_shufflevector(input_vec, shuffle_mask);
}
```
compiled with `clang++ -std=c++20 -O3 -march=rocketlake` produces ~300 instructions that spill the input data to the stack 64 times and perform the shuffle one element at a time using vector extract and insert operations ([godbolt](https://godbolt.org/z/brKz9voTd)).
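For reference, here is a rough scalar model of what the shuffle is expected to compute. It is only a sketch: `shuffle_reference` is a hypothetical helper name, not something from Clang, and it assumes that each mask byte is reduced modulo the element count (64), as GCC's `__builtin_shuffle` does.
```c++
#include <array>
#include <cstddef>

// Hypothetical scalar model of the two-argument shuffle; not Clang's code.
// Assumption: each mask byte is taken modulo the number of elements (64).
std::array<unsigned char, 64> shuffle_reference(
    const std::array<unsigned char, 64>& input,
    const std::array<unsigned char, 64>& mask) {
    std::array<unsigned char, 64> result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        // Output element i is the input element selected by mask[i].
        result[i] = input[mask[i] % input.size()];
    }
    return result;
}
```
Under that assumption, the mask `{1, 2, 3, 4, 5, 6, 7}` (remaining entries zero-initialized) copies input bytes 1 to 7 into output bytes 0 to 6 and fills every other output byte with input byte 0.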
If you remove one element from `SHUFFLE_MASK`, the generated code is okay. If `shuffle_mask` is initialized as `const ByteVecT shuffle_mask = {1, 2, 3, 4, 5, 6, 7};`, the generated code is also okay. Here, `InstCombinePass` combines the `extractelement` and `insertelement` instructions into a `shufflevector` instruction.
Note: In my initial buggy code, I intended the compiler to use AVX512 instructions and 512-bit registers, just as if I were using `__m512` types. Clang still uses 256-bit `ymm` registers due to the default `-mprefer-vector-width=256`. With `-mprefer-vector-width=512`, the above examples produce good assembly.
</pre>