<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/116815>116815</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
ymm `vpshufb` unnecessarily split into two xmm `vpshufb`s
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dzaima
</td>
</tr>
</table>
<pre>
The code
```c
#include <immintrin.h>

__m128i narrow_u32x16_u8(__m256i v0, __m256i v1) {
  __m256i shifted = _mm256_slli_epi32(v1, 16);
  __m256i blended = _mm256_blend_epi16(shifted, v0, 85);
  __m256i shuffled = _mm256_shuffle_epi8(blended, _mm256_setr_epi8(
    0, 4, 8, 12, 2, 6, 10, 14, -1, -1, -1, -1, -1, -1, -1, -1,
    16, 20, 24, 28, 18, 22, 26, 30, -1, -1, -1, -1, -1, -1, -1, -1
  ));
  __m256i permuted = _mm256_permutevar8x32_epi32(shuffled, _mm256_setr_epi32(0, 4, 1, 5, 7, 7, 7, 7));
  return _mm256_castsi256_si128(permuted);
}
```
compiled with `-O3 -march=haswell` produces:
```asm
narrow_u32x16_u8:
        vpslld  ymm1, ymm1, 16
        vpblendw        ymm0, ymm0, ymm1, 170
        vextracti128    xmm1, ymm0, 1
        vmovq   xmm2, qword ptr [rip + .LCPI0_1]
        vpshufb xmm1, xmm1, xmm2
        vpshufb xmm0, xmm0, xmm2
        vpunpckldq      xmm0, xmm0, xmm1
        vzeroupper
        ret
```
The two xmm `vpshufb`s each operate on one 128-bit half of the `vpblendw` result, using the same shuffle mask; it would be better to do a single ymm `vpshufb` before splitting into lanes.
https://godbolt.org/z/fYcM4r4Y4
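
For illustration only, here is a portable scalar sketch of why the two forms should be interchangeable. It is not LLVM code and uses no intrinsics. The helper names (`shuffle128`, `shuffle256`, `single_shuffle`, `split_shuffle`, `paths_agree`) are made up for this example, and the contents of `.LCPI0_1` are assumed to be the first eight control bytes of the source mask, since the constant pool is not shown above.
```c
#include <stdint.h>
#include <string.h>

/* Scalar model of xmm vpshufb: a control byte with bit 7 set yields 0,
   otherwise its low 4 bits index into the 16 source bytes. */
static void shuffle128(uint8_t dst[16], const uint8_t src[16], const uint8_t ctl[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
}

/* Scalar model of ymm vpshufb: the same rule, applied within each 128-bit lane. */
static void shuffle256(uint8_t dst[32], const uint8_t src[32], const uint8_t ctl[32]) {
    for (int i = 0; i < 32; i++) {
        int lane = i & 16; /* 0 for the low lane, 16 for the high lane */
        dst[i] = (ctl[i] & 0x80) ? 0 : src[lane + (ctl[i] & 0x0F)];
    }
}

/* Control bytes from the _mm256_setr_epi8 call in the report (-1 stored as 0xFF). */
static const uint8_t CTL[32] = {
    0, 4, 8, 12, 2, 6, 10, 14, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    16, 20, 24, 28, 18, 22, 26, 30, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
};

/* Path A: one ymm shuffle, then take dwords 0, 4, 1, 5
   (the low 128 bits of the permutevar8x32 result). */
static void single_shuffle(uint8_t out[16], const uint8_t blended[32]) {
    uint8_t s[32];
    shuffle256(s, blended, CTL);
    memcpy(out + 0,  s + 0,  4);
    memcpy(out + 4,  s + 16, 4);
    memcpy(out + 8,  s + 4,  4);
    memcpy(out + 12, s + 20, 4);
}

/* Path B: models the emitted asm, i.e. split first, then shuffle each half. */
static void split_shuffle(uint8_t out[16], const uint8_t blended[32]) {
    /* Assumption: vmovq loads these 8 control bytes and zeroes the upper 8. */
    static const uint8_t ctl[16] = {0, 4, 8, 12, 2, 6, 10, 14};
    uint8_t lo[16], hi[16];
    shuffle128(lo, blended, ctl);      /* vpshufb xmm0 */
    shuffle128(hi, blended + 16, ctl); /* vpshufb xmm1 (after vextracti128) */
    /* vpunpckldq: interleave the two low dwords of lo and hi. */
    memcpy(out + 0,  lo,     4);
    memcpy(out + 4,  hi,     4);
    memcpy(out + 8,  lo + 4, 4);
    memcpy(out + 12, hi + 4, 4);
}

/* Returns 1 when both paths produce the same 16 output bytes. */
int paths_agree(const uint8_t blended[32]) {
    uint8_t a[16], b[16];
    single_shuffle(a, blended);
    split_shuffle(b, blended);
    return memcmp(a, b, 16) == 0;
}
```
In this model both paths read the same eight bytes from each lane of the blended value, so the split only adds a second shuffle.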
</pre>