<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/116815>116815</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            ymm `vpshufb` unnecessarily split into two xmm `vpshufb`s
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dzaima
      </td>
    </tr>
</table>

<pre>
The following code
```c
#include <immintrin.h>
__m128i narrow_u32x16_u8(__m256i v0, __m256i v1) {
    __m256i shifted = _mm256_slli_epi32(v1, 16);
    __m256i blended = _mm256_blend_epi16(shifted, v0, 85);
    __m256i shuffled = _mm256_shuffle_epi8(blended, _mm256_setr_epi8(
        0,   4, 8, 12,  2,  6, 10, 14, -1, -1, -1, -1, -1, -1, -1, -1,
        16, 20, 24, 28, 18, 22, 26, 30, -1, -1, -1, -1, -1, -1, -1, -1
    ));
    __m256i permuted = _mm256_permutevar8x32_epi32(shuffled, _mm256_setr_epi32(0, 4, 1, 5, 7, 7, 7, 7));
    return _mm256_castsi256_si128(permuted);
}
```
when compiled with `-O3 -march=haswell`, produces:
```asm
narrow_u32x16_u8:
        vpslld       ymm1, ymm1, 16
        vpblendw     ymm0, ymm0, ymm1, 170
        vextracti128 xmm1, ymm0, 1
        vmovq        xmm2, qword ptr [rip + .LCPI0_1]
        vpshufb      xmm1, xmm1, xmm2
        vpshufb      xmm0, xmm0, xmm2
        vpunpckldq   xmm0, xmm0, xmm1
        vzeroupper
        ret
```
Those two `vpshufb`s apply the same mask (`xmm2`) to the two 128-bit halves of the `vpblendw` result; it would be better to do a single ymm `vpshufb` before splitting into lanes.
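
A minimal sketch of that suggestion, written with AVX2 intrinsics. The function name `narrow_u32x16_u8_one_shuffle` is hypothetical, and the `target("avx2")` attribute is an assumption added so the sketch builds without `-march=haswell`. It keeps the original blend and shuffle mask, then splits the 256-bit result and merges the low dwords, mirroring the `vextracti128` + `vpunpckldq` tail of the output above. Clang may still lower it differently; it only spells out the intended instruction order.

```c
#include <immintrin.h>

/* Hypothetical name; target("avx2") is an assumption so this compiles
 * without -mavx2 / -march=haswell on GCC and Clang. */
__attribute__((target("avx2")))
__m128i narrow_u32x16_u8_one_shuffle(__m256i v0, __m256i v1) {
    __m256i shifted = _mm256_slli_epi32(v1, 16);
    __m256i blended = _mm256_blend_epi16(shifted, v0, 85);
    /* One 256-bit vpshufb: in each 128-bit lane, gather the low byte of
     * every 16-bit word into the lane's low 8 bytes; -1 zeroes the rest. */
    __m256i shuffled = _mm256_shuffle_epi8(blended, _mm256_setr_epi8(
        0,   4, 8, 12,  2,  6, 10, 14, -1, -1, -1, -1, -1, -1, -1, -1,
        16, 20, 24, 28, 18, 22, 26, 30, -1, -1, -1, -1, -1, -1, -1, -1
    ));
    /* Split only after the shuffle (vextracti128). */
    __m128i lo = _mm256_castsi256_si128(shuffled);
    __m128i hi = _mm256_extracti128_si256(shuffled, 1);
    /* vpunpckldq: result dwords are lo.d0, hi.d0, lo.d1, hi.d1. */
    return _mm_unpacklo_epi32(lo, hi);
}
```

Because `vpshufb` uses only the low four bits of each index within a 128-bit lane, the upper-lane indices 16..30 select the same relative bytes as 0..14. Ideally this needs one `vpshufb` (with the mask as a memory operand) instead of a `vmovq` plus two `vpshufb`s.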

https://godbolt.org/z/fYcM4r4Y4
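
For checking any rewrite of `narrow_u32x16_u8`, a portable scalar reference may help. Working through the blend, shuffle and permute, the result holds the low byte of each 32-bit lane: `v0`'s eight lanes first, then `v1`'s. The name `narrow_u32x16_u8_ref` and its array-based signature are hypothetical, chosen only for testing.

```c
#include <stdint.h>

/* Hypothetical scalar reference: truncate 16 u32 lanes to u8,
 * v0's lanes into bytes 0..7 and v1's lanes into bytes 8..15. */
static void narrow_u32x16_u8_ref(const uint32_t v0[8], const uint32_t v1[8],
                                 uint8_t out[16]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)v0[i];     /* low byte of v0 lane i */
        out[8 + i] = (uint8_t)v1[i]; /* low byte of v1 lane i */
    }
}
```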
</pre>