<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/162812>162812</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [x86] Using `select` instead of `pblendvb` leads to very poor codegen on AVX2
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Sp00ph
      </td>
    </tr>
</table>

<pre>
    I tried this code:

```ll
define <32 x i8> @cond_double_blendv(<32 x i8> %a, <32 x i8> %mask) {
    %aa = add <32 x i8> %a, %a
    %ret = call <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8> %a, <32 x i8> %aa, <32 x i8> %mask)
    ret <32 x i8> %ret
}

define <32 x i8> @cond_double_select(<32 x i8> %a, <32 x i8> %mask) {
    %aa = add <32 x i8> %a, %a
    %bitmask = icmp slt <32 x i8> %mask, splat (i8 0)
    %ret = select <32 x i1> %bitmask, <32 x i8> %aa, <32 x i8> %a
    ret <32 x i8> %ret
}
```
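Both functions work lane by lane, so their per-byte behavior can be modeled and compared exhaustively. The sketch below is a minimal Python model. It assumes bytes are ints in 0..255 and that the i8 `add` wraps. `cond_double_blendv` and `cond_double_select` are hypothetical Python helpers named after the IR functions, not LLVM code.

```python
# Hypothetical per-lane models of the two IR functions above.
# A byte lane is an int in 0..255; i8 `add` wraps, modeled as (x + x) & 0xFF.

def cond_double_blendv(a: int, mask: int) -> int:
    # pblendvb: take the second data operand (%aa) where the mask byte's MSB is set.
    aa = (a + a) & 0xFF
    return aa if mask & 0x80 else a

def cond_double_select(a: int, mask: int) -> int:
    # icmp slt %mask, 0: reinterpret the byte as a signed i8 and test for negative.
    aa = (a + a) & 0xFF
    signed_mask = mask - 256 if mask >= 128 else mask
    return aa if signed_mask < 0 else a

# Lanes are independent, so checking every (a, mask) byte pair covers all inputs.
mismatches = [
    (a, m)
    for a in range(256)
    for m in range(256)
    if cond_double_blendv(a, m) != cond_double_select(a, m)
]
print(len(mismatches))  # 0
```

An i8 is negative exactly when its MSB is set, which is why `icmp slt ..., 0` and the `pblendvb` MSB test agree on every input.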

Both functions have the same behavior: they double each lane of `%a` whose corresponding lane in `%mask` has its MSB set. However, they generate wildly different assembly (using clang 21.1.0 with `-O3 -march=x86-64-v3`):

```asm
cond_double_blendv:
        vpaddb    ymm2, ymm0, ymm0
        vpblendvb ymm0, ymm0, ymm2, ymm1
        ret

.LCPI1_0:
        .zero     32,252
.LCPI1_1:
        .zero     32,32
cond_double_select:
        vpsllw    ymm2, ymm0, 2
        vpand     ymm2, ymm2, ymmword ptr [rip + .LCPI1_0]
        vpsrlw    ymm1, ymm1, 2
        vpand     ymm1, ymm1, ymmword ptr [rip + .LCPI1_1]
        vpaddb    ymm1, ymm1, ymm1
        vpblendvb ymm0, ymm0, ymm2, ymm1
        vpaddb    ymm2, ymm0, ymm0
        vpaddb    ymm1, ymm1, ymm1
        vpblendvb ymm0, ymm0, ymm2, ymm1
        ret
```

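The version using the `@llvm.x86.avx2.pblendvb` intrinsic emits the expected assembly. After staring at the version using `select` for a while, I can say that all instructions except `vpaddb ymm2, ymm0, ymm0` and the last `vpblendvb` form an elaborate no-op. `vpsrlw`/`vpand` move the MSB of each mask byte down to bit 5, and the first `vpaddb ymm1, ymm1, ymm1` moves it to bit 6. As a result, no byte of `ymm1` has its MSB set, so the first `vpblendvb` always keeps `ymm0`. The second `vpaddb ymm1, ymm1, ymm1` then moves the bit back to bit 7 for the final blend. I have no idea what the code generator intended with these instructions, and I see no reason why the two functions should not emit exactly the same assembly.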
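The no-op argument can also be checked mechanically. The following hypothetical Python emulator models each `ymm` register as 32 little-endian bytes and implements only what the listing needs. It assumes these semantics:

- `vpsllw`/`vpsrlw` shift each 16-bit word.
- `.zero 32,V` fills the constant with byte V.
- Intel-syntax `vpblendvb dst, src1, src2, mask` takes `src2` where the mask byte's MSB is set.

It runs both listings on random inputs and compares the results.

```python
import random

# Hypothetical byte-level emulator: a ymm register is a list of 32 unsigned bytes.

def words(reg):
    # View 32 bytes as 16 little-endian 16-bit words.
    return [reg[i] | (reg[i + 1] << 8) for i in range(0, 32, 2)]

def from_words(ws):
    out = []
    for w in ws:
        out += [w & 0xFF, (w >> 8) & 0xFF]
    return out

def vpsllw(reg, n):
    return from_words([(w << n) & 0xFFFF for w in words(reg)])

def vpsrlw(reg, n):
    return from_words([w >> n for w in words(reg)])

def vpand_splat(reg, byte):
    # `.zero 32,V` fills all 32 bytes with V, so the memory operand is a byte splat.
    return [x & byte for x in reg]

def vpaddb(x, y):
    return [(p + q) & 0xFF for p, q in zip(x, y)]

def vpblendvb(src1, src2, mask):
    # Pick src2 where the mask byte's MSB is set, else src1.
    return [b if m & 0x80 else a for a, b, m in zip(src1, src2, mask)]

def cond_double_select_asm(ymm0, ymm1):
    ymm2 = vpand_splat(vpsllw(ymm0, 2), 252)  # a << 2 within each byte
    ymm1 = vpand_splat(vpsrlw(ymm1, 2), 32)   # mask MSB moved to bit 5
    ymm1 = vpaddb(ymm1, ymm1)                 # now bit 6: no MSB set anywhere
    ymm0 = vpblendvb(ymm0, ymm2, ymm1)        # therefore always keeps ymm0
    ymm2 = vpaddb(ymm0, ymm0)                 # a + a
    ymm1 = vpaddb(ymm1, ymm1)                 # bit 7 again: the original MSB
    return vpblendvb(ymm0, ymm2, ymm1)

def cond_double_blendv_asm(ymm0, ymm1):
    ymm2 = vpaddb(ymm0, ymm0)
    return vpblendvb(ymm0, ymm2, ymm1)

rng = random.Random(0)
samples = [
    ([rng.randrange(256) for _ in range(32)], [rng.randrange(256) for _ in range(32)])
    for _ in range(2000)
]
ok = all(cond_double_select_asm(a, m) == cond_double_blendv_asm(a, m) for a, m in samples)
print(ok)  # True
```

Random sampling is used because the word-sized shifts couple adjacent bytes, which makes an exhaustive search over whole registers impractical. The per-byte argument above covers each byte's own bits. The `vpand` masks clear the bits that cross byte boundaries.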

Note: I originally encountered this while using AVX2 intrinsics in Rust, where rustc's output was much worse than clang's for an equivalent function. The difference is that rustc lowers `_mm256_blendv_epi8` to `icmp slt` + `select`, whereas clang lowers it to `call @llvm.x86.avx2.pblendvb`.
</pre>