<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            53760
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Inefficient code for Nxi1 masked operations on non-avx512 target
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jhorstmann
      </td>
    </tr>
</table>

<pre>
    I noticed the following code generation issue when trying Rust's new [portable_simd](https://github.com/rust-lang/portable-simd) library with a nightly Rust build. The full example can be seen in the [godbolt compiler explorer](https://rust.godbolt.org/z/aMMbrf8xq).

The original Rust code adds two `i32` vectors, where the second operand should be masked using a `u8` bitmask.

```rust
#![feature(platform_intrinsics)]
#![feature(portable_simd)]
use std::simd::*;

extern "platform-intrinsic" {
    pub(crate) fn simd_select<M, T>(m: M, a: T, b: T) -> T;
    pub(crate) fn simd_select_bitmask<M, T>(m: M, a: T, b: T) -> T;
}

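// Expands each bit of `bitmask` into a full lane:
// -1 (all bits set) if the bit is set, 0 otherwise.
pub fn from_bitmask_i32x8(bitmask: u8) -> i32x8 {
    unsafe {
        simd_select_bitmask(bitmask, i32x8::splat(-1), i32x8::splat(0))
    }
}

// Adds `b` to `a` only in the lanes whose bit is set in `bitmask`.
pub fn add_masked_i32x8(a: i32x8, b: i32x8, bitmask: u8) -> i32x8 {
    unsafe {
        a + (b & from_bitmask_i32x8(bitmask))
    }
}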
```
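
For reference, the intended per-lane semantics can be written as a scalar model on plain arrays. This is only a sketch for checking results on stable Rust, not part of the reproducer: the names `from_bitmask_ref` and `add_masked_ref` are made up for illustration, and it assumes that bit `i` of the bitmask controls lane `i` and that the vector addition wraps on overflow.

```rust
/// Scalar model of `from_bitmask_i32x8` (hypothetical helper, not from the report):
/// lane `i` becomes -1 (all bits set) when bit `i` of `bitmask` is set, else 0.
pub fn from_bitmask_ref(bitmask: u8) -> [i32; 8] {
    let mut mask = [0i32; 8];
    for (lane, m) in mask.iter_mut().enumerate() {
        // Negating the extracted bit turns 1 into -1 and leaves 0 as 0.
        *m = -(((bitmask >> lane) & 1) as i32);
    }
    mask
}

/// Scalar model of `add_masked_i32x8`: `a + (b & mask)` computed lane by lane.
pub fn add_masked_ref(a: [i32; 8], b: [i32; 8], bitmask: u8) -> [i32; 8] {
    let mask = from_bitmask_ref(bitmask);
    let mut out = [0i32; 8];
    for lane in 0..8 {
        // Assumption: SIMD integer addition wraps, so use `wrapping_add` here.
        out[lane] = a[lane].wrapping_add(b[lane] & mask[lane]);
    }
    out
}
```

With this model, a bitmask of `0b0000_0101` adds `b` to `a` in lanes 0 and 2 only and leaves the other lanes of `a` unchanged.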

This compiles to the following LLVM IR (using `-C opt-level=3 --emit=llvm-ir`):

```llvm
define void @_ZN7example18from_bitmask_i32x817h2e814f0b6cf811aeE(<8 x i32>* noalias nocapture sret(<8 x i32>) dereferenceable(32) %0, i8 %bitmask) unnamed_addr #0 !dbg !6 {
  %1 = bitcast i8 %bitmask to <8 x i1>, !dbg !10
  %2 = sext <8 x i1> %1 to <8 x i32>, !dbg !10
  store <8 x i32> %2, <8 x i32>* %0, align 32, !dbg !10
  ret void, !dbg !11
}

define void @_ZN7example16add_masked_i32x817h5456e14a2952ad51E(<8 x i32>* noalias nocapture sret(<8 x i32>) dereferenceable(32) %0, <8 x i32>* noalias nocapture readonly dereferenceable(32) %a, <8 x i32>* noalias nocapture readonly dereferenceable(32) %b, i8 %bitmask) unnamed_addr #0 !dbg !12 {
  %_4 = load <8 x i32>, <8 x i32>* %a, align 32, !dbg !13
  %_6 = load <8 x i32>, <8 x i32>* %b, align 32, !dbg !14
  %1 = bitcast i8 %bitmask to <8 x i1>, !dbg !15
  %2 = select <8 x i1> %1, <8 x i32> %_6, <8 x i32> zeroinitializer, !dbg !17
  %3 = add <8 x i32> %2, %_4, !dbg !25
  store <8 x i32> %3, <8 x i32>* %0, align 32, !dbg !25, !alias.scope !29
  ret void, !dbg !32
}
```

With an AVX-512 capable target, the generated code looks good: the vector mask is optimized away and replaced by a masked load using `k` registers.

```asm
example::add_masked_i32x8:
        mov     rax, rdi
        kmovd   k1, ecx
        vmovdqa32       ymm0 {k1} {z}, ymmword ptr [rdx]
        vpaddd  ymm0, ymm0, ymmword ptr [rsi]
        vmovdqa ymmword ptr [rdi], ymm0
        vzeroupper
        ret
```

With a non-AVX-512 target, generating a vector mask gets optimized nicely by broadcasting the bitmask and comparing it against a constant containing one bit per lane (1, 2, 4, ..., 128).

```asm
.LCPI0_0:
        .long   1
        .long   2
        .long   4
        .long   8
        .long   16
        .long   32
        .long   64
        .long   128
example::from_bitmask_i32x8:
        mov     rax, rdi
        vmovd   xmm0, esi
        vpbroadcastb    ymm0, xmm0
        vmovdqa ymm1, ymmword ptr [rip + .LCPI0_0]
        vpand   ymm0, ymm0, ymm1
        vpcmpeqd        ymm0, ymm0, ymm1
        vmovdqa ymmword ptr [rdi], ymm0
        vzeroupper
        ret
```
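
The broadcast-and-compare sequence above can be emulated in scalar Rust to make the trick easier to follow. This is a rough sketch, not compiler output: `expand_bitmask_emulated` and `LANE_BITS` are names invented here, and each 32-bit lane of the `ymm` register is modelled as a `u32`.

```rust
/// Per-lane bit constants, mirroring the `.LCPI0_0` table (1, 2, 4, ..., 128).
const LANE_BITS: [u32; 8] = [1, 2, 4, 8, 16, 32, 64, 128];

/// Scalar emulation of the `vpbroadcastb` / `vpand` / `vpcmpeqd` sequence
/// (hypothetical helper for illustration only).
pub fn expand_bitmask_emulated(bitmask: u8) -> [i32; 8] {
    // vpbroadcastb: the mask byte is copied into every byte of the register,
    // so each 32-bit lane holds the byte repeated four times.
    let broadcast = u32::from(bitmask) * 0x0101_0101;
    let mut lanes = [0i32; 8];
    for (lane, out) in lanes.iter_mut().enumerate() {
        // vpand: keep only the bit that belongs to this lane.
        let isolated = broadcast & LANE_BITS[lane];
        // vpcmpeqd: equal to the lane constant means all ones (-1), otherwise 0.
        *out = if isolated == LANE_BITS[lane] { -1 } else { 0 };
    }
    lanes
}
```

The upper bytes of each lane constant are zero, so the repeated copies of the mask byte produced by the byte broadcast do not affect the comparison.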

The masked addition should then be able to use the same code and blend using the generated vector mask. Instead, it generates quite inefficient code that tests each bit in the bitmask individually and inserts the values into a vector register:

```asm
example::add_masked_i32x8:
        push    rax
        mov     rax, rdi
        mov     edi, ecx
        shr     dil, 5
        movzx   r9d, dil
        and     r9d, 1
        neg     r9d
        mov     r8d, ecx
        shr     r8b, 4
        movzx   edi, r8b
        and     edi, 1
        neg     edi
        vmovd   xmm0, edi
        vpinsrd xmm0, xmm0, r9d, 1
        mov     edi, ecx
        shr     dil, 6
        movzx   edi, dil
        and     edi, 1
        neg     edi
        vpinsrd xmm0, xmm0, edi, 2
        mov     edi, ecx
        shr     dil, 7
        movzx   edi, dil
        neg     edi
        vpinsrd xmm0, xmm0, edi, 3
        mov     edi, ecx
        and     edi, 1
        neg     edi
        vmovd   xmm1, edi
        mov     edi, ecx
        shr     dil
        movzx   edi, dil
        and     edi, 1
        neg     edi
        vpinsrd xmm1, xmm1, edi, 1
        mov     edi, ecx
        shr     dil, 2
        movzx   edi, dil
        and     edi, 1
        neg     edi
        vpinsrd xmm1, xmm1, edi, 2
        shr     cl, 3
        movzx   ecx, cl
        and     ecx, 1
        neg     ecx
        vpinsrd xmm1, xmm1, ecx, 3
        vinserti128     ymm0, ymm1, xmm0, 1
        vpand   ymm0, ymm0, ymmword ptr [rdx]
        vpaddd  ymm0, ymm0, ymmword ptr [rsi]
        vmovdqa ymmword ptr [rax], ymm0
        pop     rcx
        vzeroupper
        ret
```
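
For comparison, the sequence I would have expected can be sketched by hand with the stable `std::arch` AVX2 intrinsics. This is an illustration under assumptions, not what the compiler emits: the function names are invented, it uses a 32-bit broadcast (`_mm256_set1_epi32`) instead of the byte broadcast shown earlier, and it falls back to a scalar loop when AVX2 is not available.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback with the same per-lane semantics (hypothetical helper).
pub fn add_masked_scalar_fallback(a: [i32; 8], b: [i32; 8], bitmask: u8) -> [i32; 8] {
    let mut out = a;
    for lane in 0..8 {
        if (bitmask >> lane) & 1 == 1 {
            out[lane] = out[lane].wrapping_add(b[lane]);
        }
    }
    out
}

/// Hand-written AVX2 sketch: expand the bitmask to a vector mask,
/// AND it with `b`, then add the result to `a`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_masked_avx2_sketch(a: [i32; 8], b: [i32; 8], bitmask: u8) -> [i32; 8] {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    // One bit per lane, like the constant table in the `from_bitmask_i32x8` output.
    let bits = _mm256_setr_epi32(1, 2, 4, 8, 16, 32, 64, 128);
    // Broadcast the zero-extended bitmask into every 32-bit lane.
    let broadcast = _mm256_set1_epi32(bitmask as i32);
    // A lane becomes all ones where its bit is set in the bitmask.
    let mask = _mm256_cmpeq_epi32(_mm256_and_si256(broadcast, bits), bits);
    // a + (b & mask)
    let sum = _mm256_add_epi32(va, _mm256_and_si256(vb, mask));
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    out
}

/// Picks the AVX2 path when the CPU supports it, otherwise the scalar loop.
pub fn add_masked_blend(a: [i32; 8], b: [i32; 8], bitmask: u8) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was checked at runtime just above.
            return unsafe { add_masked_avx2_sketch(a, b, bitmask) };
        }
    }
    add_masked_scalar_fallback(a, b, bitmask)
}
```

The AVX2 path needs only a broadcast, an AND, a compare, a second AND and an add, with no per-bit shifts or `vpinsrd` instructions.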
</pre>