<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=http://email.email.llvm.org/c/eJzFWVlv2zgQ_jXKC2FDhw_5wQ9psgUCbIvFosAC-2JQIm2zoUiFpBwnv35nKMmXZCduum3g6OAMh3N8MxzamWYv8weitBM5Z8StOVlqKfWzUCuSa8bJiituqBNaEWFtxcnzmivizAty_F1ZF8RTSxR_JsH4U6mNo5nkCysKFozvgzhdO1faILkN4s_wWQm3rrJhrgt4MTB7IKlawXM7c-BnxjMiRWaoeQniO1JZXAu5iRKrtZMvJKuEZEPyDfWtpCR8S4tScpJTRTJOLAcdhfL2gForzTItHRhUlEJyA-yl1IabPhVxnWEzY6gNKvcK__TLl8ws0-0TKDcMwvsgvK2vqIM2YiUUlbWS3m_CEqa5JaAQoYwJ70G9JO5Zk2ASiiSGK9nw3Glj0UrU1fJcK0Z0CS6Hu13rSjI0qKD2EeJTewJEwtwqRQGZcEg70gjG6w9q0wzFSRBH4Iolp64yHKwuJXVLbYqFUM4IZUUOaszQI2dnHEX3gLeyoLpj6MLk1hP9UxDD9dOhZnzruAHt47hdfbBbHQZJMG3YCfyVVQaL5gA-joBYKoKiF5ZLcFqQ3H1Br30Lkj-Aq4DliB-g-PQNn7LmaUYGwISc7xW-aLz6kUWC6f2h5bAeLrI0umilLwAD2xTk7la7JRDUVpSnHnukUpYu-fEY_vWpvhcLWtYr1eFBzwN1EGEE-2mhJ832i-yN6bUK8L2oEbqzyXuoeWm8tH_7qL0UEPSJoIlwmVx26tuWtNlynNSQv025sMTpk8oo5QaQa1CFOiVh-uAO8hbqGd9wGST3CRkMeCEAqPcNNy4BqiS3vbmKTPUQ40uhONlowUgwChf_fp025S1Ku6ZG03XM02i0DLNJvkyjiHLEKkA3JVt0qsfuLZR4KgWFSq1zWmJCE2u463LOCOOGL-Ff5RyzHViABONBPA49YFJ83PsXwqRoAbEHGKBLkhAuEctWeJscRg-mRQQcggjIKVTKI1Ho5laZyOtydyAoCg-kxF6KhXJyNKNe4EBMY1O_HAuVlx-zetme_9R9rfHgxJUiSXxOKPjUR-6EHvWmz4VIT05zCuI8Ho0nPBrReDaOKRtH_3Oc3yHacMq0gv34giz6E2VlV-Mvik8AuBh57EhNWQ9MesJOL4Q9ORQ8uVJwdkHw6KekzLgnZXCT6CZNR8faou7wKzdaKOhmQHF4PllwerBg4heEmJxLMYzF8fx4_EZuJj-QmyC0fveAG9ocuis_PrucsyDnHRvFP9DQYkNGN9sxYA3AjJgljpoVZl3d1jU9NDRwvjWUWj9astKadRnqhtD3ezDqLO4pogBXM0Kf6QvBttBw2KexXc_gvW0NPe52m9Ej9oaGr4SFhsv2d4fUFm1X5mtO3QN09vJ2v2o330Jv_N3QLapvmDimPwIDw7sHFc-3x-QNkp9oEjfvL0URYooC-_QeH17R4TATCM_aMFI6gz28Ydtdv7mTVYK2rJbRTAn7plrRnVqr0V3Fs7ayjqcg9qsSevPjcSyvbwEE6pwaNBjZY6M9W2FPfxh5COdJ7BUcz_DYAx8DgcZKgLMQO20pQGRgx0INUoQjdEWhsYY7DMOdKjyZKAeD7VQ4e8FBRTEQ_hZGhn_e_fUQLsIOHIZSgzRCov7huH941D-cnpE96R9PzkifnBEfxWkP4nvax2sxv2kwv20AyO0pQ7mLW9aiHhm3XZTtgRn1YVmUvvndxaMnJxQjvTkRnXLmRcmfWPv-jhm_KmnwTN0kwu7s3ByGHX79ACfiusxq8h1P3HgA9edn6APqGotOAA7V1sSzVXZIHiA5oPfAlGk5LHmqhMPc4MulyAVXzanerakjQIfk5DRfY_K13zO0eYj5tBGsolLW5RqSkBuYsKGyAslw4K2PE7uUb-v0uaPBLgl_pFyXlV230L0K0y2dA6GvkNu18XcmJNLHndmvW5Q-85scMh0f4VSNu4Z-AjTFVz
tqv9Ipu6iUSX13NepXqjEJmXqVauhnlOJvZn-HoQQMQL5sD7MeFeg1_krHTy7aeNbx19p4xoRGTPwhE6bXmfCjSibXKflxNES9aLjGOb84tFHjtejAax9DZxcYv8GEuF_FXPajotYw92UxP6dgTT6nYKfrPadgLeZEh029ZQjoV_z70Y58gOvOhn526_8NnTTdnm0KSl36u-m46d3dwg2bJ2yWzOgNrdxam_l3uFjYgZW6qYycX_jZwX_TVt8GpdHf8dvk-LP_ccPCwziZgvj1nE6nbJomk4RNUjqJw_E0S8PleDJKZuPJNAxvJM24tHOwNYhj_AHEi4BnsPvm4xqIeRzG8ImiKBlPR-lwGs_iPKKzJJvQKEyjYBTyggo5RDn4S8WNmXuRWbWyQJTQVNg9kVoLx2POvcKgoRNO8vnDaYezhIbk61ZEbf_lf4vABgyOI6p7jLnxWs-9yv8BtodXSg>53760</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Inefficient code for Nxi1 masked operations on non-avx512 target
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jhorstmann
</td>
</tr>
</table>
<pre>
I noticed the following code generation issue when trying Rust's new [portable_simd](https://github.com/rust-lang/portable-simd) library with a Rust nightly build. The full example can be seen in the [godbolt compiler explorer](https://rust.godbolt.org/z/aMMbrf8xq).
The original Rust code adds two `i32` vectors, where the second operand is masked using a `u8` bitmask.
```rust
#![feature(platform_intrinsics)]
#![feature(portable_simd)]
use std::simd::*;

// Compiler SIMD intrinsics used by portable_simd (nightly only).
extern "platform-intrinsic" {
    pub(crate) fn simd_select<M, T>(m: M, a: T, b: T) -> T;
    pub(crate) fn simd_select_bitmask<M, T>(m: M, a: T, b: T) -> T;
}

// Expand bit i of `bitmask` into lane i: -1 (all ones) if set, 0 otherwise.
pub fn from_bitmask_i32x8(bitmask: u8) -> i32x8 {
    unsafe {
        simd_select_bitmask(bitmask, i32x8::splat(-1), i32x8::splat(0))
    }
}

// Add `b` to `a`, keeping only the lanes of `b` whose bit is set in `bitmask`.
pub fn add_masked_i32x8(a: i32x8, b: i32x8, bitmask: u8) -> i32x8 {
    unsafe {
        a + (b & from_bitmask_i32x8(bitmask))
    }
}
```
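For reference, the intended per-lane semantics of `add_masked_i32x8` can be written as plain scalar Rust. This is a minimal sketch that is not part of the original report: the name `add_masked_scalar` is hypothetical, and it assumes bit `i` of the bitmask controls lane `i` (bit 0 maps to lane 0), which is how `bitcast i8 ... to <8 x i1>` is laid out on little-endian x86.

```rust
/// Hypothetical scalar reference for `add_masked_i32x8`: lane `i` of `b` is
/// kept only if bit `i` of `bitmask` is set (bit 0 -> lane 0), else it adds 0.
pub fn add_masked_scalar(a: [i32; 8], b: [i32; 8], bitmask: u8) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        // In Rust, `&` binds tighter than `==`, so this tests bit i.
        let lane_on = (bitmask >> i) & 1 == 1;
        let masked_b = if lane_on { b[i] } else { 0 };
        // Vector integer addition wraps on overflow, so wrap here as well.
        out[i] = a[i].wrapping_add(masked_b);
    }
    out
}
```

For example, with `a = [10, 20, 30, 40, 50, 60, 70, 80]`, `b = [1, 2, 3, 4, 5, 6, 7, 8]` and `bitmask = 0b0000_0101`, only lanes 0 and 2 receive `b`, giving `[11, 20, 33, 40, 50, 60, 70, 80]`.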
The portable_simd example above compiles to the following LLVM IR (using `-C opt-level=3 --emit=llvm-ir`):
```llvm
define void @_ZN7example18from_bitmask_i32x817h2e814f0b6cf811aeE(<8 x i32>* noalias nocapture sret(<8 x i32>) dereferenceable(32) %0, i8 %bitmask) unnamed_addr #0 !dbg !6 {
  %1 = bitcast i8 %bitmask to <8 x i1>, !dbg !10
  %2 = sext <8 x i1> %1 to <8 x i32>, !dbg !10
  store <8 x i32> %2, <8 x i32>* %0, align 32, !dbg !10
  ret void, !dbg !11
}

define void @_ZN7example16add_masked_i32x817h5456e14a2952ad51E(<8 x i32>* noalias nocapture sret(<8 x i32>) dereferenceable(32) %0, <8 x i32>* noalias nocapture readonly dereferenceable(32) %a, <8 x i32>* noalias nocapture readonly dereferenceable(32) %b, i8 %bitmask) unnamed_addr #0 !dbg !12 {
  %_4 = load <8 x i32>, <8 x i32>* %a, align 32, !dbg !13
  %_6 = load <8 x i32>, <8 x i32>* %b, align 32, !dbg !14
  %1 = bitcast i8 %bitmask to <8 x i1>, !dbg !15
  %2 = select <8 x i1> %1, <8 x i32> %_6, <8 x i32> zeroinitializer, !dbg !17
  %3 = add <8 x i32> %2, %_4, !dbg !25
  store <8 x i32> %3, <8 x i32>* %0, align 32, !dbg !25, !alias.scope !29
  ret void, !dbg !32
}
```
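In the IR above, the source expression `b & from_bitmask_i32x8(bitmask)` (an AND with a sign-extended `<8 x i1>`) has been canonicalized into `select <8 x i1> %1, %_6, zeroinitializer`. The two forms agree lane by lane because a sign-extended `i1` is either all ones (`-1`) or zero. A minimal scalar check of that identity, using hypothetical helper names:

```rust
/// Models `sext i1 to i32`: true becomes all ones (-1), false becomes 0.
fn sext_bit(bit: bool) -> i32 {
    -(bit as i32)
}

/// The form written in the Rust source: `b & sext(mask)`.
fn and_with_sext(b: i32, bit: bool) -> i32 {
    b & sext_bit(bit)
}

/// The form in the optimized IR: `select mask, b, 0`.
fn select_or_zero(b: i32, bit: bool) -> i32 {
    if bit { b } else { 0 }
}

/// Checks that both forms agree for every sample value and both mask states.
fn identity_holds(samples: &[i32]) -> bool {
    samples.iter().all(|&b| {
        [false, true]
            .iter()
            .all(|&bit| and_with_sext(b, bit) == select_or_zero(b, bit))
    })
}
```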
With an AVX-512 capable target, the generated code looks good: the vector mask is optimized away and replaced by a masked load using the `k` mask registers.
```asm
example::add_masked_i32x8:
        mov rax, rdi
        kmovd k1, ecx
        vmovdqa32 ymm0 {k1} {z}, ymmword ptr [rdx]
        vpaddd ymm0, ymm0, ymmword ptr [rsi]
        vmovdqa ymmword ptr [rdi], ymm0
        vzeroupper
        ret
```
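With a non-AVX-512 target, generating just the vector mask (`from_bitmask_i32x8`) is optimized nicely: the bitmask is broadcast to every lane, ANDed with a constant holding each lane's bit value (1, 2, 4, ..., 128), and compared for equality against that same constant.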
```asm
.LCPI0_0:
        .long 1
        .long 2
        .long 4
        .long 8
        .long 16
        .long 32
        .long 64
        .long 128
example::from_bitmask_i32x8:
        mov rax, rdi
        vmovd xmm0, esi
        vpbroadcastb ymm0, xmm0
        vmovdqa ymm1, ymmword ptr [rip + .LCPI0_0]
        vpand ymm0, ymm0, ymm1
        vpcmpeqd ymm0, ymm0, ymm1
        vmovdqa ymmword ptr [rdi], ymm0
        vzeroupper
        ret
```
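The broadcast/AND/compare sequence in the `from_bitmask_i32x8` listing can be modeled lane by lane in scalar Rust. This is an illustrative sketch, not LLVM's implementation: `from_bitmask_model` and `LANE_BITS` are hypothetical names, and the model only mirrors the `vpbroadcastb`, `vpand` and `vpcmpeqd` steps.

```rust
/// Per-lane bit constants, matching the `.LCPI0_0` table in the listing.
const LANE_BITS: [i32; 8] = [1, 2, 4, 8, 16, 32, 64, 128];

/// Scalar model of the AVX2 mask construction.
fn from_bitmask_model(bitmask: u8) -> [i32; 8] {
    // vpbroadcastb: every byte of every 32-bit lane holds the bitmask.
    let splat = i32::from_ne_bytes([bitmask; 4]);
    let mut out = [0i32; 8];
    for (lane, &bit) in LANE_BITS.iter().enumerate() {
        // vpand with the lane's bit, then vpcmpeqd against the same constant:
        // equal means the bit was set, producing all ones (-1); otherwise 0.
        out[lane] = if splat & bit == bit { -1 } else { 0 };
    }
    out
}
```

Because every lane's constant is below 256, only the low byte of each broadcast lane matters, so the byte broadcast is enough even though the lanes are 32 bits wide.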
The masked addition should then be able to reuse the same sequence and blend using the generated vector mask. Instead, it generates quite inefficient code that tests each bit of the bitmask individually and inserts the results into a vector register one lane at a time:
```asm
example::add_masked_i32x8:
        push rax
        mov rax, rdi
        mov edi, ecx
        shr dil, 5
        movzx r9d, dil
        and r9d, 1
        neg r9d
        mov r8d, ecx
        shr r8b, 4
        movzx edi, r8b
        and edi, 1
        neg edi
        vmovd xmm0, edi
        vpinsrd xmm0, xmm0, r9d, 1
        mov edi, ecx
        shr dil, 6
        movzx edi, dil
        and edi, 1
        neg edi
        vpinsrd xmm0, xmm0, edi, 2
        mov edi, ecx
        shr dil, 7
        movzx edi, dil
        neg edi
        vpinsrd xmm0, xmm0, edi, 3
        mov edi, ecx
        and edi, 1
        neg edi
        vmovd xmm1, edi
        mov edi, ecx
        shr dil
        movzx edi, dil
        and edi, 1
        neg edi
        vpinsrd xmm1, xmm1, edi, 1
        mov edi, ecx
        shr dil, 2
        movzx edi, dil
        and edi, 1
        neg edi
        vpinsrd xmm1, xmm1, edi, 2
        shr cl, 3
        movzx ecx, cl
        and ecx, 1
        neg ecx
        vpinsrd xmm1, xmm1, ecx, 3
        vinserti128 ymm0, ymm1, xmm0, 1
        vpand ymm0, ymm0, ymmword ptr [rdx]
        vpaddd ymm0, ymm0, ymmword ptr [rsi]
        vmovdqa ymmword ptr [rax], ymm0
        pop rcx
        vzeroupper
        ret
```
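For comparison, the kind of AVX2 sequence one would hope for here reuses the mask construction and then ANDs `b` with it before the add. The sketch below writes that pattern by hand with `std::arch::x86_64` intrinsics. It is an illustration of the expected instruction shape, not what LLVM emitted in this report; the function names are hypothetical, it uses `_mm256_set1_epi32` instead of a byte broadcast (equivalent here, since only the low 8 bits are tested), and it falls back to a scalar loop when AVX2 is unavailable or the target is not x86_64.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical hand-written AVX2 version of `add_masked_i32x8`:
/// broadcast + vpand + vpcmpeqd build the lane mask, then vpand + vpaddd.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_masked_avx2(a: &[i32; 8], b: &[i32; 8], bitmask: u8) -> [i32; 8] {
    unsafe {
        // Lane i tests bit i (same values as the .LCPI0_0 constant).
        let bits = _mm256_setr_epi32(1, 2, 4, 8, 16, 32, 64, 128);
        let splat = _mm256_set1_epi32(bitmask as i32);
        let mask = _mm256_cmpeq_epi32(_mm256_and_si256(splat, bits), bits);
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let sum = _mm256_add_epi32(va, _mm256_and_si256(vb, mask));
        let mut out = [0i32; 8];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
        out
    }
}

/// Scalar fallback with the same semantics (bit i of `bitmask` enables lane i of `b`).
fn add_masked_fallback(a: &[i32; 8], b: &[i32; 8], bitmask: u8) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        let masked_b = if (bitmask >> i) & 1 == 1 { b[i] } else { 0 };
        out[i] = a[i].wrapping_add(masked_b);
    }
    out
}

/// Dispatches at runtime so the sketch also runs on machines without AVX2.
pub fn add_masked(a: &[i32; 8], b: &[i32; 8], bitmask: u8) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { add_masked_avx2(a, b, bitmask) };
        }
    }
    add_masked_fallback(a, b, bitmask)
}
```

The per-bit `shr`/`and`/`neg`/`vpinsrd` chain in the listing above is replaced here by a handful of full-width vector operations, which is the same shape as the non-AVX-512 `from_bitmask_i32x8` output followed by an AND and an add.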
</pre>