<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/72803>72803</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Unnecessary roundtrip through avx512 vector registers for integer mask
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jhorstmann
      </td>
    </tr>
</table>

<pre>
    I have the following Rust code and was hoping that, for an `avx512` target, the autovectorizer would use the integer mask directly by moving it to a `k` register.

```rust
pub fn add_masked(a: [i32; 8], m: u64, b: [i32; 8]) -> [i32; 8] {
    let mut res = [0; 8];
    let mut bit = 1;
    let mut i = 0;
    while i < 8 {
        let x = if m & bit != 0 {
            b[i]
        } else {
            0
        };
        res[i] = a[i].wrapping_add(x);
        i += 1;
        bit <<= 1;
    }
    res
}
```

The generated code is not bad, but the mask makes an unnecessary round trip through vector registers: it is broadcast into a `zmm` register and tested against a table of per-lane constants to produce `k1`.

```asm
.LCPI0_0:
        .quad   1
        .quad   2
        .quad   4
        .quad   8
        .quad   16
        .quad   32
        .quad   64
        .quad   128
example::add_masked:
        mov     rax, rdi
        vpbroadcastq    zmm0, rdx
        vptestmq        k1, zmm0, zmmword ptr [rip + .LCPI0_0]
        vmovdqu ymm0, ymmword ptr [rsi]
        vpaddd  ymm0 {k1}, ymm0, ymmword ptr [rcx]
        vmovdqu ymmword ptr [rdi], ymm0
        vzeroupper
        ret
```
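
To illustrate why the round trip looks redundant, here is a minimal scalar sketch of what I understand the `vpbroadcastq` + `vptestmq` pair to compute, next to the direct alternative. The function names `mask_via_broadcast` and `mask_direct` are hypothetical and only model the instructions; they are not what LLVM emits.

```rust
/// Scalar model of `vpbroadcastq` followed by `vptestmq` against the
/// constant table `.LCPI0_0`: lane `i` sets mask bit `i` when
/// `m & (1 << i)` is nonzero. (Hypothetical helper for illustration.)
pub fn mask_via_broadcast(m: u64) -> u8 {
    // Same values as the `.quad` constants in the generated assembly.
    const LANE_BITS: [u64; 8] = [1, 2, 4, 8, 16, 32, 64, 128];
    // Model of the broadcast: every 64-bit lane holds a copy of `m`.
    let broadcast = [m; 8];
    let mut k = 0u8;
    for lane in 0..8 {
        if broadcast[lane] & LANE_BITS[lane] != 0 {
            k |= 1 << lane;
        }
    }
    k
}

/// The direct alternative: the 8-lane mask is just the low 8 bits of `m`,
/// which could presumably be moved straight into a `k` register.
/// (Hypothetical helper for illustration.)
pub fn mask_direct(m: u64) -> u8 {
    m as u8
}
```

Under this model both functions return the same value for every `m`, so the broadcast and the constant table should not be needed to build the mask.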

[Compiler Explorer](https://rust.godbolt.org/z/5Pz1e35eq)

I also tried to reproduce the issue with `clang`, but for some reason it does not vectorize the equivalent C code. GCC does vectorize it, but the result is slightly worse than the Rust version.
</pre>