<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/137700>137700</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AVX-512] `vpsubd a, b, vpmovm2d` can be done via a masked `vpsubd`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dzaima
</td>
</tr>
</table>
<pre>
This code, compiled via `-O3 -march=znver5`:
```c
#include <stdint.h>
#include <immintrin.h>
__m512i count_gt100(uint32_t* in, size_t count) {
  __m512i neg_one = _mm512_set1_epi32(-1);
  __m512i acc = _mm512_set1_epi32(0);
  #pragma clang loop unroll(disable) // just to reduce noise
  for (size_t i = 0; i < count; i++) {
    __m512i val = _mm512_loadu_si512(in);
    __mmask16 mask = _mm512_cmpgt_epi32_mask(val, _mm512_set1_epi32(100));
    acc = _mm512_mask_sub_epi32(acc, mask, acc, neg_one);
    in += 16;
  }
  return acc;
}
```
contains:
```asm
vpmovm2d zmm2, k0
vpsubd zmm0, zmm0, zmm2
```
whereas the code the intrinsics directly describe, which is also what gcc generates, needs just one instruction for this: a masked `vpsubd` that subtracts an all-ones vector under mask `k0`.
https://godbolt.org/z/MGjfGq96h
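
For reference, a scalar sketch of why the two forms compute the same result. This is only a per-lane model under the assumption that `vpmovm2d` produces all-ones for set mask bits and zero otherwise, and that the masked `vpsubd` uses merge-masking with `acc` as the pass-through. The helper names (`model_vpmovm2d`, `sub_via_movm`, `sub_masked`, `check_forms_agree`) are made up for illustration and are not intrinsics or LLVM APIs.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* number of 32-bit lanes in a 512-bit vector */

/* Scalar model of vpmovm2d: a lane is all-ones if its mask bit is set, else 0. */
static void model_vpmovm2d(uint32_t out[LANES], uint16_t mask) {
    for (int i = 0; i < LANES; i++)
        out[i] = ((mask >> i) & 1) ? UINT32_MAX : 0;
}

/* Two-instruction form: acc - vpmovm2d(mask), applied to every lane. */
static void sub_via_movm(uint32_t acc[LANES], uint16_t mask) {
    uint32_t m[LANES];
    model_vpmovm2d(m, mask);
    for (int i = 0; i < LANES; i++)
        acc[i] -= m[i];
}

/* One-instruction form: merge-masked acc - (-1); unselected lanes keep acc. */
static void sub_masked(uint32_t acc[LANES], uint16_t mask) {
    for (int i = 0; i < LANES; i++)
        if ((mask >> i) & 1)
            acc[i] -= UINT32_MAX;
}

/* Compares both forms for every 16-bit mask; returns how many masks were checked. */
static int check_forms_agree(void) {
    int checked = 0;
    for (uint32_t mask = 0; mask <= 0xFFFF; mask++) {
        uint32_t a[LANES], b[LANES];
        for (int i = 0; i < LANES; i++)
            a[i] = b[i] = 0xFFFFFFF0u + (uint32_t)i * 3u; /* some lanes wrap around */
        sub_via_movm(a, (uint16_t)mask);
        sub_masked(b, (uint16_t)mask);
        for (int i = 0; i < LANES; i++)
            assert(a[i] == b[i]);
        checked++;
    }
    return checked;
}
```

Calling `check_forms_agree()` walks all 65536 mask values on one fixed input vector, so it illustrates the equivalence rather than proving it for every `acc`. The per-lane argument is general though: subtracting 0 leaves a lane unchanged, which is exactly what merge-masking does for unselected lanes.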
</pre>