<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/130549>130549</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] `acc -= val < 0` converted to less efficient `acc += (val >> 7) & 1`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dzaima
</td>
</tr>
</table>
<pre>
The code:
```c
#include <immintrin.h>
#include <stdint.h>
__m256i count_negative_values(__m256i* src, size_t n) {
    __m256i acc = _mm256_setzero_si256();
    for (size_t i = 0; i < n; i++) {
        __m256i c = _mm256_loadu_si256(src + i);
        __m256i m = _mm256_cmpgt_epi8(_mm256_setzero_si256(), c);
        acc = _mm256_sub_epi8(acc, m);
    }
    return acc;
}
```
with `-O3 -march=haswell` generates a `vpand` + `vpaddb` sequence to update the accumulator, whereas a single `vpsubb` would suffice. (If desired, the unrolled case could avoid a long dependent `vpsubb` chain: to add e.g. `a`, `b`, `c`, `d` to the accumulator, it could compute `acc -= (a+b)+(c+d)`.)
https://godbolt.org/z/Wo7qE8MjG
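
For reference, a minimal scalar sketch of why the two forms are interchangeable, modelling a single 8-bit lane with wrapping arithmetic. All function names here are hypothetical and exist only for illustration; this is not LLVM code and not the exact generated sequence.
```c
#include <stdint.h>

/* Form written in the source: the compare yields an all-ones byte (0xFF,
   i.e. -1) for a negative lane, and subtracting -1 adds 1 (mod 256). */
uint8_t step_sub_mask(uint8_t acc, int8_t val) {
    uint8_t mask = (val < 0) ? 0xFF : 0x00;   /* one lane of vpcmpgtb */
    return (uint8_t)(acc - mask);             /* one lane of vpsubb */
}

/* Form from the issue title: isolate the sign bit, then add it. */
uint8_t step_add_signbit(uint8_t acc, int8_t val) {
    uint8_t bit = (uint8_t)(((uint8_t)val >> 7) & 1);
    return (uint8_t)(acc + bit);
}

/* Unrolled variant suggested above: sum four masks as a tree first, then
   do a single subtract, i.e. acc -= (a+b)+(c+d). */
uint8_t step_sub_tree(uint8_t acc, uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    uint8_t ab = (uint8_t)(a + b);
    uint8_t cd = (uint8_t)(c + d);
    return (uint8_t)(acc - (uint8_t)(ab + cd));
}

/* Exhaustively compare the first two forms over every (acc, val) pair and
   return the number of disagreements. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int acc = 0; acc < 256; acc++) {
        for (int val = -128; val < 128; val++) {
            if (step_sub_mask((uint8_t)acc, (int8_t)val) !=
                step_add_signbit((uint8_t)acc, (int8_t)val)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```
Since both per-lane forms agree for every input, emitting the single subtract is semantically safe; the tree variant only shortens the dependency chain on the accumulator.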
</pre>