<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/128665">128665</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal boolean reduction vectorization
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
psiha
</td>
</tr>
</table>
<pre>
Both Clang and GCC struggle to vectorize this boolean reduction on both ARM and x86. GCC does a better job overall, but its output is still suboptimal and worse than the handwritten version. Making the accumulation variable (occurrences) 32-bit at least lets them use horizontal adds/addv, and this again helps GCC more than Clang.
https://godbolt.org/z/bjT819fPE
[minbench.zip](https://github.com/user-attachments/files/18960269/minbench.zip)
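
For readers who cannot open the Godbolt link, here is a minimal sketch of the kind of loop being discussed. It is not the code from the link or from minbench.zip. It assumes the reduction counts how many bytes equal a given value. The function names, the `needle` parameter and the choice of an 8-bit narrow accumulator are all hypothetical. Only the variable name `occurrences` comes from the report, and `unsigned` is assumed to be 32 bits wide, as it is on the usual ARM and x86 targets.

```cpp
// Hypothetical reproduction sketch, NOT the code from the Godbolt link.

// Narrow accumulator: the sum of the comparison results is kept in 8 bits,
// so the count wraps modulo 256 for long inputs.
unsigned char count_narrow(const unsigned char* data, unsigned long n,
                           unsigned char needle) {
    unsigned char occurrences = 0;
    for (unsigned long i = 0; i != n; ++i)
        occurrences += (data[i] == needle);  // adds 0 or 1 per element
    return occurrences;
}

// Wide accumulator: same loop, but the count is kept in an `unsigned`
// (assumed to be 32 bits here), the variant the report says vectorizes better.
unsigned count_wide(const unsigned char* data, unsigned long n,
                    unsigned char needle) {
    unsigned occurrences = 0;
    for (unsigned long i = 0; i != n; ++i)
        occurrences += (data[i] == needle);  // adds 0 or 1 per element
    return occurrences;
}
```

Comparing the assembly of the two functions at `-O3` on both targets should show whether the accumulator width changes the reduction strategy, assuming this sketch is close enough to the original loop.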
</pre>