<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/128665">128665</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Suboptimal boolean reduction vectorization
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          psiha
      </td>
    </tr>
</table>

<pre>
    Both Clang and GCC struggle with this on ARM and x86, although GCC does a better job overall (its output is still suboptimal, i.e. worse than the handwritten version). Making the accumulation variable (occurrences) 32-bit at least lets them use horizontal adds/addv (and this again helps GCC more than Clang).

https://godbolt.org/z/bjT819fPE

[minbench.zip](https://github.com/user-attachments/files/18960269/minbench.zip)
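
For readers who cannot open the links, here is a rough sketch of the kind of loop being discussed. It is not the code from the Godbolt link or the attachment: the function names, the byte element type and the 8-bit width of the narrow accumulator are all assumptions made for illustration. Only the variable name occurrences and the 32-bit variant come from the description above.

```cpp
// Hypothetical sketch only; the actual test case is in the Godbolt link.
// Both functions count how many bytes in data[0..n) equal needle.

// Narrow accumulator (assumed 8-bit here): the count wraps modulo 256.
unsigned char count_narrow(const unsigned char* data, unsigned long n,
                           unsigned char needle) {
    unsigned char occurrences = 0;
    for (unsigned long i = 0; i != n; ++i)
        occurrences += (data[i] == needle);  // comparison yields 0 or 1
    return occurrences;
}

// Same reduction with a wider accumulator (unsigned is 32-bit on the
// usual x86 and ARM targets), the variant the report says helps.
unsigned count_wide(const unsigned char* data, unsigned long n,
                    unsigned char needle) {
    unsigned occurrences = 0;
    for (unsigned long i = 0; i != n; ++i)
        occurrences += (data[i] == needle);  // comparison yields 0 or 1
    return occurrences;
}
```

The two functions return the same value as long as the count fits in the narrow type. The difference the report cares about is only in the code the vectorizer emits for each.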

</pre>