<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/95860>95860</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
AARCH64: Non-SVE popcount autovectorization for 32-bit and 64-bit could be improved using v8.4-a's udot instruction
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
pinskia
</td>
</tr>
</table>
<pre>
Take:
```
void f(int *a, int b)
{
  for (int i = 0; i < b; i++)
    a[i] = __builtin_popcount(a[i]);
}
```
Currently LLVM produces (for -O2 -march=armv8.4-a -fno-unroll-loops):
```
.LBB0_4: // =>This Inner Loop Header: Depth=1
ldr q0, [x11]
subs x10, x10, #4
cnt v0.16b, v0.16b
uaddlp v0.8h, v0.16b
uaddlp v0.4s, v0.8h
str q0, [x11], #16
b.ne .LBB0_4
```
But this could be improved to:
```
movi v1.16b, #1
.LBB0_4: // =>This Inner Loop Header: Depth=1
ldr q0, [x11]
subs x10, x10, #4
movi v2.4s, #0
cnt v0.16b, v0.16b
udot v2.4s, v0.16b, v1.16b
str q2, [x11], #16
b.ne .LBB0_4
```
That is, generate:
```
movi v1.16b, #1
movi v2.4s, #0
cnt v0.16b, v0.16b
udot v2.4s, v0.16b, v1.16b
```
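which is one extra instruction but should pipeline better: the `movi v1.16b, #1` can be hoisted out of the loop, and the `movi v2.4s, #0` does not depend on the load.
The 64-bit case still needs the final `uaddlp`.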
This came up during the review of the GCC patch and I thought it would be good to file this here too.
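
For reference, here is a minimal scalar sketch of why the two sequences should agree. It models a single lane only; the helper names (`cnt_byte`, `udot_lane`, `popcount32_udot`, and so on) are made up for illustration and are not LLVM or ACLE APIs. The 64-bit variant assumes the sequence would be `cnt`, `udot`, then one `uaddlp` from `.4s` to `.2d`, which is my reading of the note above rather than actual compiler output.
```
#include <stdint.h>

/* Models one byte lane of `cnt v0.16b, v0.16b`: popcount of a single byte. */
static uint8_t cnt_byte(uint8_t b)
{
  uint8_t n = 0;
  while (b) {
    b &= (uint8_t)(b - 1); /* clear the lowest set bit */
    n++;
  }
  return n;
}

/* Models one 32-bit lane of `udot vd.4s, vn.16b, vm.16b`:
   acc += sum of four unsigned byte products. */
static uint32_t udot_lane(uint32_t acc, const uint8_t n[4], const uint8_t m[4])
{
  for (int i = 0; i < 4; i++)
    acc += (uint32_t)n[i] * m[i];
  return acc;
}

/* Splits x into its four bytes and applies the per-byte popcount. */
static void cnt_lane(uint32_t x, uint8_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = cnt_byte((uint8_t)(x >> (8 * i)));
}

/* Current codegen, one lane: cnt, then two pairwise widening adds. */
uint32_t popcount32_uaddlp(uint32_t x)
{
  uint8_t c[4];
  cnt_lane(x, c);
  uint16_t lo = (uint16_t)(c[0] + c[1]); /* uaddlp .8h <- .16b */
  uint16_t hi = (uint16_t)(c[2] + c[3]);
  return (uint32_t)lo + hi;              /* uaddlp .4s <- .8h */
}

/* Proposed codegen, one lane: cnt, then udot against all-ones
   into a zeroed accumulator. */
uint32_t popcount32_udot(uint32_t x)
{
  static const uint8_t ones[4] = {1, 1, 1, 1}; /* movi v1.16b, #1 */
  uint8_t c[4];
  cnt_lane(x, c);
  return udot_lane(0, c, ones);                /* movi v2.4s, #0; udot */
}

/* Assumed 64-bit variant: udot on each 32-bit half, then one
   pairwise widening add (uaddlp .2d <- .4s). */
uint64_t popcount64_udot(uint64_t x)
{
  uint32_t lo = popcount32_udot((uint32_t)x);
  uint32_t hi = popcount32_udot((uint32_t)(x >> 32));
  return (uint64_t)lo + hi;
}
```
Since every multiplier byte is 1, the dot product reduces to the sum of the four byte counts, which is the same value the two `uaddlp`s compute.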
</pre>