<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/86102>86102</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Resource model of 16-bit SVE `udot` wrong for Neoverse-v1
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
recmo
</td>
</tr>
</table>
<pre>
Consider 10 `udot` operations:
```
udot z1.d, z0.h, z31.h
udot z3.d, z2.h, z31.h
udot z5.d, z4.h, z31.h
udot z7.d, z6.h, z31.h
udot z9.d, z8.h, z31.h
udot z11.d, z10.h, z31.h
udot z13.d, z12.h, z31.h
udot z15.d, z14.h, z31.h
udot z17.d, z16.h, z31.h
udot z19.d, z18.h, z31.h
```
According to [Arm22], the 16-bit `udot` has a throughput of `1` and can only run on the 'FP/ASIMD 0' resource. From this I expect the block to take 10 cycles. My benchmark on an AWS Graviton3 instance also measures 10 cycles, so this seems correct.
If I run the block through `llvm-mca`:
```
llvm-mca -mcpu=neoverse-v1 --noalias --all-views --bottleneck-analysis --all-stats
```
It reports that this block will take 5 cycles, which appears incorrect. It also reports that the instructions are spread across both `V1UnitV0` and `V1UnitV1`, which also seems incorrect.
```
[7] - V1UnitV0
[8] - V1UnitV1
Resource pressure by instruction:
[0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8] [9] [10] Instructions:
- - - - - - - - - - - - 1.00 - - udot z1.d, z0.h, z31.h
- - - - - - - - - - - 1.00 - - - udot z3.d, z2.h, z31.h
- - - - - - - - - - - - 1.00 - - udot z5.d, z4.h, z31.h
- - - - - - - - - - - 1.00 - - - udot z7.d, z6.h, z31.h
- - - - - - - - - - - - 1.00 - - udot z9.d, z8.h, z31.h
- - - - - - - - - - - 1.00 - - - udot z11.d, z10.h, z31.h
- - - - - - - - - - - - 1.00 - - udot z13.d, z12.h, z31.h
- - - - - - - - - - - 1.00 - - - udot z15.d, z14.h, z31.h
- - - - - - - - - - - - 1.00 - - udot z17.d, z16.h, z31.h
- - - - - - - - - - - 1.00 - - - udot z19.d, z18.h, z31.h
```
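The gap between the two numbers can be sketched as a simple port-count calculation. This is only a back-of-the-envelope model, not how `llvm-mca` computes its result: the helper `min_cycles` and its parameters are hypothetical names, and it assumes independent instructions that are limited purely by how many pipelines can accept them.

```python
import math


def min_cycles(num_instructions, num_pipelines, per_pipeline_throughput=1.0):
    """Lower bound on cycles for independent instructions.

    Hypothetical helper: assumes the only limit is issue bandwidth,
    i.e. each eligible pipeline accepts `per_pipeline_throughput`
    instructions per cycle and there are no data dependencies.
    """
    per_cycle = num_pipelines * per_pipeline_throughput
    return math.ceil(num_instructions / per_cycle)


NUM_UDOT = 10  # the ten 16-bit udot instructions in the block

# [Arm22]: 16-bit udot is restricted to a single pipeline.
guide_estimate = min_cycles(NUM_UDOT, num_pipelines=1)

# Resource pressure view: work alternates between V1UnitV0 and V1UnitV1.
mca_estimate = min_cycles(NUM_UDOT, num_pipelines=2)

print("one pipeline :", guide_estimate, "cycles")
print("two pipelines:", mca_estimate, "cycles")
```

Under these assumptions, one pipeline gives the 10 cycles I measure, and two pipelines give the 5 cycles that `llvm-mca` reports, which is consistent with the scheduling model allowing the 16-bit `udot` on both units.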
* [Arm22] ARM Neoverse V1 Software Optimization Guide Revision r1p2 issue 6.0 (2022).
[Arm22]: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
```
llvm-mca --version
Homebrew LLVM version 17.0.6
```
</pre>