<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/86102>86102</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Resource model of 16-bit SVE `udot` wrong for Neoverse-v1
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          recmo
      </td>
    </tr>
</table>

<pre>
    Consider 10 `udot` operations:

```
    udot z1.d, z0.h, z31.h
    udot z3.d, z2.h, z31.h
    udot z5.d, z4.h, z31.h
    udot z7.d, z6.h, z31.h
    udot z9.d, z8.h, z31.h
    udot z11.d, z10.h, z31.h
    udot z13.d, z12.h, z31.h
    udot z15.d, z14.h, z31.h
    udot z17.d, z16.h, z31.h
    udot z19.d, z18.h, z31.h
```

According to [Arm22], the 16-bit `udot` has an execution throughput of `1` and can only run on the 'FP/ASIMD 0' resource. Since the 10 instructions are independent of each other, I expect the block to take 10 cycles. My benchmark on an AWS Graviton3 instance also measures 10 cycles, so this seems correct.

If I run this through `llvm-mca`:

```
llvm-mca -mcpu=neoverse-v1 --noalias --all-views --bottleneck-analysis --all-stats
```

It tells me that this loop will take 5 cycles, which appears incorrect. It also tells me that the instructions will use both `V1UnitV0` and `V1UnitV1`, which also seems incorrect.

```
[7]   - V1UnitV0
[8]   - V1UnitV1

Resource pressure by instruction:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6.0]  [6.1]  [7]    [8]    [9]    [10]   Instructions:
 -      -      -      -      -      -      -      -      -      -      -      -     1.00    -      -     udot   z1.d, z0.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -     udot   z3.d, z2.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -      -     1.00    -      -     udot   z5.d, z4.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -     udot   z7.d, z6.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -      -     1.00    -      -     udot   z9.d, z8.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -     udot   z11.d, z10.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -      -     1.00    -      -     udot   z13.d, z12.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -     udot   z15.d, z14.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -      -     1.00    -      -     udot   z17.d, z16.h, z31.h
 -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -     udot   z19.d, z18.h, z31.h
```
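
The gap between 10 and 5 cycles is what you would expect if the instruction were modeled as issuing on either of two pipes instead of one. Below is a minimal sketch of that arithmetic, assuming a purely throughput-bound block of independent instructions with no other bottleneck. The function `block_cycles` and its parameter names are hypothetical and are not part of `llvm-mca`.

```python
import math


def block_cycles(n_instructions: int, n_pipes: int, per_pipe_throughput: float = 1.0) -> int:
    """Cycles needed for a throughput-bound block of independent instructions.

    Assumes every instruction can issue on any of `n_pipes` pipes and each
    pipe accepts `per_pipe_throughput` instructions per cycle.
    """
    return math.ceil(n_instructions / (n_pipes * per_pipe_throughput))


# One pipe ('FP/ASIMD 0' only), as described in [Arm22].
print(block_cycles(10, n_pipes=1))

# Two pipes (V1UnitV0 and V1UnitV1), as the llvm-mca output suggests.
print(block_cycles(10, n_pipes=2))
```

With one pipe the sketch gives the 10 cycles measured on hardware. With two pipes it gives the 5 cycles reported by `llvm-mca`.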

* [Arm22] ARM Neoverse V1 Software Optimization Guide Revision r1p2 issue 6.0 (2022).

[Arm22]: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/

```
llvm-mca --version
Homebrew LLVM version 17.0.6
```
</pre>