<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/146564>146564</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Zen3 scheduler model for the latency of VEXTRACTF128rri is probably incorrect
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
TiborGY
</td>
</tr>
</table>
<pre>
See also the discussion at https://discourse.llvm.org/t/are-the-latencies-of-vextractf128-correct-for-zen2-3-in-mca/86422
LLVM MCA relies on LLVM's scheduler models to predict cycle counts. This is the predicted timeline graph for a small snippet on Zen3:
```
[0,0] DeeeeeeeeER . . vmovapd (%rdi), %ymm0
[0,1] D=eeeeeeeeeeER . . vsubpd (%rsi), %ymm0, %ymm0
[0,2] D===========eeeER . vmulpd %ymm0, %ymm0, %ymm0
[0,3] D==============eeeeER vextractf128 $1, %ymm0, %xmm1
[0,4] D==============eE---R vmovhlps %xmm0, %xmm0, %xmm2
```
As you can see, `vextractf128` is predicted to have a latency of 4 cycles. However, this is inconsistent with both Agner Fog's instruction tables (which list 3 cycles) and my own measurements with llvm-exegesis:
```
./llvm-exegesis -mode=latency -opcode-name=VEXTRACTF128rri -mcpu=znver3 --benchmark-repeat-count=100000 -min-instructions=1000 --repetition-mode=loop
---
mode: latency
key:
instructions:
- 'VEXTRACTF128rri XMM0 YMM0 i_0x1'
config: ''
register_initial_values:
- 'YMM0=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
min_instructions: 1000
measurements:
- { key: latency, value: 3.15, per_snippet_value: 3.15, validation_counters: {} }
error: ''
info: Repeating a single explicitly serial instruction
assembled_snippet: 4883EC20C7042400000000C744240400000000C744240800000000C744240C00000000C744241000000000C744241400000000C744241800000000C744241C00000000C5FE6F04244883C42049B80200000000000000662E0F1F840000000000C4E37D19C001C4E37D19C0014983C0FF75EEC3
...
```
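To double-check what llvm-exegesis actually timed, the `assembled_snippet` bytes above can be decoded. The Python sketch below does that; it is not part of llvm-exegesis, the helper names are made up for illustration, and it assumes the hex string is copied verbatim from the YAML output. It only understands the single register-form `vextractf128` encoding (VEX.256.66.0F3A 19 /r ib) that appears in the snippet. As far as I can tell, it shows that the loop body is two back-to-back `vextractf128 $1, %ymm0, %xmm0`, followed by a counter decrement and a `jne` back to the first of them. Each instruction therefore reads the register written by the previous one, so the ~3.15 cycles per instruction should reflect a serial dependency chain (latency), not throughput.

```python
# Hypothetical helper script: decode the llvm-exegesis assembled_snippet.
# Assumption: the hex below is copied verbatim from the YAML output above.
SNIPPET_HEX = (
    "4883EC20C7042400000000C744240400000000C744240800000000C744240C00000000"
    "C744241000000000C744241400000000C744241800000000C744241C00000000"
    "C5FE6F04244883C42049B80200000000000000662E0F1F840000000000"
    "C4E37D19C001C4E37D19C0014983C0FF75EEC3"
)

# VEX.256.66.0F3A 19 /r ib: vextractf128 $imm8, ymm(ModRM.reg), xmm(ModRM.rm)
VEXTRACTF128 = bytes.fromhex("C4E37D19C001")


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every (non-overlapping) offset at which needle occurs."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + len(needle))
    return offsets


def decode_vextractf128(insn: bytes) -> tuple[int, int, int]:
    """Return (src_ymm, dst_xmm, imm8) for a register-form 3-byte-VEX vextractf128."""
    assert insn[0] == 0xC4, "expected a 3-byte VEX prefix"
    vex1, vex2, opcode, modrm, imm8 = insn[1], insn[2], insn[3], insn[4], insn[5]
    assert vex1 & 0x1F == 0x03, "opcode map must be 0F3A"
    assert (vex2 >> 2) & 1 == 1, "VEX.L must be 1 (256-bit)"
    assert vex2 & 0x03 == 0x01, "VEX.pp must be 01 (66 prefix)"
    assert opcode == 0x19 and modrm >> 6 == 0b11, "register-form VEXTRACTF128"
    # VEX.R (bit 7) and VEX.B (bit 5) are stored inverted and extend reg/rm.
    r_ext = 0 if vex1 & 0x80 else 8
    b_ext = 0 if vex1 & 0x20 else 8
    src = r_ext + ((modrm >> 3) & 0x7)
    dst = b_ext + (modrm & 0x7)
    return src, dst, imm8


code = bytes.fromhex(SNIPPET_HEX)
hits = find_all(code, VEXTRACTF128)

# After the last vextractf128: `add $-1, %r8` (49 83 C0 FF), then `jne rel8` (75 xx).
loop_tail = hits[-1] + len(VEXTRACTF128)
assert code[loop_tail:loop_tail + 4] == bytes.fromhex("4983C0FF")
assert code[loop_tail + 4] == 0x75
rel8 = int.from_bytes(code[loop_tail + 5:loop_tail + 6], "little", signed=True)
jump_target = loop_tail + 6 + rel8  # rel8 is relative to the end of the jne

for off in hits:
    src, dst, imm = decode_vextractf128(code[off:off + 6])
    print(f"offset {off}: vextractf128 ${imm}, %ymm{src}, %xmm{dst}")
print(f"loop back-edge jumps to offset {jump_target}")
```

Because writing `%xmm0` with a VEX-encoded instruction zeroes the upper half of `%ymm0`, the next `vextractf128 $1` still has to wait for that write, which is why exegesis reports the snippet as "explicitly serial".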
Confusingly, AMD's official instruction latency table for Zen3 (Family_19h_Instruction_Latencies_version_1-00.xlsx, AMD Publication No. 56665, Revision 3.00, November 2020) lists `vextractf128` with a latency of 4 cycles. Perhaps I am misinterpreting my measurement results, but I cannot see how the 4-cycle figure could be correct. My confidence in the accuracy of the official table is further eroded by the fact that both `vextractf128` variants are listed with empty operand fields.
</pre>