<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/62602>62602</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [MCA] Possibly incorrect scheduler simulation for Intel Sunny Cove uarchs (icelake-server, icelake-client, tigerlake, rocketlake, etc.)
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          MetalOxideSemi
      </td>
    </tr>
</table>

<pre>
    [Compiler Explorer](https://godbolt.org/z/76c1c4n8z)
I've run into a problem with LLVM-MCA's simulation of Intel Sunny Cove uarch CPUs, i.e. the `-mcpu=icelake-server`, `icelake-client`, `tigerlake`, and `rocketlake` models. The scheduler simulation appears to be incorrect in two respects:

1. ROB size: Sunny Cove's ROB has 352 entries ([WikiChip](https://en.wikichip.org/wiki/intel/microarchitectures/sunny_cove)), whereas LLVM-MCA reports 224 (`Total ROB Entries` in the output below), which is Skylake's ROB capacity.
2. Port 9: Sunny Cove has two store-data ports (4 and 9), compared to Skylake's single port (4). The reciprocal throughput of `PUSH` should therefore be 0.5 instead of Skylake's 1 ([Instruction Tables](https://www.agner.org/optimize/instruction_tables.pdf)). LLVM-MCA does not seem to schedule anything to port 9: the resource pressure view below shows no pressure on `ICXPort9`, and the reported reciprocal throughput is 1.

Overall, the simulation results appear to be identical to those for Skylake. The simulation for Golden Cove (`-mcpu=alderlake`), by contrast, looks accurate, which leads me to suspect that the problem is specific to the Sunny Cove models.
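
The throughput argument in item 2 can be sketched numerically. This is only an illustration of the reasoning, not how llvm-mca computes its figures: `rthroughput_bound` is a hypothetical helper, and the port sets are assumptions taken from the description above (store data on port 4 for Skylake, on ports 4 and 9 for Sunny Cove).

```python
from fractions import Fraction


def rthroughput_bound(uops):
    """Lower bound on reciprocal throughput for one instruction.

    `uops` is a list of port sets; each uop may issue on any port in
    its set. Simplifying assumption: uops only compete with other uops
    that have exactly the same port set (overlapping groups are ignored).
    """
    demand = {}
    for ports in uops:
        key = frozenset(ports)
        demand[key] = demand.get(key, 0) + 1
    # The busiest port group limits how often the instruction can issue.
    return max(Fraction(count, len(ports)) for ports, count in demand.items())


# Only the store-data uop of PUSH is modelled here (assumed port mapping).
skylake_like = [{4}]        # one store-data port
sunny_cove_like = [{4, 9}]  # two store-data ports

print(rthroughput_bound(skylake_like))     # 1
print(rthroughput_bound(sunny_cove_like))  # 1/2
```

With a second store-data port the bound halves, which is the 0.5 figure expected for Sunny Cove.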

```text
Iterations:        1
Instructions:      1
Total Cycles:      5
Total uOps:        3

Dispatch Width: 6
uOps Per Cycle:    0.60
IPC:               0.20
Block RThroughput: 1.0

No resource or data dependency bottlenecks discovered.

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 3      2     1.00           *            push  rax

Dynamic Dispatch Stall Cycles:
RAT     - Register unavailable:                      0
RCU     - Retire tokens unavailable:                 0
SCHEDQ  - Scheduler full:                            0
LQ      - Load queue full:                           0
SQ      - Store queue full:                          0
GROUP   - Static restrictions on the dispatch group: 0
USH     - Uncategorised Structural Hazard:           0

Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
[# dispatched], [# cycles]
 0,              4  (80.0%)
 3,              1  (20.0%)

Schedulers - number of cycles where we saw N micro opcodes issued:
[# issued], [# cycles]
 0,          4  (80.0%)
 3,          1  (20.0%)

Scheduler's queue usage:
[1] Resource name.
[2] Average number of used buffer entries.
[3] Maximum number of used buffer entries.
[4] Total number of buffer entries.

 [1]            [2]        [3]        [4]
ICXPortAny       0          1          60

Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
 0,           4  (80.0%)
 1,           1  (20.0%)

Total ROB Entries:                224
Max Used ROB Entries:             3  ( 1.3% )
Average Used ROB Entries per cy:  2  ( 0.9% )

Register File statistics:
Total number of mappings created:    1
Max number of mappings used:         1

Resources:
[0]   - ICXDivider
[1]   - ICXFPDivider
[2]   - ICXPort0
[3]   - ICXPort1
[4]   - ICXPort2
[5]   - ICXPort3
[6]   - ICXPort4
[7]   - ICXPort5
[8]   - ICXPort6
[9]   - ICXPort7
[10]  - ICXPort8
[11]  - ICXPort9

Resource pressure per iteration:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   
 -      -      -      -      -      -     1.00    -     1.00   1.00    -      -     

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   Instructions:
 -      -      -      -      -      -     1.00    -     1.00   1.00    -      -     push  rax

Timeline view:
Index     01234

[0,0]     DeeER   push  rax

Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage

 [0]    [1]    [2]    [3]
0.     1     1.0    1.0    0.0       push rax
```
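
To compare these figures across several `-mcpu` values without reading each report by eye, the relevant summary fields can be pulled out of the text. This is a minimal sketch: `parse_mca_summary` is a hypothetical helper, the field names are copied from the report above, and 352 is the ROB size claimed in this issue rather than a value produced by any tool.

```python
import re


def parse_mca_summary(report: str) -> dict:
    """Extract a few numeric summary fields from llvm-mca text output."""
    fields = {}
    for name in ("Total ROB Entries", "Block RThroughput", "Dispatch Width"):
        # Each field is printed as "<name>:<spaces><number>" on its own line.
        match = re.search(rf"^{re.escape(name)}:\s*([\d.]+)", report, re.MULTILINE)
        if match:
            fields[name] = float(match.group(1))
    return fields


# Lines copied from the report above.
sample = """\
Dispatch Width: 6
Block RThroughput: 1.0
Total ROB Entries:                224
"""

summary = parse_mca_summary(sample)
expected_rob = 352  # Sunny Cove ROB size as claimed in this issue (WikiChip)
print(summary["Total ROB Entries"] == expected_rob)  # False
```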

</pre>