[llvm] [AArch64] Update zero latency instructions in Neoverse scheduling tables (PR #165690)

Fri Nov 7 09:09:46 PST 2025

rj-jesus wrote:

Consider [this example](https://godbolt.org/z/hsMqeYh4n), which issues 20 `mov x9, 0` instructions per iteration for 10^9 iterations. This is what I see:
```
$ perf stat -e cycles,instructions,op_retired ./a.out 

 Performance counter stats for './a.out':

     3,503,570,565      cycles                                                                
    22,003,024,649      instructions                     #    6.28  insn per cycle            
    22,003,034,412      op_retired                                                            

       1.133565071 seconds time elapsed

       1.050723000 seconds user
       0.003995000 seconds sys
```
If you normalise the IPC that perf computes by 21/22 (because the CMP+B pair is fused), you get almost exactly 6 (about 5.99), which matches the dispatch width.
LLVM-MCA currently works out something similar: https://godbolt.org/z/7az3YEEY9.
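
For reference, the normalisation above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the counter values are copied from the `perf stat` output, while the variable names and the "21 slots out of 22 instructions" framing are my own way of expressing the CMP+B fusion, not something perf reports.

```python
# Counter values copied from the perf stat output above.
cycles = 3_503_570_565
instructions = 22_003_024_649

# Assumption: each loop iteration retires 22 instructions, and the fused
# CMP+B pair counts as a single unit, leaving 21 units per iteration.
instructions_per_iter = 22
units_per_iter = 21

# IPC as perf computes it, then scaled by 21/22 to account for the fusion.
raw_ipc = instructions / cycles
normalised_ipc = raw_ipc * units_per_iter / instructions_per_iter

print(f"raw IPC:        {raw_ipc:.2f}")
print(f"normalised IPC: {normalised_ipc:.2f}")
```

The normalised figure lands within about 0.01 of 6, which is what you would expect if each `mov` occupies one dispatch slot.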

If we instead model zero-latency moves as zero micro-ops, LLVM-MCA reports:
```
Iterations:        100000
Instructions:      2000000
Total Cycles:      6251
Total uOps:        0

Dispatch Width:    6
uOps Per Cycle:    0.00
IPC:               319.95
Block RThroughput: 0.0
```
(The IPC becomes bottlenecked on the MicroOpBufferSize.)
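
As a rough sanity check of that reading, the reported IPC can be recomputed from the summary, and the cycle count compared against a buffer-limited estimate. The instruction and cycle counts come from the LLVM-MCA output above; the buffer size of 320 is an assumed value chosen for illustration (it is not stated in this thread), so treat the second half as a sketch rather than a statement about the actual scheduling model.

```python
# Values copied from the LLVM-MCA summary above.
instructions = 2_000_000
total_cycles = 6251

# IPC as LLVM-MCA reports it: instructions divided by total cycles.
ipc = instructions / total_cycles
print(f"IPC: {ipc:.2f}")

# Hypothetical: assume a MicroOpBufferSize of 320 entries (an assumption,
# not taken from the source). If at most that many instructions can pass
# through the buffer per cycle, the loop needs roughly this many cycles.
assumed_buffer_size = 320
buffer_limited_cycles = instructions / assumed_buffer_size
print(f"buffer-limited cycles: {buffer_limited_cycles:.0f}")
```

Under that assumption the estimate is within one cycle of the reported total, which is consistent with the buffer size, rather than the dispatch width, being the limit once the moves cost zero micro-ops.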

It does seem to me that these instructions should be modelled as one micro-op. Unless I'm missing something, or there's a compelling reason for us to make this change, I believe it would be better to leave it as is. What do you think?

https://github.com/llvm/llvm-project/pull/165690