[llvm] [AArch64] Update zero latency instructions in Neoverse scheduling tables (PR #165690)
Ricardo Jesus via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 7 09:09:46 PST 2025
rj-jesus wrote:
Consider [this example](https://godbolt.org/z/hsMqeYh4n), which executes 20 `mov x9, 0` instructions per iteration for 10^9 iterations. Here is what I see:
```
$ perf stat -e cycles,instructions,op_retired ./a.out
Performance counter stats for './a.out':
3,503,570,565 cycles
22,003,024,649 instructions # 6.28 insn per cycle
22,003,034,412 op_retired
1.133565071 seconds time elapsed
1.050723000 seconds user
0.003995000 seconds sys
```
If you normalise the IPC that perf computes by 21/22 (each iteration is 22 instructions, the 20 MOVs plus CMP and B, but CMP+B is fused into a single op), you get almost exactly 6 (about 5.99).
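As a quick check of that normalisation, the sketch below redoes the arithmetic using the counter values from the perf output. It assumes, as stated, that CMP+B fuses into one op, so 21 ops are dispatched for every 22 instructions retired. The variable names are illustrative only.

```python
# Counter values copied from the perf stat output.
cycles = 3_503_570_565
instructions = 22_003_024_649

# Each iteration retires 22 instructions: 20 MOVs plus CMP and B.
insns_per_iter = 22
# Assumption: CMP+B is macro-fused, so only 21 ops are dispatched per iteration.
ops_per_iter = 21

ipc = instructions / cycles
dispatched_per_cycle = ipc * ops_per_iter / insns_per_iter

print(f"IPC reported by perf:     {ipc:.2f}")
print(f"Dispatched ops per cycle: {dispatched_per_cycle:.2f}")
```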
llvm-mca currently arrives at something similar: https://godbolt.org/z/7az3YEEY9.
If we model zero-latency moves as zero micro-ops, llvm-mca instead reports:
```
Iterations: 100000
Instructions: 2000000
Total Cycles: 6251
Total uOps: 0
Dispatch Width: 6
uOps Per Cycle: 0.00
IPC: 319.95
Block RThroughput: 0.0
```
(The IPC is then bottlenecked only by the `MicroOpBufferSize`.)
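To illustrate why dispatch stops being the limit, here is a deliberately simple model that bounds cycles by dispatch width alone. `dispatch_bound_cycles` is a hypothetical helper, not part of llvm-mca, which models many more resources. With one micro-op per MOV the bound gives an IPC of about 6; with zero micro-ops it collapses to 0 cycles, so the IPC llvm-mca reports in that case must come from some other limit.

```python
import math

DISPATCH_WIDTH = 6  # "Dispatch Width: 6" from the llvm-mca summary


def dispatch_bound_cycles(instructions: int, uops_per_insn: int,
                          width: int = DISPATCH_WIDTH) -> int:
    """Lower bound on cycles imposed by dispatch width only (toy model)."""
    return math.ceil(instructions * uops_per_insn / width)


instructions = 2_000_000  # 100000 iterations x 20 MOVs

# One micro-op per MOV: dispatch width is the limit, giving an IPC of about 6.
one_uop = dispatch_bound_cycles(instructions, 1)
print("1 uop/MOV:", one_uop, "cycles, IPC", round(instructions / one_uop, 2))

# Zero micro-ops per MOV: dispatch width no longer constrains anything.
zero_uop = dispatch_bound_cycles(instructions, 0)
print("0 uops/MOV:", zero_uop, "cycles from the dispatch bound")

# IPC implied by "Total Cycles: 6251" in the llvm-mca summary.
print("llvm-mca IPC:", round(instructions / 6251, 2))
```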
It seems to me that these instructions should be modelled as one micro-op each. Unless I'm missing something, or there's a compelling reason to make this change, I think it would be better to leave them as they are. What do you think?
https://github.com/llvm/llvm-project/pull/165690