[llvm] [AArch64] Initial sched model for Neoverse V3, V3AE (PR #163932)

Thu Oct 23 04:53:16 PDT 2025

https://github.com/Asher8118 commented:

Thanks for updating the patch! A few more comments on the SVE instructions:

- The SWOG shows the throughput of SVE PTRUES to be 2, but it [currently shows as 1](https://github.com/llvm/llvm-project/blob/73a42dda79f0d4e4ac4caa60b7738aeeb508994e/llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-sve-instructions.s#L5361).
- The V3 SWOG shows that SVE Dot product, 16 bit has an execution latency of 3, but it [currently shows as 4](https://github.com/llvm/llvm-project/blob/73a42dda79f0d4e4ac4caa60b7738aeeb508994e/llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-sve-instructions.s#L5475).
- Multiply accumulate B,H,S element size shows throughput of 2 in the SWOG, but [currently shows as 1](https://github.com/llvm/llvm-project/blob/73a42dda79f0d4e4ac4caa60b7738aeeb508994e/llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-sve-instructions.s#L4953).
- STNT1B, vector+scalar 32-bit element has an execution throughput of 1/3 in the SWOG, [but shows as 1/2 currently](https://github.com/llvm/llvm-project/blob/73a42dda79f0d4e4ac4caa60b7738aeeb508994e/llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-sve-instructions.s#L6185C23-L6185C34). That's the instruction that caught my eye, but there might be more store instructions showing the wrong throughput. There are differences between the V2 and V3 SWOGs here (e.g. scatter stores), which is what made me check this instruction. I'd double-check that those match the V3 SWOG.
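
For cross-checking the throughput items above, here is a rough sketch of the conversion involved. It assumes the llvm-mca column being compared is `RThroughput` (reciprocal throughput, in cycles per instruction), while the SWOG lists execution throughput in instructions per cycle. The helper name and the table are hypothetical; the values are only the ones quoted in this comment.

```python
from fractions import Fraction

def swog_to_rthroughput(swog_throughput):
    """Convert a SWOG throughput (instructions/cycle) to a reciprocal
    throughput (cycles/instruction). Hypothetical helper for illustration."""
    return 1 / Fraction(swog_throughput)

# (instruction group, SWOG throughput, throughput the model currently implies),
# taken from the review comment above.
quoted = [
    ("SVE PTRUES", "2", "1"),
    ("Multiply accumulate B,H,S", "2", "1"),
    ("STNT1B, vector+scalar 32-bit element", "1/3", "1/2"),
]

for name, swog, current in quoted:
    want = swog_to_rthroughput(swog)
    have = swog_to_rthroughput(current)
    status = "ok" if want == have else "mismatch"
    print(f"{name}: expected RThroughput {float(want):.2f}, "
          f"model gives {float(have):.2f} ({status})")
```

Under that assumption, a SWOG throughput of 2 corresponds to an `RThroughput` of 0.5, and a SWOG throughput of 1/3 corresponds to 3.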

https://github.com/llvm/llvm-project/pull/163932