[llvm] [AArch64] Prefer SVE for fixed-length [S|U][MIN|MAX] reductions (PR #181161)

Thu Feb 12 09:16:24 PST 2026

paulwalker-arm wrote:

> The throughput is about the same, but the SVE code is smaller than the NEON expansion.

Is this universally true? The latency of reduction instructions is typically linked to the vector length, so I'm wondering whether this might only be beneficial for SVE128 implementations. The latency for Neoverse V1 looks slightly worse? I wouldn't be surprised if this ends up being pretty bad for A64FX.
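A toy model can illustrate the vector-length concern. It assumes, purely for illustration and not from any vendor's optimization guide, that an SVE reduction walks a pairwise tree over every lane of the implemented vector, whatever the governing predicate enables. Under that assumption, a fixed-length reduction that only needs the low 128 bits gets deeper as the implemented vector widens. The helper `tree_depth` is hypothetical.

```python
import math


def tree_depth(vector_bits: int, element_bits: int) -> int:
    """Depth of a pairwise reduction tree over all lanes of a vector.

    Toy model (assumption): the hardware reduces every lane of the
    implemented vector length, regardless of how many lanes the
    governing predicate actually enables.
    """
    if vector_bits % element_bits != 0:
        raise ValueError("vector length must be a multiple of the element size")
    lanes = vector_bits // element_bits
    # A pairwise tree over `lanes` elements needs ceil(log2(lanes)) levels.
    return math.ceil(math.log2(lanes)) if lanes > 1 else 0


# A fixed-length 2 x i64 reduction only needs 128 bits, but the toy model
# charges for the whole implemented vector: 128-bit SVE, 256-bit SVE
# (e.g. Neoverse V1), and 512-bit SVE (e.g. A64FX).
for vl in (128, 256, 512):
    print(vl, tree_depth(vl, 64))
```

Real latencies come from each core's software optimization guide and need not follow this model. The sketch only shows why a win on SVE128 might not carry over to wider implementations.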

https://github.com/llvm/llvm-project/pull/181161