[llvm] [AArch64] Prefer SVE for fixed-length [S|U][MIN|MAX] reductions (PR #181161)
Paul Walker via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 12 09:16:24 PST 2026
paulwalker-arm wrote:
> The throughput is about the same, but the SVE code is smaller than the NEON expansion.
Is this universally true? The latency of reduction instructions is typically linked to the vector length, so I wonder whether this is only beneficial for SVE128 implementations. The latency on Neoverse V1 looks slightly worse, and I wouldn't be surprised if this ends up being pretty bad on A64FX.
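For reference, a minimal scalar sketch of the fixed-length [S|U][MIN|MAX] reductions under discussion. The helper names `reduce_max` and `reduce_min` are hypothetical and not taken from the PR. Whether a given instance is lowered to a NEON sequence or to an SVE reduction depends on the target and on the patch, and nothing here models latency or throughput.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not from the PR): scalar reference semantics of a
// fixed-length max/min reduction over N lanes. A signed element type T gives
// the SMAX/SMIN flavour, an unsigned one gives UMAX/UMIN.
template <typename T, std::size_t N>
constexpr T reduce_max(const std::array<T, N>& v) {
    static_assert(N > 0, "need at least one lane");
    T acc = v[0];
    for (std::size_t i = 1; i < N; ++i)
        acc = std::max(acc, v[i]);  // keep the larger lane
    return acc;
}

template <typename T, std::size_t N>
constexpr T reduce_min(const std::array<T, N>& v) {
    static_assert(N > 0, "need at least one lane");
    T acc = v[0];
    for (std::size_t i = 1; i < N; ++i)
        acc = std::min(acc, v[i]);  // keep the smaller lane
    return acc;
}
```

For example, `reduce_max(std::array<std::int64_t, 2>{-5, 3})` is the scalar equivalent of a signed max reduction over a two-lane vector of 64-bit elements. The lane count and element width are chosen only for illustration.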
https://github.com/llvm/llvm-project/pull/181161