[llvm] [AArch64][LoopVectorize] Enable tail-folding on neoverse-v2 (PR #135357)

Tue Apr 15 01:43:47 PDT 2025

david-arm wrote:

I can absolutely believe this is an overall win for SPEC2017, and for neoverse-v1 it made sense because the 256-bit vector length generally gave SVE an advantage anyway. However, as @davemgreen said, we're now effectively forcing the compiler to use SVE on neoverse-v2, where it no longer has the vector length advantage. The ideal situation, as he also noted, is to have an unpredicated main vector body where you are free to interleave (since interleaving is very expensive with tail-folding), followed by a predicated vector epilogue to handle the remainder. In the right circumstances the vector tail will not even be a loop, but a single iteration.
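To make the loop shape concrete, here is a rough scalar model of what I mean. It is only a sketch, not what the vectorizer emits: `VF`, `IC`, `predicated_add` and `add_arrays` are made-up names for illustration, and the lane loops stand in for real vector instructions. The main body has no predicate and covers `VF * IC` elements per iteration; the epilogue masks off lanes past `n`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameters, chosen for illustration only.
constexpr std::size_t VF = 4;  // lanes per vector (e.g. 128 bits of 32-bit ints)
constexpr std::size_t IC = 2;  // interleave count of the main vector body

// Models one predicated vector add: a lane is active only if i + lane < n.
static void predicated_add(const int* a, const int* b, int* out,
                           std::size_t i, std::size_t n) {
    for (std::size_t lane = 0; lane < VF; ++lane)
        if (i + lane < n)
            out[i + lane] = a[i + lane] + b[i + lane];
}

// Returns the number of predicated epilogue iterations that were executed.
std::size_t add_arrays(const int* a, const int* b, int* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;

    // Unpredicated, interleaved main body: VF * IC elements per iteration,
    // entered only while a full block remains, so no mask is needed.
    for (; i + VF * IC <= n; i += VF * IC)
        for (std::size_t k = 0; k < VF * IC; ++k)
            out[i + k] = a[i + k] + b[i + k];

    // Predicated epilogue: fewer than VF * IC elements are left here.
    std::size_t epilogue_iterations = 0;
    for (; i < n; i += VF) {
        predicated_add(a, b, out, i, n);
        ++epilogue_iterations;
    }
    return epilogue_iterations;
}
```

In this model the remainder is always smaller than `VF * IC`, so the epilogue runs at most `IC` times, and whenever the remainder is no larger than `VF` it is a single predicated iteration rather than a loop.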

https://github.com/llvm/llvm-project/pull/135357