[llvm] [AArch64][SVE] Enable max vector bandwidth for SVE (PR #109671)

Tue Sep 24 02:20:38 PDT 2024

huntergr-arm wrote:

> My understanding is this is a pretty large change to make, changing the chosen vector factor for a lot of vectorized loops. We were quite careful when doing it for Neon to make sure the performance was OK overall, and it had decent theory behind it. We implemented a number of fixes and improvements to make sure that the performance for larger vector sizes was acceptable.
> 
> Do you have performance results for SVE? Is the main reason for dot vectorization? SVE has a different vectorization scheme in general where it relies more on top/bottom vectorization (which are not currently supported very much) and extending load / truncating stores.

So far I've run SPEC2017 on Neoverse V1 hardware, and I didn't see much difference (apart from parest failing to build, which I've fixed). Any suggestions for other benchmarks/platforms to check?

This is indeed initially targeted at enabling the dot product work, but the partial reduction intrinsics are intended to support other patterns as well in the future, including the top/bottom extending instructions.

Our other approach for dot products (if this proves to be too much of a change) would be to enable max vector bandwidth only when an integer add reduction is present.
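For anyone following along, here is a rough sketch of the kind of loop in question and why the setting matters for it. This is a simplified model, not the vectorizer's actual code: `lanesPerVector` is a hypothetical helper, and the 128-bit width is the minimum SVE register size (so the results are per `vscale`). As I understand it, maximizing bandwidth only raises the upper bound on the VF by sizing it from the smallest type in the loop instead of the widest; the cost model still picks among the candidates.

```cpp
#include <cstddef>
#include <cstdint>

// An integer add reduction with extended inputs: i8 values are widened,
// multiplied, and summed into an i32 accumulator (a dot product).
int32_t dot_i8(const int8_t *a, const int8_t *b, std::size_t n) {
  int32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  return sum;
}

// Hypothetical helper (not an LLVM API): upper bound on lanes per vector
// for a given minimum register width. Without max bandwidth the bound is
// derived from the widest type in the loop, with it from the smallest.
constexpr unsigned lanesPerVector(unsigned regBits, unsigned smallestBits,
                                  unsigned widestBits, bool maximizeBandwidth) {
  return regBits / (maximizeBandwidth ? smallestBits : widestBits);
}

// For dot_i8 the smallest type is i8 and the widest is i32.
static_assert(lanesPerVector(128, 8, 32, false) == 4, "bound set by i32");
static_assert(lanesPerVector(128, 8, 32, true) == 16, "bound set by i8");
```

Under these assumptions the loop above could be considered at up to `vscale x 16` rather than `vscale x 4`, which is the width the dot product instructions want for i8 inputs.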

https://github.com/llvm/llvm-project/pull/109671