[PATCH] D118979: [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth

Wed Feb 9 12:31:12 PST 2022

fhahn added a comment.

In D118979#3308289 <https://reviews.llvm.org/D118979#3308289>, @sdesmalen wrote:

> I'm missing a bit of rationale for this change. There is an interplay between having a wider VF or having a larger interleave factor. For 128bit vectors, an `add <4 x i64> %x, %y` will be legalized into two adds. Conceptually this is similar to vectorizing with `<2 x i64>` and having an interleave-factor of 2. I can imagine that interleaving in the loop-vectorizer leads to better code, because it avoids issues around type legalisation and may provide more opportunities for other IR passes to optimize the IR or move things around. If we always choose a wider VF I wonder if that may lead to poorer codegen because of type-legalization.
>
> Is there a specific example where it's clearly an improvement to have a wider VF? And would choosing a larger unroll-factor help those cases?

One case where choosing a wider VF can be beneficial is a loop with memory operations on types of different widths, where the memory operations on the narrow type are not legal at the VF derived from the widest type. This reminded me of an older outstanding patch that focuses on exactly that case: D96522 <https://reviews.llvm.org/D96522>. Unless there are other cases where maximizing the VF is clearly beneficial, iterating on D96522 <https://reviews.llvm.org/D96522> might be an alternative.
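To make the mixed-width case concrete, here is a minimal sketch of such a loop together with the register arithmetic behind it. The function and helper names are hypothetical and the arithmetic is a deliberate simplification, not the loop vectorizer's actual cost model. It assumes 128-bit vector registers: a VF derived from the widest type (i64) is 2, so the i8 load would only cover 16 bits, whereas a VF derived from the narrowest type is 16, so the i8 load fills one register and the i64 operations are split across 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical loop mixing narrow (i8) loads with wide (i64) loads and stores.
void widenAccumulate(const std::uint8_t *src, std::uint64_t *dst,
                     std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] += src[i];
}

// Simplified: number of elements of width elemBits that fit in one register.
unsigned lanesPerRegister(unsigned regBits, unsigned elemBits) {
  assert(elemBits > 0 && regBits % elemBits == 0);
  return regBits / elemBits;
}

// Simplified: registers needed to hold vf elements of width elemBits,
// rounded up so that a partially filled register still counts as one.
unsigned registersNeeded(unsigned regBits, unsigned elemBits, unsigned vf) {
  unsigned bits = elemBits * vf;
  return (bits + regBits - 1) / regBits;
}
```

Under these assumptions, `lanesPerRegister(128, 64)` gives the VF based on the widest type and `lanesPerRegister(128, 8)` the VF based on the narrowest type, while `registersNeeded(128, 64, 16)` shows how many parts the i64 operations are split into at the wider VF.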

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118979/new/

https://reviews.llvm.org/D118979