[PATCH] D106653: [LoopVectorize][AArch64] Enable ordered reductions by default for SVE

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 28 01:45:55 PDT 2021


sdesmalen added a comment.

In D106653#2909467 <https://reviews.llvm.org/D106653#2909467>, @dmgreen wrote:

> What do you mean by "lower risk"? Do you have performance numbers for SVE? Or is the cost so high that in practice they are never generated?

The end goal is to enable strict reductions by default for all targets, so that buildbots guard the functionality and we hopefully see some performance benefit as well. The cost-model must be conservative enough to avoid any regressions, while still letting through loops where there is an obvious benefit. Once that is in place, we can start tuning the cost-model to let through more cases when that helps performance.

Our other motivation is making LLVM 13 an experimental compiler for VLA auto-vectorization. We specifically want to enable strict reductions by default in LLVM 13 for vector-length-agnostic SVE, because this is a new vectorization capability that SVE can handle. The cost-model doesn't really matter too much at this point: VLA auto-vec is experimental and little effort has yet been made to improve code quality, so it's unlikely that strict reductions will make a dent.

We can work toward that end goal of enabling strict reductions by default for all targets in stages:

1. Enable it by default for VLA SVE (this patch)

We can enable it here because performance doesn't really matter yet for VLA SVE. This patch would need to be updated to (temporarily) give a high/invalid cost for ordered reductions when the type is a FixedVectorType, so that we don't accidentally introduce any regressions for e.g. `-mcpu=a64fx` when `-scalable-vectorization=on|preferred` is not specified.

2. Enable it by default for AArch64.

We have run SPEC2K6 measurements which show that the cost-model holds up: performance across the board is similar or better, with only very minor regressions (<1%). We want to do a bit more benchmarking (such as measuring on different AArch64 machines) before presenting numbers we're confident about.

3. Enable it by default for other targets.

This will require measurements on targets other than AArch64.

Does that sound like a sensible approach?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106653/new/

https://reviews.llvm.org/D106653
