[all-commits] [llvm/llvm-project] 7f6bbb: [RISCV][TTI] Reduce cost of a build_vector pattern...

Fri Sep 20 08:34:58 PDT 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7f6bbb3c4f2c5f03d2abd3a79c85f1e0e4e03e05
      https://github.com/llvm/llvm-project/commit/7f6bbb3c4f2c5f03d2abd3a79c85f1e0e4e03e05
  Author: Philip Reames <preames at rivosinc.com>
  Date:   2024-09-20 (Fri, 20 Sep 2024)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
    M llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
    M llvm/test/Analysis/CostModel/RISCV/arith-fp.ll
    M llvm/test/Analysis/CostModel/RISCV/fixed-vector-gather.ll
    M llvm/test/Analysis/CostModel/RISCV/fp-min-max-abs.ll
    M llvm/test/Analysis/CostModel/RISCV/fp-sqrt-pow.ll
    M llvm/test/Analysis/CostModel/RISCV/fp-trig-log-exp.ll
    M llvm/test/Analysis/CostModel/RISCV/gep.ll
    M llvm/test/Analysis/CostModel/RISCV/int-sat-math.ll
    M llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
    M llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
    M llvm/test/Transforms/SLPVectorizer/RISCV/mixed-extracts-types.ll
    M llvm/test/Transforms/SLPVectorizer/RISCV/remarks-insert-into-small-vector.ll
    M llvm/test/Transforms/SLPVectorizer/RISCV/vec3-base.ll

  Log Message:
  -----------
  [RISCV][TTI] Reduce cost of a build_vector pattern (#108419)

This change is actually two related changes, but they're very hard to
meaningfully separate as the second balances the first, and yet doesn't
do much good on it's own.

First, we can reduce the cost of a build_vector pattern. Our current
costing for this defers to generic insertelement costing which isn't
unreasonable, but also isn't correct. While inserting N elements
requires N-1 slides and N vmv.s.x, doing the full build_vector only
requires N vslide1down. (Note there are other cases that our build
vector lowering can do more cheaply, this is simply the easiest upper
bound which appears to be "good enough" for SLP costing purposes.)

Second, we need to tell SLP that calls don't preserve vector registers.
Without this, SLP will vectorize scalar code which performs e.g. 4 x
float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the
costing works out that this is in fact the optimal choice - except that
we don't actually have a <2 x float> @exp, and unroll during DAG. This
would be fine (or at least cost neutral) except that the libcall for the
scalar @exp blows all vector registers. So the net effect is we added a
bunch of spills that SLP had no idea about. Thankfully, AArch64 has a
similiar problem, and has taught SLP how to reason about spill cost once
the right TTI hook is implemented.

Now, for some implications...

The SLP solution for spill costing has some inaccuracies. In particular,
it basically just guesses whether a intrinsic will be lowered to a call
or not, and can be wrong in both directions. It also has no mechanism to
differentiate on calling convention.

This has the effect of making partial vectorization (i.e. starting in
scalar) more profitable. In practice, the major effect of this is to
make it more like SLP will vectorize part of a tree in an intersecting
forrest, and then vectorize the remaining tree once those uses have been
removed.

This has the effect of biasing us slightly away from strided, or indexed
loads during vectorization - because the scalar cost is more accurately
modeled, and these instructions look relevatively less profitable.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications