[all-commits] [llvm/llvm-project] f0505c: [RISCV] Form vredsum from explode_vector + scalar ...
Philip Reames via All-commits
all-commits at lists.llvm.org
Sun Oct 1 17:42:21 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: f0505c3dbe50f533e55d21dfcb584fcac44bd80c
https://github.com/llvm/llvm-project/commit/f0505c3dbe50f533e55d21dfcb584fcac44bd80c
Author: Philip Reames <preames at rivosinc.com>
Date: 2023-10-01 (Sun, 01 Oct 2023)
Changed paths:
M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll
M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-formation.ll
Log Message:
-----------
[RISCV] Form vredsum from explode_vector + scalar (left) reduce (#67821)
This change adds two related DAG combines which together take a
left-reduce scalar add tree of an explode_vector and incrementally
form a vector reduction over the vector prefix. If the add tree
consumes every exploded element, the result is a reduction over the
entire vector.
Profitability-wise, this relies on vredsum being cheaper than a pair of
extracts plus a scalar add. Given that vredsum is linear in LMUL, and
the vslidedown required for the extract is *also* linear in LMUL, this
is clearly true at higher index values. At N=2 it's a bit questionable,
but I think the vredsum form is probably the better canonical form
anyway.
Note that this only matches left reduces. This happens to be the
motivating example I have (from spec2017 x264). The approach could be
generalized to handle right reduces without much effort, and could
likewise handle any reduce whose tree starts with adjacent elements, if
desired. It fails for a reduce such as (A+C)+(B+D) because we can't
find a root to start the reduce from without scanning the entire
associative add expression. We could maybe explore using masked reduces
for the root node, but that seems of questionable profitability. (As
in, worth questioning - I haven't explored it in any detail.)
This is covering up a deficiency in SLP. If SLP encounters the scalar
form of reduce_or(A) + reduce_sum(A), where A is some common
vectorizable tree, it will sometimes fail to revisit one of the
reductions after vectorizing the other. Fixing this in SLP is hard, and
there's no good reason not to handle the easy cases in the backend.
Another option here would be to do this in VectorCombine or generic DAG.
I chose not to as the profitability of the non-legal typed prefix cases
is very target dependent. I think this makes sense as a starting point,
even if we move it elsewhere later.
This is currently restricted to add reduces, but it obviously makes
sense for any associative reduction operator. Once this is approved, I
plan to extend it in that manner; I'm simply staging the work in case
we decide to go in another direction.