[all-commits] [llvm/llvm-project] 15b0fa: [RISCV] Vectorize phi for loop carried @llvm.vecto...

Thu Jan 18 01:15:31 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 15b0fabb21af8395c1b810e7d992a869b9ef31d8
      https://github.com/llvm/llvm-project/commit/15b0fabb21af8395c1b810e7d992a869b9ef31d8
  Author: Luke Lau <luke at igalia.com>
  Date:   2024-01-18 (Thu, 18 Jan 2024)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp
    A llvm/test/CodeGen/RISCV/rvv/riscv-codegenprepare-asm.ll
    A llvm/test/CodeGen/RISCV/rvv/riscv-codegenprepare.ll

  Log Message:
  -----------
  [RISCV] Vectorize phi for loop carried @llvm.vector.reduce.fadd (#78244)

LLVM vector reduction intrinsics return a scalar result, but on RISC-V
vector reduction instructions write the result in the first element of a
vector register. So when a reduction in a loop uses a scalar phi, we end
up with unnecessary scalar moves:

loop:
    vfmv.s.f v10, fa0
    vfredosum.vs v8, v8, v10
    vfmv.f.s fa0, v8

This mainly affects ordered fadd reductions, which has a scalar accumulator
operand.
This tries to vectorize any scalar phis that feed into a fadd reduction
in RISCVCodeGenPrepare, converting:

loop:
%phi = phi <float> [ ..., %entry ], [ %acc, %loop]
%acc = call float @llvm.vector.reduce.fadd.nxv4f32(float %phi, <vscale x 2 x float> %vec)
```

to

loop:
%phi = phi <vscale x 2 x float> [ ..., %entry ], [ %acc.vec, %loop]
%phi.scalar = extractelement <vscale x 2 x float> %phi, i64 0
%acc = call float @llvm.vector.reduce.fadd.nxv4f32(float %x, <vscale x 2 x float> %vec)
%acc.vec = insertelement <vscale x 2 x float> poison, float %acc.next, i64 0

Which eliminates the scalar -> vector -> scalar crossing during
instruction selection.