[all-commits] [llvm/llvm-project] 487686: [SDAG][RISCV] Don't promote VP_REDUCE_{FADD, FMUL} ...

Luke Lau via All-commits all-commits at lists.llvm.org
Thu Oct 3 09:18:06 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 487686b82e9a20bd090704e04c8cbf7207dd6527
      https://github.com/llvm/llvm-project/commit/487686b82e9a20bd090704e04c8cbf7207dd6527
  Author: Luke Lau <luke at igalia.com>
  Date:   2024-10-04 (Fri, 04 Oct 2024)

  Changed paths:
    M llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    M llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/RISCV/reduce-fadd.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp-vp.ll
    M llvm/test/CodeGen/RISCV/rvv/vreductions-fp-vp.ll

  Log Message:
  -----------
  [SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000)

In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.

However, I don't believe it's correct to promote f16 fadd or fmul
reductions to f32, since we need to round the intermediate results
back to f16 after each step.

Today, if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we get two
different results depending on whether we compiled with +zvfh or
+zvfhmin. For example, with a 3-element reduction:

	; v9 = [0.1563, 5.97e-8, 0.00006104]

	; zvfh
	vsetivli x0, 3, e16, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v9, v8    ; ordered f16 sum, rounded to f16 at each step
	vfmv.f.s fa0, v8
	; fa0 = 0.1563

	; zvfhmin
	vsetivli x0, 3, e16, m1, ta, ma
	vfwcvt.f.f.v v10, v9       ; widen f16 -> f32
	vsetivli x0, 3, e32, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v10, v8   ; ordered f32 sum, no intermediate f16 rounding
	vfmv.f.s fa0, v8
	fcvt.h.s fa0, fa0          ; single narrowing back to f16
	; fa0 = 0.1564

The same thing happens with reassociative reductions, e.g.
vfredusum.vs, and it also applies to bf16.
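
The scalar analogue of the two sequences above, as a minimal C++
sketch (assuming a compiler with the _Float16 extension, e.g. Clang;
the values are the ones from the v9 comment):

	#include <cstdio>

	int main() {
	  // Same values as v9 above, converted to f16.
	  _Float16 v[3] = {(_Float16)0.1563f, (_Float16)5.97e-8f,
	                   (_Float16)0.00006104f};

	  // zvfh-style: f16 adds, rounded to f16 after every partial sum.
	  _Float16 strict = (_Float16)0.0f;
	  for (_Float16 x : v)
	    strict = strict + x;

	  // zvfhmin-style promotion: accumulate in f32, round to f16 once.
	  float wide = 0.0f;
	  for (_Float16 x : v)
	    wide += (float)x;
	  _Float16 promoted = (_Float16)wide;

	  // For these inputs the results differ in the last place:
	  // 0.156250 (strict) vs 0.156372 (promoted).
	  printf("strict=%f promoted=%f\n", (double)strict, (double)promoted);
	}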

I couldn't find anything in the LangRef for reductions that suggests
this excess precision is allowed. There may be something we can do in
Clang with -fexcess-precision=fast, but I haven't looked into it yet.

I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum, since those just select one of their
operands rather than computing a new, rounded value.
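
That intuition can be checked in scalar code: f16 -> f32 conversion is
exact, and max returns one of its operands unchanged, so narrowing the
result back to f16 is also exact. A minimal C++ sketch (hypothetical
values, same _Float16 assumption as above):

	#include <algorithm>
	#include <cassert>

	int main() {
	  _Float16 a = (_Float16)0.1563f;
	  _Float16 b = (_Float16)0.00006104f;

	  // Widen, take the max, narrow: no rounding happens at any step,
	  // so promotion is safe for min/max-style reductions.
	  _Float16 promoted = (_Float16)std::max((float)a, (float)b);
	  _Float16 native = a > b ? a : b;
	  assert(promoted == native);
	}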

I can't think of a way of lowering these correctly other than
scalarizing, and we can't scalarize scalable vectors, so this patch
just removes the promotion and adjusts the cost model to return an
invalid cost. (It looks like we also don't currently cost fmul
reductions, so presumably they already have an invalid cost?)

I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
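
For reference, "invalid cost" here means an llvm::InstructionCost that
carries no value; a hedged sketch of the mechanism (hypothetical
function, not the actual RISCVTargetTransformInfo change):

	#include "llvm/Support/InstructionCost.h"
	using namespace llvm;

	// Hypothetical sketch: without zvfh, an f16 fadd/fmul VP reduction
	// has no correct lowering (promotion changes results), so report an
	// invalid cost instead of the cost of the promoted sequence.
	InstructionCost vpF16FAddReductionCost(bool HasZvfh) {
	  if (!HasZvfh)
	    return InstructionCost::getInvalid();
	  return 2; // placeholder cost when the reduction is natively legal
	}

	// Callers check validity before trusting the number, e.g.:
	//   if (!vpF16FAddReductionCost(HasZvfh).isValid())
	//     ... // don't emit the @llvm.vp.reduce.fadd intrinsic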


