[llvm-dev] [RFC] Changes to llvm.experimental.vector.reduce intrinsics

Thu Apr 4 09:07:28 PDT 2019

Hi David,

The reason for the asymmetry, and for requiring an explicit start-value operand, is to be able to describe strict reductions that must preserve the same associativity as the equivalent scalarized reduction.

For example:
  %res = call float @llvm.experimental.vector.reduce.fadd(float %start, <4 x float> <float %elt0, float %elt1, float %elt2, float %elt3>)

describes the following reduction:
  %res = (((%start + %elt0) + %elt1) + %elt2) + %elt3

Whereas:
  %tmp = call float @llvm.experimental.vector.reduce.fadd(<4 x float> <float %elt0, float %elt1, float %elt2, float %elt3>)
  %res = fadd float %start, %tmp

describes:
  %res = %start + (((%elt0 + %elt1) + %elt2) + %elt3)

This is not the same, which is why the start operand is needed in the intrinsic itself. With fast-math flags (specifically the 'reassoc' property) the compiler is free to reassociate the expression, so the start/accumulator operand isn't needed.
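
To make the difference concrete, here is a small sketch in Python rather than IR. It assumes Python's double-precision floats as a stand-in for the f32 values above, and the two helper names are made up for illustration; they are not LLVM APIs. The values are picked so that each individual addition of 1.0 to the start value is lost to rounding:

```python
from functools import reduce

def ordered_reduce_fadd(start, vec):
    """Hypothetical helper: strict left-to-right sum seeded with start,
    i.e. (((start + v0) + v1) + v2) + v3."""
    return reduce(lambda acc, x: acc + x, vec, start)

def reduce_then_add_start(start, vec):
    """Hypothetical helper: reduce the vector first, then add start,
    i.e. start + (((v0 + v1) + v2) + v3)."""
    return start + reduce(lambda acc, x: acc + x, vec)

# Adjacent doubles near 1e16 are 2 apart, so 1e16 + 1.0 rounds back to 1e16.
start = 1e16
vec = [1.0, 1.0, 1.0, 1.0]

in_intrinsic = ordered_reduce_fadd(start, vec)   # start folded in first
outside = reduce_then_add_start(start, vec)      # start added afterwards

print(in_intrinsic, outside, in_intrinsic == outside)
```

With the start value folded in first, every 1.0 is rounded away and the result stays at 1e16. Summing the vector first gives 4.0, which is large enough to survive the final addition, so the two orderings give different results.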

Cheers,

Sander

On 04/04/2019, 16:44, "David Greene" <dag at cray.com> wrote:

    Sander De Smalen via llvm-dev <llvm-dev at lists.llvm.org> writes:

    > This means that for example:
    >
    >     %res = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %v)
    >
    > does not result in %res being 'undef', but rather a reduction of <4 x
    > float> %v. The definition of these intrinsics is different from their
    > corresponding SelectionDAG nodes which explicitly split out a
    > non-strict VECREDUCE_FADD that explicitly does not take a start-value
    > operand, and a VECREDUCE_STRICT_FADD which does.

    This seems very strange to me.  What was the rationale for ignoring the
    first argument?  What was the rationale for the first argument existing
    at all?  Because that's how SVE reductions work?  The asymmetry with
    llvm.experimental.vector.reduce.add is odd.
    >
    > [Option B] Having separate ordered and unordered intrinsics (https://reviews.llvm.org/D60262).
    >
    >   declare float @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float %start_value, <4 x float> %vec)
    >
    >   declare float @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(<4 x float> %vec)
    >
    > This will mean that the behaviour is explicit from the intrinsic and
    > the use of 'fast' or 'reassoc' on the call has no effect on how that
    > intrinsic is lowered. The ordered reduction intrinsic will take a
    > scalar start-value operand, where the unordered reduction intrinsic
    > will only take a vector operand.

    This seems by far the better solution.  I'd much rather have things be
    explicit in the IR than implicit via flags that might accidentally get
    dropped.

    Again, the asymmetry between these (one with a start value and one
    without) seems strange and arbitrary.  Why do we need start values at
    all?  Is it really difficult for isel to match s + vector.reduce(v)?

                           -David