[llvm-dev] [RFC] Changes to llvm.experimental.vector.reduce intrinsics
Sander De Smalen via llvm-dev
llvm-dev at lists.llvm.org
Thu Apr 4 09:07:28 PDT 2019
Hi David,
The reason for the asymmetry, and for requiring an explicit start-value operand, is to be able to describe strict reductions that must preserve the same evaluation order as a scalarized reduction.
For example:
%res = call float @llvm.experimental.vector.reduce.fadd(float %start, <4 x float> <float %elt0, float %elt1, float %elt2, float %elt3>)
describes the following reduction:
%res = (((%start + %elt0) + %elt1) + %elt2) + %elt3
Where:
%tmp = call float @llvm.experimental.vector.reduce.fadd(<4 x float> <float %elt0, float %elt1, float %elt2, float %elt3>)
%res = fadd float %start, %tmp
describes:
%res = %start + (((%elt0 + %elt1) + %elt2) + %elt3)
The two are not the same, which is why the start operand is needed in the intrinsic itself. For fast-math (specifically the 'reassoc' property) the compiler is free to reassociate the expression, so the start/accumulator operand isn't needed.
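The difference between the two evaluation orders is easy to reproduce with ordinary floating-point arithmetic. The following Python sketch (function names are illustrative, not part of the proposal) mimics the strict reduction and the reduce-then-add form:

```python
def ordered_fadd(start, elts):
    # Strict reduction: (((start + e0) + e1) + e2) + e3
    acc = start
    for e in elts:
        acc += e
    return acc

def unordered_then_add(start, elts):
    # Reduce the vector first, then add the start value:
    # start + (((e0 + e1) + e2) + e3)
    acc = elts[0]
    for e in elts[1:]:
        acc += e
    return start + acc

# Floating-point addition is not associative, so the two orders
# can produce different results for the same inputs.
start = 1e16
elts = [1.0, 1.0, -1e16, 1.0]
print(ordered_fadd(start, elts))        # the strict, in-order result
print(unordered_then_add(start, elts))  # a different value
```

With these inputs the small elements are absorbed by the large start value in one order but not the other, so the two functions disagree; that gap is exactly what the strict intrinsic must not introduce.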
Cheers,
Sander
On 04/04/2019, 16:44, "David Greene" <dag at cray.com> wrote:
Sander De Smalen via llvm-dev <llvm-dev at lists.llvm.org> writes:
> This means that for example:
>
> %res = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %v)
>
>
>
> does not result in %res being 'undef', but rather a reduction of
> <4 x float> %v. The definition of these intrinsics differs from that
> of their corresponding SelectionDAG nodes, which explicitly split out
> a non-strict VECREDUCE_FADD that does not take a start-value operand,
> and a strict VECREDUCE_STRICT_FADD which does.
This seems very strange to me. What was the rationale for ignoring the
first argument? What was the rationale for the first argument existing
at all? Because that's how SVE reductions work? The asymmetry with
llvm.experimental.vector.reduce.add is odd.
>
> [Option B] Having separate ordered and unordered intrinsics (https://reviews.llvm.org/D60262).
>
>
>
> declare float @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float %start_value, <4 x float> %vec)
>
> declare float @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(<4 x float> %vec)
>
>
>
> This will mean that the behaviour is explicit from the intrinsic and
> the use of 'fast' or 'reassoc' on the call has no effect on how that
> intrinsic is lowered. The ordered reduction intrinsic will take a
> scalar start-value operand, whereas the unordered reduction intrinsic
> will only take a vector operand.
This seems by far the better solution. I'd much rather have things be
explicit in the IR than implicit via flags that might accidentally get
dropped.
Again, the asymmetry between these (one with a start value and one
without) seems strange and arbitrary. Why do we need start values at
all? Is it really difficult for isel to match s + vector.reduce(v)?
-David