[llvm-dev] [RFC] Changes to llvm.experimental.vector.reduce intrinsics

Thu Apr 4 13:03:21 PDT 2019

Sander De Smalen <Sander.DeSmalen at arm.com> writes:

> Hi David,
>
> The reason for the asymmetry and requiring an explicit start-value
> operand is to be able to describe strict reductions that need to
> preserve the same associativity of a scalarized reduction.
>
> For example:
>   %res = call float @llvm.experimental.vector.reduce.fadd(float %start, <4 x float> <float %elt0, float %elt1, float %elt2, float %elt3>)
>
> describes the following reduction:
>   %res = (((%start + %elt0) + %elt1) + %elt2) + %elt3
>
> Whereas:
>   %tmp = call float @llvm.experimental.vector.reduce.fadd(<4 x float> <float %elt0, float %elt1, float %elt2, float %elt3>)
>   %res = fadd float %start, %tmp
>
> describes:
>   %res = %start + (((%elt0 + %elt1) + %elt2) + %elt3)
>
> The two are not the same, which is why the start operand is needed
> in the intrinsic itself. For fast-math (specifically the 'reassoc'
> flag) the compiler is free to reassociate the expression, so the
> start/accumulator operand isn't needed.
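
To make the difference between the two groupings concrete, here is a minimal
sketch in Python rather than IR. The function names ordered_fadd and
reduce_then_add are hypothetical and are not LLVM APIs. Python floats are
IEEE doubles rather than 32-bit floats, so the values are chosen to show the
rounding effect at double precision.

```python
from functools import reduce
import operator


def ordered_fadd(start, elts):
    # Hypothetical model of the strict reduction with a start operand:
    # (((start + e0) + e1) + e2) + e3, evaluated left to right.
    return reduce(operator.add, elts, start)


def reduce_then_add(start, elts):
    # Hypothetical model of reducing the vector first and adding the
    # scalar afterwards: start + (((e0 + e1) + e2) + e3).
    return start + reduce(operator.add, elts)


# Assumed illustrative values: near 1e16 the spacing between adjacent
# doubles is 2, so adding 1.0 one step at a time is rounded away each time.
start = 1e16
elts = [1.0, 1.0, 1.0, 1.0]

with_start = ordered_fadd(start, elts)
separate = reduce_then_add(start, elts)

# The two groupings give different results for these inputs.
print(with_start == separate)  # False
```

With these inputs the ordered form never leaves 1e16, while the
reduce-then-add form sees the four elements sum to 4.0 first and so lands on
a different value.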

Ok, I see.  I was assuming the scalar would just be folded into the
vector and then the vector would be reduced, but that could be awkward,
since it would require insertelement/extractelement.

Thanks for explaining.

                     -David