[llvm-dev] [RFC] Changes to llvm.experimental.vector.reduce intrinsics
Simon Pilgrim via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 5 01:37:03 PDT 2019
On 04/04/2019 14:11, Sander De Smalen wrote:
> Proposed change:
>
> ----------------------------
>
> In this RFC I propose changing the intrinsics for
> llvm.experimental.vector.reduce.fadd and
> llvm.experimental.vector.reduce.fmul (see options A and B). I also
> propose renaming the 'accumulator' operand to 'start value' because
> for fmul this is the start value of the reduction, rather than a value
> into which the fmul reduction is accumulated.
>
> [Option A] Always using the start value operand in the reduction
> (https://reviews.llvm.org/D60261)
>
> declare float
> @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float %start_value,
> <4 x float> %vec)
>
> This means that if the start value is 'undef', the result will be
> undef and all code creating such a reduction will need to ensure it
> has a sensible start value (e.g. 0.0 for fadd, 1.0 for fmul). When
> using 'fast' or 'reassoc' on the call it will be implemented using an
> unordered reduction, otherwise it will be implemented with an ordered
> reduction. Note that a new intrinsic is required to capture the new
> semantics. In this proposal the intrinsic is prefixed with a 'v2' for
> the time being, with the expectation this will be dropped when we
> remove 'experimental' from the reduction intrinsics in the future.
>
> [Option B] Having separate ordered and unordered intrinsics
> (https://reviews.llvm.org/D60262).
>
> declare float
> @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
> %start_value, <4 x float> %vec)
>
> declare float
> @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(<4 x float>
> %vec)
>
> This will mean that the behaviour is explicit from the intrinsic and
> the use of 'fast' or 'reassoc' on the call has no effect on how that
> intrinsic is lowered. The ordered reduction intrinsic will take a
> scalar start-value operand, where the unordered reduction intrinsic
> will only take a vector operand.
>
> Both options auto-upgrade the IR to use the new (version of the)
> intrinsics. I'm personally slightly in favour of [Option B], because
> it better aligns with the definition of the SelectionDAG nodes and is
> more explicit in its semantics. We also avoid having to use an
> artificial 'v2'-like prefix to denote the new behaviour of the intrinsic.
>
Do we have any targets with instructions that can actually use the start
value? TBH I'd be tempted to suggest we just make the initial
extractelement/fadd/insertelement pattern a manual extra stage and avoid
having that argument entirely.
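
To make the suggestion above concrete, here is a rough scalar model of an ordered fadd reduction, sketched in Python. It is only an illustration under stated assumptions: the function names are hypothetical, Python floats are doubles rather than f32, and nothing here reflects how any target actually lowers the intrinsic. It shows that folding the start value into lane 0 first (the extractelement/fadd/insertelement stage) performs the same sequence of additions as passing the start value as an operand.

```python
def ordered_fadd(start, vec):
    """In-order reduction with a start operand: (((start + v0) + v1) + v2) + ..."""
    acc = start
    for lane in vec:
        acc = acc + lane
    return acc


def ordered_fadd_no_start(vec):
    """Hypothetical in-order reduction without a start operand: ((v0 + v1) + v2) + ..."""
    acc = vec[0]
    for lane in vec[1:]:
        acc = acc + lane
    return acc


def fold_start_into_lane0(start, vec):
    """Manual extra stage: take lane 0, add the start value, put the sum back in lane 0."""
    out = list(vec)
    out[0] = start + out[0]
    return out


# Values chosen so that the order of additions visibly affects rounding.
vec = [1e17, 1.0, -1e17, 1.0]
start = 0.5

with_operand = ordered_fadd(start, vec)
manual_stage = ordered_fadd_no_start(fold_start_into_lane0(start, vec))
```

In this model both paths execute identical additions in identical order, so `with_operand` and `manual_stage` agree bit for bit. Whether the extra stage is cheap on a given target is a separate question that this sketch does not address.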
> Further efforts:
>
> ----------------------------
>
> Here is a non-exhaustive list of items I think work towards making the
> intrinsics non-experimental:
>
> * Adding SelectionDAG legalization for the _STRICT reduction
> SDNodes. After some great work from Nikita in D58015, unordered
> reductions are now legalized/expanded in SelectionDAG, so if we
> add expansion in SelectionDAG for strict reductions this would
> make the ExpandReductionsPass redundant.
> * Better enforcing the constraints of the intrinsics (see
> https://reviews.llvm.org/D60260 ).
> * I think we'll also want to be able to overload the result operand
> based on the vector element type for the intrinsics having the
> constraint that the result type must match the vector element
> type. e.g. dropping the redundant 'i32' in:
> i32 @llvm.experimental.vector.reduce.and.i32.v4i32(<4 x i32> %a)
> => i32 @llvm.experimental.vector.reduce.and.v4i32(<4 x i32> %a)
>
> since i32 is implied by <4 x i32>. This would have the added benefit
> that LLVM would automatically check for the operands to match.
>
Won't this cause issues with overflow? Isn't the point of an add (or
mul...) reduction of, say, <64 x i8> that it gives a larger (i32 or i64)
result so we don't lose anything? I agree that for bitop reductions a
wider result doesn't make sense, though.
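
The overflow concern can be illustrated with a small arithmetic sketch in Python. This only models two's-complement wrapping at a chosen result width; the helper names are hypothetical, and it makes no claim about what the reduction intrinsics are specified to return.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def reduce_add(vec, result_bits):
    """Add lanes in order, wrapping the running sum to result_bits after each step."""
    acc = 0
    for lane in vec:
        acc = wrap_signed(acc + lane, result_bits)
    return acc


# Stand-in for a <64 x i8> vector with every lane at the i8 maximum.
lanes = [127] * 64

narrow = reduce_add(lanes, 8)   # result kept at the element width (i8)
wide = reduce_add(lanes, 32)    # result widened to i32
```

With these inputs the true sum is 64 * 127 = 8128, which `wide` preserves, while `narrow` wraps around to -64.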
Simon.