[cfe-dev] RFC: Add New Set of Vector Math Builtins

Mon Sep 27 15:54:09 PDT 2021

Hi Florian,

I have a few questions about the reduction builtins.

llvm.vector.reduce.fadd is currently defined as ordered unless the
reassociate fast-math flag is present. Are you proposing to change that to
make it pairwise?

llvm.vector.reduce.fmin/fmax change behavior based on the nnan fast-math
flag. And I think they always imply no signed zeros, regardless of whether
that fast-math flag is present. The vectorizers check the fast-math flags
before creating the intrinsics today. What are the semantics of the
proposed builtins?

Thanks,
~Craig

On Mon, Sep 27, 2021 at 11:50 AM Florian Hahn via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi,
>
> I would like to provide a convenient way for our users to perform
> additional operations like min, max, abs and round on vector (and possibly
> matrix) types. To do so, I’d like to propose adding a new set of builtins
> that operate on vector-like types. The new builtins can be used to perform
> each operation element-wise and as a reduction over the elements of a
> vector or a matrix. The proposal also includes arithmetic reductions.
> Those are performed pairwise, rather than sequentially, to ensure they can
> be lowered directly to vector instructions on targets like AArch64 and X86.
>
> I considered overloading the existing math builtins, but I think we can
> provide a better and more consistent user experience with a new set of
> builtins. The drawbacks of overloading the math builtins are discussed at
> the end.
>
> Below is the proposed specification for the element-wise and reduction
> builtins. They will be lowered to the corresponding LLVM intrinsics. Note
> that not all possible builtins are listed here. Once we agree on the
> general scheme, it should be easy to add additional builtins.
>
>
>
> *Specification of the element-wise builtins with 1 operand:*
> Provided builtins:
> • __builtin_elementwise_abs
> • __builtin_elementwise_ceil
> • __builtin_elementwise_floor
> • __builtin_elementwise_rint
> • __builtin_elementwise_round
> • __builtin_elementwise_trunc
>
> ----
> T __builtin_elementwise_<name>(T x)
>
> T must be one of the following types:
> • an integer type (as in C2x 6.2.5p19), but excluding enumerated types
> and _Bool
> • the standard floating types float or double
> • a half-precision floating point type, if one is supported on the target
> • a vector or matrix type.
>
> For scalar types, consider the operation applied to a vector with a single
> element. For matrix types, consider them as a vector formed by
> concatenating its columns for the definitions below.
>
> Returns: A vector Res equivalent to applying fn element-wise to the input,
> where fn depends on (name, element type of T):
>
> • (abs, floating point type) → return the absolute value of a
> floating-point number x
> • (abs, integer type) → (x < 0) ? x * -1 : x
> • (ceil, floating point type) → return the smallest integral value greater
> than or equal to x
> • (ceil, integer type) → invalid
> • (floor, floating point type) → return the largest integral value less
> than or equal to x
> • (floor, integer type) → invalid
> • (rint, floating point type) → return the integral value nearest to x
> (according to the prevailing rounding mode) in floating-point format
> • (rint, integer type) → invalid
> • (round, floating point type) → return the integral value nearest to x
> rounding half-way cases away from zero, regardless of the current rounding
> direction
> • (round, integer type) → invalid
> • (trunc, floating point type) → return the integral value nearest to but
> no larger in magnitude than x
> • (trunc, integer type) → invalid
>
> Special values:
> Unless specified otherwise, fn(±0) = ±0 and fn(±infinity) = ±infinity.
>
> T Res;
> for (int I = 0; I < NumElements; ++I)
>   Res[I] = fn(x[I]);
> ----
>
>
>
> *Specification of the element-wise builtins with 2 operands:*
> Provided builtins:
> • __builtin_elementwise_max
> • __builtin_elementwise_min
>
> ----
> T __builtin_elementwise_<name>(T x, T y)
>
> T must be one of the following types:
> • an integer type (as in C2x 6.2.5p19), but excluding enumerated types
> and _Bool
> • the standard floating types float or double
> • a half-precision floating point type, if one is supported on the target
> • a vector or matrix type.
>
> For scalar types, consider the operation applied to a vector with a single
> element. For matrix types, consider them as a vector formed by
> concatenating its columns for the definitions below.
>
> Returns: A vector Res equivalent to applying fn(x, y) elementwise to the
> inputs, where fn depends on the name:
>
> • min → return x or y, whichever is smaller
> • max → return x or y, whichever is larger
>
> Special values:
> Unless otherwise specified, the following holds: if exactly one argument
> is a NaN, return the other argument; if both arguments are NaNs, return
> a NaN.
>
> T Res;
> for (int I = 0; I < NumElements; ++I)
>   Res[I] = fn(x[I], y[I]);
> ----
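The NaN rules above are the same ones C99 specifies for `fminf`/`fmaxf`. The sketch below spells them out explicitly instead of calling libm, using a hypothetical `float4` stand-in for the vector type; `min_fn`, `max_fn`, `elementwise_min` and `elementwise_max` are illustrative names, not part of the proposal. The relative ordering of -0.0 and +0.0 is deliberately not modelled, since the text does not specify it.

```c
#include <math.h>

/* Hypothetical vector width for this sketch. */
#define N 4

/* Hypothetical stand-in for a 4 x float vector type. */
typedef struct { float v[N]; } float4;

/* min: if exactly one argument is NaN, return the other; if both are NaN,
   the result is NaN (y, which is NaN in that case). */
static float min_fn(float x, float y) {
    if (isnan(x)) return y;
    if (isnan(y)) return x;
    return (y < x) ? y : x;
}

/* max: same NaN handling, returning whichever argument is larger. */
static float max_fn(float x, float y) {
    if (isnan(x)) return y;
    if (isnan(y)) return x;
    return (y > x) ? y : x;
}

/* Res[I] = fn(x[I], y[I]) for every element, with fn = min. */
static float4 elementwise_min(float4 x, float4 y) {
    float4 res;
    for (int i = 0; i < N; ++i)
        res.v[i] = min_fn(x.v[i], y.v[i]);
    return res;
}

/* Res[I] = fn(x[I], y[I]) for every element, with fn = max. */
static float4 elementwise_max(float4 x, float4 y) {
    float4 res;
    for (int i = 0; i < N; ++i)
        res.v[i] = max_fn(x.v[i], y.v[i]);
    return res;
}
```

For integer element types no NaN handling is needed and fn reduces to a plain comparison, which is what `llvm.smin` in the lowering example below performs for signed integers.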
>
>
>
> *Specification of reduction builtins:*
> Provided builtins:
> • __builtin_reduce_min
> • __builtin_reduce_max
> • __builtin_reduce_add
> • __builtin_reduce_and
> • __builtin_reduce_or
> • __builtin_reduce_xor
>
> ----
> ET __builtin_reduce_<name>(VT a)
>
> VT must be a vector or matrix type with element type ET. For matrix types,
> consider them as a vector formed by concatenating its columns for the
> definitions below.
>
> Returns: A scalar Res equivalent to applying fn as a pairwise tree
> reduction to the input, where fn(x, y) depends on the name:
>
> • min → return x or y, whichever is smaller; if exactly one argument is a
> NaN, return the other argument. If both arguments are NaNs, return a
> NaN.
> • max → return x or y, whichever is larger; if exactly one argument is a
> NaN, return the other argument. If both arguments are NaNs, return a
> NaN.
> • add → +
> • and → & (integer (element) types only)
> • or → | (integer (element) types only)
> • xor → ^ (integer (element) types only)
> ----
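The pairwise tree reduction, and how it differs from the ordered reduction asked about above, can be sketched in plain C. The pairing used here (element i combined with element i + half, halving the active width at each level) is an assumption, since the RFC says "pairwise tree" without fixing which elements are paired; it was chosen because it resembles the shuffle-based lowering commonly used for vector reductions. `reduce_tree`, `reduce_ordered`, `add_fn` and `min_fn` are hypothetical names.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical vector width; a power of two keeps the tree balanced. */
#define N 4

typedef float (*binop)(float, float);

/* fn for add: plain floating-point addition. */
static float add_fn(float x, float y) { return x + y; }

/* fn for min: NaN rules from the spec (NaN only if both inputs are NaN). */
static float min_fn(float x, float y) {
    if (isnan(x)) return y;
    if (isnan(y)) return x;
    return (y < x) ? y : x;
}

/* Pairwise tree reduction. Assumed pairing: at each level element i is
   combined with element i + half, and the active width is halved. */
static float reduce_tree(const float in[N], binop fn) {
    float tmp[N];
    for (size_t i = 0; i < N; ++i)
        tmp[i] = in[i];
    for (size_t half = N / 2; half > 0; half /= 2)
        for (size_t i = 0; i < half; ++i)
            tmp[i] = fn(tmp[i], tmp[i + half]);
    return tmp[0];
}

/* Strictly left-to-right (ordered) reduction, for comparison. */
static float reduce_ordered(const float in[N], binop fn) {
    float acc = in[0];
    for (size_t i = 1; i < N; ++i)
        acc = fn(acc, in[i]);
    return acc;
}
```

With the inputs {1e8f, 1.0f, -1e8f, 1.0f}, the ordered sum is 1 (the first 1.0f is absorbed by rounding in 1e8f + 1.0f), while the tree sum computes (1e8f + -1e8f) + (1.0f + 1.0f) = 2. The evaluation order is therefore observable for floating-point add and has to be part of the specification.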
>
>
> The intended semantics should allow for the following lowering using LLVM
> intrinsics (using __builtin_{elementwise,reduce}_min as an example):
>
> declare float @llvm.vector.reduce.fmin.v4f32(<4 x float>)
> define float @float_min_red(<4 x float> %a) {
>   %r = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %a)
>   ret float %r
> }
>
> declare <4 x float> @llvm.minnum.v4f32(<4 x float>, <4 x float>)
> define <4 x float> @float_min(<4 x float> %a, <4 x float> %b) {
>   %r = call <4 x float> @llvm.minnum.v4f32(<4 x float> %a, <4 x float> %b)
>   ret <4 x float> %r
> }
>
> declare <4 x i32> @llvm.smin.v4i32(<4 x i32>, <4 x i32>)
> define <4 x i32> @int_min(<4 x i32> %a, <4 x i32> %b) {
>   %r = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %a, <4 x i32> %b)
>   ret <4 x i32> %r
> }
>
> declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32>)
> define i32 @int_min_red(<4 x i32> %b) {
>   %r = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %b)
>   ret i32 %r
> }
> https://llvm.godbolt.org/z/7Yv3v8aP9
>
>
>
> *Alternatives Considered*
> Instead of adding a set of completely new builtins, we could overload the
> existing libm-based builtins (__builtin_fminf & co.). The main issue with
> that approach is, I think, convenience for users. The libm-based builtins
> encode the type in their name, which means they only cover a small subset
> of types. For example, AFAIK there’s no builtin for integer min/max or
> floating-point max for 16-bit floating-point types. The proposal above
> provides a set of more user-friendly builtins that work across a wide
> range of types, similar to the existing math operators (i.e. there’s a
> single __builtin_elementwise_max which works with both integer and
> floating-point types, as well as with vectors and matrices). The reduction
> versions are also explicitly marked in the name.
>
>
> Cheers
> Florian