[cfe-dev] RFC: Interface user provided vector functions with the vectorizer.

Mon Jun 17 01:02:58 PDT 2019

Hi Francesco,

On 6/11/19 10:55 PM, Francesco Petrogalli wrote:
> Dear all,
>
> I have re-written the proposal for interfacing user provided vector
> functions, originally posted in both llvm-dev and cfe-dev mailing
> list:
>
> "[RFC] Expose user provided vector function for auto-vectorization."
>
> The proposal looks quite different from the original submission,
> therefore I took the liberty to start a new thread.
>
> The original thread generated some good discussion. In particular,
> Simon Moll and Johannes Doerfert (CCed) have managed to provide good
> arguments for the following claims:
>
> 1. The Vector Function ABI name mangling scheme of a target is not
>     enough to describe all uses cases of function vectorization that
>     the compiler might end up needing to support in the future.
I think the new name of the attribute makes this point clear.
> 2. `declare variant` needs to be handled properly at IR level, to be
>     able to give the compiler the full OpenMP context of the directive.
>
> This proposal addresses those two concerns and other (I believe) minor
> concerns that have been raised in the previous thread.
>
> This proposal is provided with examples and a self assessment around
> extendibility.
>
> I have CCed all the people that have participated in the discussion so
> far, please let me know if you think I have missed anything of what
> have been raised.
>
> Kind regards,
>
> Francesco

LGTM. Please add me as a reviewer for this when you post patches.

Thanks!

Simon

>
> *** DRAFT OF THE PROPOSAL ***
>
> # SCOPE OF THE RFC : Interface user provided vector functions with the vectorizer.
>
> Because the users care about portability (across compilers, libraries
> and systems), I believe we have to base sour solution on a standard
> that describes the mapping from the scalar function to the vector
> function.
>
> Because OpenMP is standard and widely used, we should base our
> solution on the mechanisms that the standard provides, via the
> directives `declare simd` and `declare variant`, the latter when used
> in with the `simd` trait in the `construct` set.
>
> Please notice that:
>
> 1. The scope of the proposal is not implementing full support for
>     `pragma omp declare variant`.
> 2. The scope of the proposal is not enabling the vectorizer to do new
>     kind of vectorizations (e.g. RV-like vectorization described by
>     Simon).
> 3. The proposal aims to be extendible wrt 1. and 2.
> 4. The IR attribute introduced in this proposal is equivalent to the
>     one needed for the VecClone pass under development in
>     https://reviews.llvm.org/D22792
>
> # CLANG COMPONENTS
>
> A C function attribute, `clang_declare_simd_variant`, to attach to the
> scalar version. The attribute provides enough information to the
> compiler about the vector shape of the user defined function. The
> vector shapes handled by the attribute are those handled by the OpenMP
> standard via `declare simd` (and no more than that).
>
> 1. The function attribute handling in clang is crafted with the
>     requirement that it will be possible to re-use the same components
>     for the info generated by `declare variant` when used with a `simd`
>     traits in the `construct` set.
> 2. The attribute allows orthogonality with the vectorization that is
>     done via OpenMP: the user vector function is still exposed for
>     vectorization when not using `-fopenmp-[simd]` once the `declare
>     simd` and `declare variant` directive of OpenMP will be available
>     in the front-end.
>
> ## C function attribute: `clang_declare_simd_variant`
>
> The definition of this attribute has been crafted to match the
> semantics of `declare variant` for a `simd` construct described in
> OpenMP 5.0. I have added only the traits of the `device` set, `isa`
> and `arch`, which I believe are enough to cover for the use case of
> this proposal. If that is not the case, please provide an example,
> extending the attribute will be easy even once the current one is
> implemented.
>
> ```
> clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, <context selector clauses>})
>
> <variant-func-id>:= The name of a function variant that is a base language identifier, or,
>                      for C++, a template-id.
>
> <simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}
>
> <simdlen> := simdlen(<positive number>) | simdlen("scalable")
>
> <mask>    := inbranch | notinbranch
>
> <optional simd clauses> := <linear clause>
>                           | <uniform clause>
>                           | <align clause>  | {,<optional simd clauses>}
>
> <linear clause>  := linear_ref(<var>,<step>)
>                    | linear_var(<var>, <step>)
>                    | linear_uval(<var>, <step>)
>                    | linear(<var>, <step>)
>
> <step> := <var> | <non zero number>
>
> <uniform clause> := uniform(<var>)
>
> <align clause>   := align(<var>, <positive number>)
>
> <var> := Name of a parameter in the scalar function declaration/definition
>
> <non zero number> := ... | -2 | -1 | 1 | 2 | ...
>
> <positive number> := 1 | 2 | 3 | ...
>
> <context selector clauses> := {<isa>}{,} {<arch>}
>
> <isa> := isa(target-specific-value)
>
> <arch> := arch(target-specific-value)
>
> ```
>
> # LLVM COMPONENTS:
>
> ## VectorFunctionShape class
>
> The object `VectorFunctionShape` contains the information about the
> kind of vectorization available for an `llvm::Call`.
>
> The object `VectorFunctionShape` must contain the following information:
>
> 1. Vectorization Factor (or number or concurrent lanes executed by the
>     SIMD version of the function). Encoded by unsigned integer.
> 2. Whether the vector function is requested for scalable
>     vectorization, encoded by a boolean.
> 3. Information about masking / no masking, encoded by a boolean.
> 4. Information about the parameters, encoded in a container that
>     carries objects of type `ParamaterType`, to describe features like
>     `linear` and `uniform`.
> 5. Function name redirection, if a user has specified to use a custom
>     name instead of the Vector Function ABI ones.
>
> Items 1. to 5. represents the information stored in the
> `vector-function-abi-variant` attribute (see next section).
>
> The object can be extended in the future to include new vectorization
> kinds (for example the RV-like vectorization of the Region
> Vectorizer), or to add more context information that might come from
> other uses of OpenMP `declare variant`, or to add new Vector Function
> ABIs not based on OpenMP. Such information can be retrieved by
> attributes that will be added to describe the `Call` instance.
>
> ## IR Attribute
>
> We define a `vector-function-abi-variant` attribute that lists the
> mangled names produced via the mangling function of the Vector
> Function ABI rules.
>
> ```
> vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection),..."
> ```
>
> 1. Because we use only OpenMP `declare simd` vectorization, and
>     because we require a vector Function ABI, we make this explicit
>     in the name of the attribute.
> 2. Because the Vector Function ABIs encode all the information
>     needed to know the vectorization shape of the vector function in
>     the mangled names, we provide the mangled name via the
>     attribute.
> 3. Function names redirection is specified by enclosing the name of
>     the redirection in parenthesis, as in
>     `abi_mangled_name_02(user_redirection)`.
>
> ## Vector ABI Demangler
>
> The “Vector ABI demangler”, is the component that demangles the data
> in the `vector-function-abi-variant` attribute and that provides the
> instances of the class `VectorFunctionShape` that can be derived by
> the mangled names listed in the attribute.
>
> ## Query interface: Search Vector Function System (SVFS)
>
> An interface that can be queried by the LLVM components to understand
> whether or not a scalar function can be vectorized, and that retrieves
> the vector function to be used if such vector shape is available.
>
> 1. This component is going to be unrelated to OpenMP.
> 2. This component will use internally the demangler defined in the
>     previous section, but it will not expose any aspect of the Vector
>     Function ABI via its interface.
>
> The interface provides two methods.
>
> ```
> std::vector<VectorFunctionShape> SVFS::isFunctionVectorizable(llvm::CallInst * Call);
>
> llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call, VectorFunctionShape Info);
> ```
>
> The first method is used to list all the vector shapes that available
> and attached to a scalar function. An empty results means that no
> vector versions are available.
>
> The second method retrieves the information needed to build a call to
> a vector function with a specific `VectorFunctionShape` info.
>
> # (SELF) ASSESSMENT ON EXTENDIBILITY
>
>
> 1. Extending the C function attribute `clang_declare_simd_variant` to
>     new Vector Function ABIs that use OpenMP will be straightforward
>     because the attribute is tight to such ABIs and OpenMP.
> 2. The C attribute `clang_declare_simd_variant` and the `declare
>     variant` directive used for the `simd` trait will be sharing the
>     internals in clang, so adding the OpenMP functionality for `simd`
>     traits will be mostly handling the directive in the OpenMP
>     parser. How this should be done is described in
>     https://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-attribute
> 3. The IR attribute `vector-function-abi-variant` is not to be
>     extended to represent other kind of vectorization other than those
>     handled by `declare simd` and that are handled with a Vector
>     Function ABI.
> 4. The IR attribute `vector-function-abi-variant` is not defined to be
>     extended to represent the information of `declare variant` in its
>     totality.
> 5. The IR attribute will not need to change when we will introduce non
>     vector function ABI vectorization (RV-like, reductions...) or when
>     we will decide to fully support `declare variant`. The information
>     it carries will not need to be invalidated, but just extended with
>     new attributes that will need to be handled by the
>     `VectorFunctionShape` class, in a similar way the
>     `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
>     operates on individual attributes to describe an overall
>     functionality.
>
> # Examples
>
> ## Example 1
>
> Exposing an Advanced SIMD vector function when targeting Advanced SIMD
> in AArch64.
>
> ```
> double foo_01(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_01", simdlen(2), notinbranch, isa("simd"));
>
> // Advanced SIMD version
> float64x2_t vector_foo_01(float64x2_t VectorInput);
> ```
>
> The resulting IR attribute is:
>
> ```
> attribute #0 = {vector-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}
> ```
>
> ## Example 2
>
> Exposing an Advanced SIMD vector function when targeting Advanced SIMD
> in AArch64, but with the wrong signature. The user specifies a masked
> version of the function in the clauses of the attribute, the compiler
> throws an error suggesting the signature expected for
> ``vector_foo_02.``
>
> ```
> double foo_02(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_02", simdlen(2), inbranch, isa("simd"));
>
> // Advanced SIMD version
> float64x2_t vector_foo_02(float64x2_t VectorInput);
> // (suggested) compiler error ->                      ^ Missing mask parameter of type `uint64x2_t`.
> ```
>
> ## Example 3
>
> Targeting `sincos`-like signatures.
>
> ```
> void foo_03(double Input, double * Output) __attribute__(clang_declare_simd_variant(“vector_foo_03", simdlen(2), notinbranch, linear(Output, 1), isa("simd"));
>
> // Advanced SIMD version
> void vector_foo_03(float64x2_t VectorInput, double * Output);
> ```
>
> The resulting IR attribute is:
>
> ```
> attribute #0 = {vector-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}
> ```
> ## Example 4
>
> Scalable vectorization targeting SVE
>
> ```
> double foo_04(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_04", simdlen("scalable"), notinbranch, isa("sve"));
>
> // SVE version
> svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);
> ```
>
> The resulting IR attribute is:
>
> ```
> attribute #0 = {vector-abi-variant="_ZGVsM2v_foo_04(vector_foo_04)"}
> ```
>
> ## Example 5
>
> Fixed length vectorization targeting SVE
>
> ```
> double foo_05(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_05", simdlen(4), inbranch, isa("sve"));
>
> // Fixed-length SVE version
> svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);
> ```
>
> The resulting IR attribute is:
>
> ```
> attribute #0 = {vector-abi-variant="_ZGVsM2v_foo_04(vector_foo_04)"}
> ```
>
> ## Example 06
>
> This is an x86 example, equivalent to the one provided by Andrei
> Elovikow in
> http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt
> rendering with ICC at https://godbolt.org/z/Of1NxZ
>
> ```
> float MyAdd(float* a, int b) __attribute__(clang_declare_simd_variant(“MyAddVec", simdlen(8), notinbranch, arch("core_2nd_gen_avx"))
> {
>    return *a + b;
> }
>
>
> __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);
> ```
>
> The resulting IR attribute is:
>
> ```
> attribute #0 = {vector-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}
> ```
>
> ## Example showing interaction with `declare simd`
>
> ```
> #pragma omp declare simd linear(a) notinbranch
> float foo_06(float *a, int x) __attribute__(clang_declare_simd_variant(“vector_foo_06", simdlen(4), linear(a), notinbranch, arch("armv8.2-a+simd")) {
>      return *a + x;
> }
>
> // Advanced SIMD version
> float32x4_t vector_foo_06(float *a, int32x4_t vx) {
> // Custom implementation.
> }
> ```
>
> The resulting IR attribute is made of three symbols:
>
> 1. `_ZGVnN2l4v_foo_06` and `_ZGVnN4l4v_foo_06`, which represent the
>     ones the compiler builds by auto-vectorizing `foo_06` according to
>     the rule defined in the Vector Function ABI specifications for
>     AArch64.
> 2. `_ZGVnN4l4v_foo_06(vector_foo_06)`, which represents the
>     user-defined redirection of the 4-lane version of `foo_06` to the
>     custom implementation provided by the user when targeting Advanced
>     SIMD for version 8.2 of the A64 instruction set.
>
> ```
> attribute #0 = {vector-function-abi-variant="_ZGVnN2l4v_foo_06,_ZGVnN4l4v_foo_06,_ZGVnN4l4v_foo_06(vector_foo_06)"}
> ```
>
-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll