[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.

Wed Nov 6 14:22:46 PST 2019

Dear all,

I have a WIP patch that require your attention, as I am in the process of replacing the TLI (TargetLibraryInfo) mappings with the IR attribute that we agreed to introduce here.

I’d appreciate if you could look at the patch at https://reviews.llvm.org/D67572

Thank you,

Francesco

> On Jun 28, 2019, at 3:16 PM, Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Dear all,
> 
> I have updated the proposal with the changes that are required to be
> able to generate the vector function signature in the front-end instead
> of the back-end.
> 
> I have updated the example, showcasing the use of the
> `llvm.compiler.used` intrinsics.
> 
> I have also mentioned that the `SVFS` should be wrapped in an analysis
> pass. I haven't proposed a brand new pass because I suspect that there
> is already one that could handle the information of the SVFS. Please
> point me at such pass if it exists.
> 
> I have also CCed Sumedh, who is working on the implementation of the
> SVFS described here.
> 
> Kind regards,
> 
> Francesco
> 
> *** DRAFT OF THE PROPOSAL ***
> 
> SCOPE OF THE RFC : Interface user provided vector functions with the vectorizer.
> ================================================================================
> 
> Because the users care about portability (across compilers, libraries
> and systems), I believe we have to base sour solution on a standard that
> describes the mapping from the scalar function to the vector function.
> 
> Because OpenMP is standard and widely used, we should base our solution
> on the mechanisms that the standard provides, via the directives
> `declare simd` and `declare variant`, the latter when used in with the
> `simd` trait in the `construct` set.
> 
> Please notice that:
> 
> 1.  The scope of the proposal is not implementing full support for
>    `pragma omp declare variant`.
> 2.  The scope of the proposal is not enabling the vectorizer to do new
>    kind of vectorizations (e.g. RV-like vectorization described by
>    Simon).
> 3.  The proposal aims to be extendible wrt 1. and 2.
> 4.  The IR attribute introduced in this proposal is equivalent to the
>    one needed for the VecClone pass under development in
>    https://reviews.llvm.org/D22792
> 
> CLANG COMPONENTS
> ================
> 
> A C function attribute, `clang_declare_simd_variant`, to attach to the
> scalar version. The attribute provides enough information to the
> compiler about the vector shape of the user defined function. The vector
> shapes handled by the attribute are those handled by the OpenMP standard
> via `declare simd` (and no more than that).
> 
> 1.  The function attribute handling in clang is crafted with the
>    requirement that it will be possible to re-use the same components
>    for the info generated by `declare variant` when used with a `simd`
>    traits in the `construct` set.
> 2.  The attribute allows orthogonality with the vectorization that is
>    done via OpenMP: the user vector function is still exposed for
>    vectorization when not using `-fopenmp-[simd]` once the
>    `declare simd` and `declare variant` directive of OpenMP will be
>    available in the front-end.
> 
> C function attribute: `clang_declare_simd_variant`
> --------------------------------------------------
> 
> The definition of this attribute has been crafted to match the semantics
> of `declare variant` for a `simd` construct described in OpenMP 5.0. I
> have added only the traits of the `device` set, `isa` and `arch`, which
> I believe are enough to cover for the use case of this proposal. If that
> is not the case, please provide an example, extending the attribute will
> be easy even once the current one is implemented.
> 
>    clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, <context selector clauses>})
> 
>    <variant-func-id>:= The name of a function variant that is a base language identifier, or,
>                        for C++, a template-id.
> 
>    <simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}
> 
>    <simdlen> := simdlen(<positive number>) | simdlen("scalable")
> 
>    <mask>    := inbranch | notinbranch
> 
>    <optional simd clauses> := <linear clause> 
>                             | <uniform clause>
>                             | <align clause>  | {,<optional simd clauses>}
> 
>    <linear clause>  := linear_ref(<var>,<step>)
>                      | linear_var(<var>, <step>)
>                      | linear_uval(<var>, <step>)
>                      | linear(<var>, <step>)
> 
>    <step> := <var> | <non zero number>
> 
>    <uniform clause> := uniform(<var>)
> 
>    <align clause>   := align(<var>, <positive number>)
> 
>    <var> := Name of a parameter in the scalar function declaration/definition
> 
>    <non zero number> := ... | -2 | -1 | 1 | 2 | ...
> 
>    <positive number> := 1 | 2 | 3 | ...
> 
>    <context selector clauses> := {<isa>}{,} {<arch>}
> 
>    <isa> := isa(target-specific-value)
> 
>    <arch> := arch(target-specific-value)
> 
> LLVM COMPONENTS:
> ================
> 
> VectorFunctionShape class
> -------------------------
> 
> The object `VectorFunctionShape` contains the information about the kind
> of vectorization available for an `llvm::CallInst`.
> 
> The object `VectorFunctionShape` must contain the following information:
> 
> 1.  Vectorization Factor (or number or concurrent lanes executed by the
>    SIMD version of the function). Encoded by unsigned integer.
> 2.  Whether the vector function is requested for scalable vectorization,
>    encoded by a boolean.
> 3.  Information about masking / no masking, encoded by a boolean.
> 4.  Information about the parameters, encoded in a container that
>    carries objects of type `ParamaterType`, to describe features like
>    `linear` and `uniform`. This parameter type can be extended to
>    represent concepts that are not handled by OpenMP.
> 5.  Vector ISA, used in the implementation of the vector function.
> 
> The `VectorFunctionShape` class can be extended in the future to include
> new vectorization kinds (for example the RV-like vectorization of the
> Region Vectorizer), or to add more context information that might come
> from other uses of OpenMP `declare variant`, or to add new Vector
> Function ABIs not based on OpenMP. Such information can be retrieved by
> attributes that will be added to describe the `llvm::CallInst` instance.
> 
> IR Attribute
> ------------
> 
> We define a `vector-function-abi-variant` attribute that lists the
> mangled names produced via the mangling function of the Vector Function
> ABI rules.
> 
>    vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection),..."
> 
> 1.  Because we use only OpenMP `declare simd` vectorization, and because
>    we require a vector Function ABI, we make this explicit in the name
>    of the attribute.
> 2.  Because the Vector Function ABIs encode all the information needed
>    to know the vectorization shape of the vector function in the
>    mangled names, we provide the mangled name via the attribute.
> 3.  Function names redirection is specified by enclosing the name of the
>    redirection in parenthesis, as in
>    `abi_mangled_name_02(user_redirection)`.
> 
> The IR attribute is used in conjunction with the vector function
> declarations or definitions that are available in the module. Each
> mangled name in the `vector-function-abi-attribute` is be associated to
> a correspondent declaration/definition in the module. Such definition is
> provided by the front-end. The vector function declaration or definition
> is passed as an argument to the `llvm.compiler.used` intrinsic to
> prevent the compiler from removing it from the module (for example when
> the OpenMP mapping mechanism is used via C header file).
> 
> We decided to make the vector function signature explicit in IR by
> creating it with the front-end, because we have found some cases for
> which it is impossible to use the backend to reconstruct the vector
> function signature out of the Vector Function ABI mangled name and the
> signature of the scalar function. This is due to the fact that the
> layout of some C types is lost in the C-to-IR process.
> 
> As an example, the following three types can not be distinguished at IR
> level, because all cases are mapped to `i64` in the signature of the
> function `foo`. In fact, according to the rules of the Vector Function
> ABI for AArch64, the three types, for a 2-lane vectorization factor,
> will map respectively to `<4 x int>`, `<2 x (pointer_to_the_struct)>`,
> and `<2 x i64>`.
> 
>    // Type 1
>    typedef _Complex int S;
> 
>    // Type 2 
>    typedef struct x{
>    int a;
>    int b;
>    } S;
> 
>    // Type 3
>    typedef uint64_t S;
> 
>    S foo(S a, S b) {
>    return ...;
>    }
> 
> On of the problems that was raised during the discussion around these
> three types was how we could make sure that the vectorizer is able to
> determine how to map the values used in the scalar functions invocation
> to the values that can be used in the vecgtor signature.
> 
> I was thiking to store the information needed for this in the parameter
> attributes of the function, but I realised that just the size of the
> scalar parameter might be enough, therefore I don't think we need to add
> new attributes to handle this.
> 
> I illustrate my reasoning with an example, in which we want to vectorize
> the "flattened" signature `i64 foo(i64, i64)` to a 2-lane vector
> function. All exmaples are done for Advanced SIMD, with no mask
> parameter.
> 
> I won't discuss `Type 3` becasue the mapping from scalar parameters from
> vector parameters is trivial.
> 
> In case of `Type 1`, we will see a 2-lane vector function associated to
> `foo` with signature `<4 x i32>(<4 x i32>, <4 x i32>)` (the knowledge of
> being a 2-lane vector function comes from the `<vlen>` token in the
> magled name, which is always present even in case of a used defined
> custom name).
> 
> The size of the scalar parameter is 8, the size of the vector parameter
> is 16, therefore the fact that we are doing a 2-lane vectorization is
> enough to tell the vectorizer that the two instances of `i64` values
> needs to be mapped to the high and low half of the `<4 x i32>` type.
> 
> In case of `Type 2`, what we have is a situation in which two objects of
> type `i64` (the scalar values) need to be mapped to two pointers, which
> are pointing to instances of the same size of the scalar size. This is
> enough information for the vectorizer to be able to generate the code
> that can do this properly. The case of `Type 2` is distinguishable from
> `Type 3` because of the use of pointers, and it is distinguishable from
> the `linear` case that use references (in which vectors of pointers are
> needed) because the token in the mangled name is different (`v` is used
> for vector parameters that the vectorizer must to pass by value, while
> the linear references use different tokens for vector parameters.).
> 
> Query interface: Search Vector Function System (SVFS)
> -----------------------------------------------------
> 
> An interface that can be queried by the LLVM components to understand
> whether or not a scalar function can be vectorized, and that retrieves
> the vector function to be used if such vector shape is available.
> 
> 1.  This component is going to be unrelated to OpenMP.
> 2.  This component will use internally the IR attribute defined in the
>    previous section, but it will not expose any aspect of the Vector
>    Function ABI via its interface.
> 
> The interface provides two methods.
> 
>    std::vector<VectorFunctionShape> SVFS::isFunctionVectorizable(llvm::CallInst * Call);
> 
>    llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call, VectorFunctionShape Info);
> 
> The first method is used to list all the vector shapes that available
> and attached to a scalar function. An empty results means that no vector
> versions are available.
> 
> The second method retrieves the information needed to build a call to a
> vector function with a specific `VectorFunctionShape` info.
> 
> The SVFS is wrapped in an analysis pass that can be retrieved in other
> passes.
> 
> (SELF) ASSESSMENT ON EXTENDIBILITY
> ==================================
> 
> 1.  Extending the C function attribute `clang_declare_simd_variant` to
>    new Vector Function ABIs that use OpenMP will be straightforward
>    because the attribute is tight to such ABIs and OpenMP.
> 2.  The C attribute `clang_declare_simd_variant` and the
>    `declare variant` directive used for the `simd` trait will be
>    sharing the internals in clang, so adding the OpenMP functionality
>    for `simd` traits will be mostly handling the directive in the
>    OpenMP parser. How this should be done is described in
>    https://clang.llvm.org/docs/InternalsManual.html\#how-to-add-an-attribute
> 3.  The IR attribute `vector-function-abi-variant` is not to be extended
>    to represent other kind of vectorization other than those handled by
>    `declare simd` and that are handled with a Vector Function ABI.
> 4.  The IR attribute `vector-function-abi-variant` is not defined to be
>    extended to represent the information of `declare variant` in its
>    totality.
> 5.  The IR attribute will not need to change when we will introduce non
>    vector function ABI vectorization (RV-like, reductions...) or when
>    we will decide to fully support `declare variant`. The information
>    it carries will not need to be invalidated, but just extended with
>    new attributes that will need to be handled by the
>    `VectorFunctionShape` class, in a similar way the
>    `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
>    operates on individual attributes to describe an overall
>    functionality.
> 6.  The IR attribute is to be used also to provide vector function
>    information via the `declare simd` directive of OpenMP (see Example
>    7 below).
> 
> Examples
> ========
> 
> Example 1
> ---------
> 
> Exposing an Advanced SIMD vector function when targeting Advanced SIMD
> in AArch64.
> 
>    double foo_01(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_01", simdlen(2), notinbranch, isa("simd"));
> 
>    // Advanced SIMD version provided by the user via an external module
>    float64x2_t vector_foo_01(float64x2_t VectorInput);
> 
>    // ... loop ...
>       x[i] = foo_01(y[i])
> 
> The resulting IR is:
> 
>    @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x double> (<2 x double>)* @_ZGVnN2v_foo_01 to i8*)], section "llvm.metadata"
> 
>    declare double @foo_01(double %in) #0
> 
>    declare <2 x double> @_ZGVnN2v_foo_01(<2 x double>)
> 
>    // ... loop ...
>       %xi = call double @foo_01(double %yi) #0
> 
>    attribute #0 = {vector-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}
> 
> Example 2
> ---------
> 
> Exposing an Advanced SIMD vector function when targeting Advanced SIMD
> in AArch64, but with the wrong signature. The user specifies a masked
> version of the function in the clauses of the attribute, the compiler
> throws an error suggesting the signature expected for `vector_foo_02.`
> 
>    double foo_02(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_02", simdlen(2), inbranch, isa("simd"));
> 
>    // Advanced SIMD version
>    float64x2_t vector_foo_02(float64x2_t VectorInput); 
>    // (suggested) compiler error ->                      ^ Missing mask parameter of type `uint64x2_t`.
> 
> Example 3
> ---------
> 
> Targeting `sincos`-like signatures.
> 
>    void foo_03(double Input, double * Output) __attribute__(clang_declare_simd_variant(“vector_foo_03", simdlen(2), notinbranch, linear(Output, 1), isa("simd"));
> 
>    // Advanced SIMD version
>    void vector_foo_03(float64x2_t VectorInput, double * Output); 
> 
>    // ... loop ...
>       foo_03(x[i], y + i)
> 
> The resulting IR is:
> 
>    @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (void (<2 x double>, double *)* @_ZGVnN2vl8_foo_03 to i8*)], section "llvm.metadata"
> 
>    declare void @foo_03(double, double *) #0
> 
>    declare void @_ZGVnN2vl8_foo_03(<2 x double>, double *)
> 
>    ;; ... loop ...
>    call void @foo_03(double %xi, double * %yiptr) #0
> 
>    attribute #0 = {vector-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}
> 
> Example 4
> ---------
> 
> Scalable vectorization targeting SVE
> 
>    double foo_04(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_04", simdlen("scalable"), notinbranch, isa("sve"));
> 
>    // SVE version
>    svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);
> 
>    // ... loop ...
>       x[i] = foo_04(y[i])
> 
> The IR generated is:
> 
>    @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<vscale 2 x double> (<vscale 2 x double>)* @_ZGVsMxv_foo_04 to i8*)], section "llvm.metadata"
> 
>    declare double @foo_04(double %in) #0
> 
>    declare <vscale 2 x double> @_ZGVnNxv_foo_04(<vscale 2 x double>)
> 
>    // ... loop ...
>       %xi = call double @foo_04(double %yi) #0
> 
>    attribute #0 = {vector-abi-variant="_ZGVsMxv_foo_04(vector_foo_04)"}
> 
> Example 5
> ---------
> 
> Fixed length vectorization targeting SVE
> 
>    double foo_05(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_05", simdlen(4), inbranch, isa("sve"));
> 
>    // Fixed-length SVE version
>    svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);
> 
> The resulting IR is:
> 
>    @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<4 x double> (<4 x double>)* @_ZGVsM4v_foo_05 to i8*)], section "llvm.metadata"
> 
>    declare double @foo_05(double %in) #0
> 
>    declare <4 x double> @_ZGVnNxv_foo_05(<4 x double>)
> 
>    ;; ... loop ...
>       %xi = call double @foo_05(double %yi) #0
> 
>    attribute #0 = {vector-abi-variant="_ZGVsM4v_foo_04(vector_foo_04)"}
> 
> Example 6
> ---------
> 
> This is an x86 example, equivalent to the one provided by Andrei
> Elovikow in
> http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt
> rendering with ICC at https://godbolt.org/z/Of1NxZ
> 
>    float MyAdd(float* a, int b) __attribute__(clang_declare_simd_variant(“MyAddVec", simdlen(8), notinbranch, linear(a), arch("core_2nd_gen_avx"))
>    { 
>      return *a + b;
>    }
> 
> 
>    __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);
> 
>    // ... loop ...
> 
>      x[i] = MyAdd(a+i, b[i]);
> 
> The resulting IR is:
> 
>    @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<8 x float> (float *, <2 x i64>, <2 x i64>)* @_ZGVbN8l4v_MyAdd to i8*)], section "llvm.metadata"
> 
>    define float @MyAdd(float %a, i32 %b) {
>      ;; return *a + b :)
>    }
> 
>    define <8 x float> @_ZGVbN8l4v_MyAdd(float *, <2 x i64>, <2 x i64>)
> 
>    ;; ... loop ...
>       %xi = call float @MyAdd(float * %aiptr, i32 ) #0
> 
>    attribute #0 = {vector-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}
> 
> Note: the signature of `MyAddVec` uses `<2 x i64>` instead of
> `<4 x i32>`, as shown in https://godbolt.org/z/T4T8s3 (line 11). If we
> would have asked the back end to generate the signature of `MyAddVec` by
> looking at the signature of the scalar function and the `<vlen>=8` token
> in the mangled name in the attribute, we would have end up using
> `<8 x i32>` instead of two instanced of `<2 x i64>`, which would have
> been wrong.
> 
> This is another example that demonstrate that we need to generate the
> vector function signatures in the front-end and not in the backend.
> 
> Example 7: showing interaction with `declare simd`
> --------------------------------------------------
> 
>    #pragma omp declare simd linear(a) notinbranch
>    float foo_07(float *a, int x) __attribute__(clang_declare_simd_variant(“vector_foo_07", simdlen(4), linear(a), notinbranch, arch("armv8.2-a+simd")) {
>        return *a + x;
>    }
> 
>    // Advanced SIMD version
>    float32x4_t vector_foo_07(float *a, int32x4_t vx) {
>    // Custom implementation.
>    }
> 
>    // ... loop ...
> 
>      x[i] = foo_07(a+i, b[i]);
> 
> The resulting IR attribute is made of three symbols:
> 
> 1.  `_ZGVnN2l4v_foo_07` and `_ZGVnN4l4v_foo_07`, which represent the
>    ones the compiler builds by auto-vectorizing `foo_07` according to
>    the rule defined in the Vector Function ABI specifications for
>    AArch64.
> 2.  `_ZGVnN4l4v_foo_07(vector_foo_07)`, which represents the
>    user-defined redirection of the 4-lane version of `foo_07` to the
>    custom implementation provided by the user when targeting Advanced
>    SIMD for version 8.2 of the A64 instruction set.
> 
> <!-- -->
> 
>    @llvm.compiler.used = appending global [2 x i8*] [i8* bitcast (<4 x float> (float *, <4 x i32>)* @_ZGVnN4l4v_foo_07 to i8*), i8* bitcast (<2 x float> (float *, <2 x i32>)* @_ZGVnN2l4v_foo_07 to i8*) ], section "llvm.metadata"
> 
>    define <4 x float> @_ZGVnN4l4v_foo_07(float *, <4 x i32>) {
>      ;; Compiler auto-vectorized version (via the VecClone pass)
>    }
> 
>    define <2 x float> @_ZGVnN2l4v_foo_07(float *, <2 x i32>) {
>      ;; Compiler auto-vectorized version (via the VecClone pass)
>    }
> 
>    define <4 x float> @vector_foo_07(float *, <4 x i32>) {
>      ;; user provided vector version
>    }
> 
>    define float @foo_07(float %a, i32 %b) {
>      ;; return *a + b :)
>    }
> 
>    // ... loop ...
>       %xi = call float @foo_07(float * %aiptr, i32 %bi) #0
> 
>    attribute #0 = {vector-function-abi-variant="_ZGVnN2l4v_foo_07,_ZGVnN4l4v_foo_07,_ZGVnN4l4v_foo_07(vector_foo_07)"}
> 
> In this case, the body of the functions `_ZGVnN4l4v_foo_07` and
> `_ZGVnN2l4v_foo_07` is auto-generated by the compiler, therefore we
> might as well avoid adding them to the `@llvm.compiler.used` intrinsics.
> I have left it there for consistency, let me know if you think that
> there is no real reasons for requiring it, I will remove it.
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev