[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.

Mon Jun 24 10:22:26 PDT 2019

That helps complex but not other structures that can be call-by-value.

Hideki

-----Original Message-----
From: David Greene [mailto:dag at cray.com] 
Sent: Monday, June 24, 2019 9:40 AM
To: Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org>
Cc: Doerfert, Johannes <jdoerfert at anl.gov>; Francesco Petrogalli <Francesco.Petrogalli at arm.com>; nd <nd at arm.com>; Andrea Bocci <andrea.bocci at cern.ch>; Clang Dev <cfe-dev at lists.llvm.org>; Saito, Hideki <hideki.saito at intel.com>; Alexey Bataev <a.bataev at hotmail.com>
Subject: Re: [llvm-dev] RFC: Interface user provided vector functions with the vectorizer.

I have an RFC for first-class complex types in LLVM IR pending for some internal review.  I hope to post it soon.  That should help address this problem.  Then the vector function signature generation could stay in LLVM, if I'm understanding the issue correctly.

                     -David

Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Hi all - I am working with a colleague to provide an initial implementation of this.
>
> We encountered a problem when dealing with generating the vector signatures of functions that use complex data.
>
> In this proposal, we expect the SVFS component in the backed to 
> demangle the name of the function in the attribute to be able to 
> reconstruct the signature of the vector function from the scalar 
> function signature.
>
> In case of Complex data, this doesn’t seem to be possible, because the 
> information of “being a vector of 2 lanes” that is supposed to be 
> carried by the complex scalar is lost in the transformation the data 
> type in a “coerced” type.
>
> Consider these three types and the function `foo`:
>
> // Type 1
> typedef _Complex int S;
>
> // Type 2
> typedef struct x{
>  int a;
>  int b;
> } S;
>
> // Type 3
> typedef uint64_t S;
>
> S foo(S a, S b) {
>  return ...;
> }
>
> In all cases, the IR type of the parameters in `foo` is i64, therefore 
> is not possible to distinguish what C type generated the signature of 
> `foo`.
>
> I don’t know if this is going to be a problem for other architectures, 
> but this is definitely a problem on AArch64 where we need to be able 
> to generate the correct vector function signature for a specific
> simdlen(N) attached on `foo`. When simdlen(2), for type 1 the vector 
> type is <4 x i32>, for type 2 is <2 x i64*>, for type 3 is <2 x i64>.
>
> Therefore, I would like to propose a change to the RFC, which would 
> move the responsibility off generating the vector function signature 
> from LLVM to clang.
>
> In particular, (and this I believe has already been mentioned by 
> Johannes), we could use the @llvm.compiler.used intrinsic to mark 
> those declaration that needs to stay in the IR and not optimized away 
> OPT before reaching the vectorizer.
>
> In summary, the change would consist of:
>
> 1. Generate symbols declaration/definitions of the vector function 
> with the mangled name in the IR, and mark it with @llvm-compiler.used. 
> This could be done in CGOpenMPRuntime.cpp 2. Use the attribute 
> vector-abs-variant defined in this RFC to map scalar names to vector 
> ABI mangled name, and used the same redirection mechanism for the user 
> provided vector name.
> 3. Move the “vector function signature generation” from the SVFS in 
> LLVM to the openmp code generator of the clang frontend
>
> The SVFS query system would still work as in the current proposal. The 
> only difference is that the vector function signature would be given 
> by the frontend and not need to be recomputed.
>
> Here is an example of ho the IR would look like with this change:
>
> ```
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x i32> (<2 x i32>)* @f to i8*)], section "llvm.metadata"
>
> declare dso_local <2 x i32> @_ZGVnN2v_foo(<2 x i32> returned)
>
> declare i32 @foo(i32) #0
>
> ; other function definition, including the one provided by the user 
> `my_vector_foo` if the user provided a definition and not just the 
> declaration
>
> attribute #0 = 
> {vector-function-abi-variant=“_ZGVnN2v_foo(my_vector_foo)"}
>
> ```
>
> If the attribute @llvm.compiler.used is not suitable for this (I am 
> not aware of all implication of using it on a global symbol), maybe we 
> could come up with a intrinsics that does what we need (avoid deleting 
> declarations that are not used) and name it 
> @llvm.vector.function.used?
>
> Please let me know what you think, I will submit an updated proposal next week.
>
> Kind regards,
>
> Francesco
>
>> On Jun 17, 2019, at 7:05 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
>> 
>> I agree with Simon. This looks good conceptually. I have minor implementation comments but that can wait till the code reviews.
>> 
>> Sorry for the delay and thanks for working on this.
>> 
>> Get Outlook for Android
>> 
>> From: Simon Moll <moll at cs.uni-saarland.de>
>> Sent: Monday, June 17, 2019 10:02:58 AM
>> To: Francesco Petrogalli; LLVM Development List; Clang Dev
>> Cc: Renato Golin; Finkel, Hal J.; Andrea Bocci; Elovikov, Andrei;
> Alexey Bataev; Doerfert, Johannes; Saito, Hideki; Tian, Xinmin; nd; 
> Roman Lebedev; Philip Reames; Shawn Landden
>> Subject: Re: RFC: Interface user provided vector functions with the vectorizer.
>>  
>> Hi Francesco,
>> 
>> On 6/11/19 10:55 PM, Francesco Petrogalli wrote:
>> > Dear all,
>> >
>> > I have re-written the proposal for interfacing user provided vector 
>> > functions, originally posted in both llvm-dev and cfe-dev mailing
>> > list:
>> >
>> > "[RFC] Expose user provided vector function for auto-vectorization."
>> >
>> > The proposal looks quite different from the original submission, 
>> > therefore I took the liberty to start a new thread.
>> >
>> > The original thread generated some good discussion. In particular, 
>> > Simon Moll and Johannes Doerfert (CCed) have managed to provide 
>> > good arguments for the following claims:
>> >
>> > 1. The Vector Function ABI name mangling scheme of a target is not
>> >     enough to describe all uses cases of function vectorization that
>> >     the compiler might end up needing to support in the future.
>> I think the new name of the attribute makes this point clear.
>> > 2. `declare variant` needs to be handled properly at IR level, to be
>> >     able to give the compiler the full OpenMP context of the directive.
>> >
>> > This proposal addresses those two concerns and other (I believe) 
>> > minor concerns that have been raised in the previous thread.
>> >
>> > This proposal is provided with examples and a self assessment 
>> > around extendibility.
>> >
>> > I have CCed all the people that have participated in the discussion 
>> > so far, please let me know if you think I have missed anything of 
>> > what have been raised.
>> >
>> > Kind regards,
>> >
>> > Francesco
>> 
>> LGTM. Please add me as a reviewer for this when you post patches.
>> 
>> Thanks!
>> 
>> Simon
>> 
>> >
>> > *** DRAFT OF THE PROPOSAL ***
>> >
>> > # SCOPE OF THE RFC : Interface user provided vector functions with the vectorizer.
>> >
>> > Because the users care about portability (across compilers, 
>> > libraries and systems), I believe we have to base sour solution on 
>> > a standard that describes the mapping from the scalar function to 
>> > the vector function.
>> >
>> > Because OpenMP is standard and widely used, we should base our 
>> > solution on the mechanisms that the standard provides, via the 
>> > directives `declare simd` and `declare variant`, the latter when 
>> > used in with the `simd` trait in the `construct` set.
>> >
>> > Please notice that:
>> >
>> > 1. The scope of the proposal is not implementing full support for
>> >     `pragma omp declare variant`.
>> > 2. The scope of the proposal is not enabling the vectorizer to do new
>> >     kind of vectorizations (e.g. RV-like vectorization described by
>> >     Simon).
>> > 3. The proposal aims to be extendible wrt 1. and 2.
>> > 4. The IR attribute introduced in this proposal is equivalent to the
>> >     one needed for the VecClone pass under development in
>> >     https://reviews.llvm.org/D22792> > # CLANG COMPONENTS
>> >
>> > A C function attribute, `clang_declare_simd_variant`, to attach to 
>> > the scalar version. The attribute provides enough information to 
>> > the compiler about the vector shape of the user defined function. 
>> > The vector shapes handled by the attribute are those handled by the 
>> > OpenMP standard via `declare simd` (and no more than that).
>> >
>> > 1. The function attribute handling in clang is crafted with the
>> >     requirement that it will be possible to re-use the same components
>> >     for the info generated by `declare variant` when used with a `simd`
>> >     traits in the `construct` set.
>> > 2. The attribute allows orthogonality with the vectorization that is
>> >     done via OpenMP: the user vector function is still exposed for
>> >     vectorization when not using `-fopenmp-[simd]` once the `declare
>> >     simd` and `declare variant` directive of OpenMP will be available
>> >     in the front-end.
>> >
>> > ## C function attribute: `clang_declare_simd_variant`
>> >
>> > The definition of this attribute has been crafted to match the 
>> > semantics of `declare variant` for a `simd` construct described in 
>> > OpenMP 5.0. I have added only the traits of the `device` set, `isa` 
>> > and `arch`, which I believe are enough to cover for the use case of 
>> > this proposal. If that is not the case, please provide an example, 
>> > extending the attribute will be easy even once the current one is 
>> > implemented.
>> >
>> > ```
>> > clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, 
>> > <context selector clauses>})
>> >
>> > <variant-func-id>:= The name of a function variant that is a base language identifier, or,
>> >                      for C++, a template-id.
>> >
>> > <simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}
>> >
>> > <simdlen> := simdlen(<positive number>) | simdlen("scalable")
>> >
>> > <mask>    := inbranch | notinbranch
>> >
>> > <optional simd clauses> := <linear clause>
>> >                           | <uniform clause>
>> >                           | <align clause>  | {,<optional simd 
>> > clauses>}
>> >
>> > <linear clause>  := linear_ref(<var>,<step>)
>> >                    | linear_var(<var>, <step>)
>> >                    | linear_uval(<var>, <step>)
>> >                    | linear(<var>, <step>)
>> >
>> > <step> := <var> | <non zero number>
>> >
>> > <uniform clause> := uniform(<var>)
>> >
>> > <align clause>   := align(<var>, <positive number>)
>> >
>> > <var> := Name of a parameter in the scalar function 
>> > declaration/definition
>> >
>> > <non zero number> := ... | -2 | -1 | 1 | 2 | ...
>> >
>> > <positive number> := 1 | 2 | 3 | ...
>> >
>> > <context selector clauses> := {<isa>}{,} {<arch>}
>> >
>> > <isa> := isa(target-specific-value)
>> >
>> > <arch> := arch(target-specific-value)
>> >
>> > ```
>> >
>> > # LLVM COMPONENTS:
>> >
>> > ## VectorFunctionShape class
>> >
>> > The object `VectorFunctionShape` contains the information about the 
>> > kind of vectorization available for an `llvm::Call`.
>> >
>> > The object `VectorFunctionShape` must contain the following information:
>> >
>> > 1. Vectorization Factor (or number or concurrent lanes executed by the
>> >     SIMD version of the function). Encoded by unsigned integer.
>> > 2. Whether the vector function is requested for scalable
>> >     vectorization, encoded by a boolean.
>> > 3. Information about masking / no masking, encoded by a boolean.
>> > 4. Information about the parameters, encoded in a container that
>> >     carries objects of type `ParamaterType`, to describe features like
>> >     `linear` and `uniform`.
>> > 5. Function name redirection, if a user has specified to use a custom
>> >     name instead of the Vector Function ABI ones.
>> >
>> > Items 1. to 5. represents the information stored in the 
>> > `vector-function-abi-variant` attribute (see next section).
>> >
>> > The object can be extended in the future to include new 
>> > vectorization kinds (for example the RV-like vectorization of the 
>> > Region Vectorizer), or to add more context information that might 
>> > come from other uses of OpenMP `declare variant`, or to add new 
>> > Vector Function ABIs not based on OpenMP. Such information can be 
>> > retrieved by attributes that will be added to describe the `Call` instance.
>> >
>> > ## IR Attribute
>> >
>> > We define a `vector-function-abi-variant` attribute that lists the 
>> > mangled names produced via the mangling function of the Vector 
>> > Function ABI rules.
>> >
>> > ```
>> > vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection),..."
>> > ```
>> >
>> > 1. Because we use only OpenMP `declare simd` vectorization, and
>> >     because we require a vector Function ABI, we make this explicit
>> >     in the name of the attribute.
>> > 2. Because the Vector Function ABIs encode all the information
>> >     needed to know the vectorization shape of the vector function in
>> >     the mangled names, we provide the mangled name via the
>> >     attribute.
>> > 3. Function names redirection is specified by enclosing the name of
>> >     the redirection in parenthesis, as in
>> >     `abi_mangled_name_02(user_redirection)`.
>> >
>> > ## Vector ABI Demangler
>> >
>> > The “Vector ABI demangler”, is the component that demangles the 
>> > data in the `vector-function-abi-variant` attribute and that 
>> > provides the instances of the class `VectorFunctionShape` that can 
>> > be derived by the mangled names listed in the attribute.
>> >
>> > ## Query interface: Search Vector Function System (SVFS)
>> >
>> > An interface that can be queried by the LLVM components to 
>> > understand whether or not a scalar function can be vectorized, and 
>> > that retrieves the vector function to be used if such vector shape is available.
>> >
>> > 1. This component is going to be unrelated to OpenMP.
>> > 2. This component will use internally the demangler defined in the
>> >     previous section, but it will not expose any aspect of the Vector
>> >     Function ABI via its interface.
>> >
>> > The interface provides two methods.
>> >
>> > ```
>> > std::vector<VectorFunctionShape> 
>> > SVFS::isFunctionVectorizable(llvm::CallInst * Call);
>> >
>> > llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call, 
>> > VectorFunctionShape Info); ```
>> >
>> > The first method is used to list all the vector shapes that 
>> > available and attached to a scalar function. An empty results means 
>> > that no vector versions are available.
>> >
>> > The second method retrieves the information needed to build a call 
>> > to a vector function with a specific `VectorFunctionShape` info.
>> >
>> > # (SELF) ASSESSMENT ON EXTENDIBILITY
>> >
>> >
>> > 1. Extending the C function attribute `clang_declare_simd_variant` to
>> >     new Vector Function ABIs that use OpenMP will be straightforward
>> >     because the attribute is tight to such ABIs and OpenMP.
>> > 2. The C attribute `clang_declare_simd_variant` and the `declare
>> >     variant` directive used for the `simd` trait will be sharing the
>> >     internals in clang, so adding the OpenMP functionality for `simd`
>> >     traits will be mostly handling the directive in the OpenMP
>> >     parser. How this should be done is described in
>> >     https://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-attribute> > 3. The IR attribute `vector-function-abi-variant` is not to be
>> >     extended to represent other kind of vectorization other than those
>> >     handled by `declare simd` and that are handled with a Vector
>> >     Function ABI.
>> > 4. The IR attribute `vector-function-abi-variant` is not defined to be
>> >     extended to represent the information of `declare variant` in its
>> >     totality.
>> > 5. The IR attribute will not need to change when we will introduce non
>> >     vector function ABI vectorization (RV-like, reductions...) or when
>> >     we will decide to fully support `declare variant`. The information
>> >     it carries will not need to be invalidated, but just extended with
>> >     new attributes that will need to be handled by the
>> >     `VectorFunctionShape` class, in a similar way the
>> >     `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
>> >     operates on individual attributes to describe an overall
>> >     functionality.
>> >
>> > # Examples
>> >
>> > ## Example 1
>> >
>> > Exposing an Advanced SIMD vector function when targeting Advanced 
>> > SIMD in AArch64.
>> >
>> > ```
>> > double foo_01(double Input) 
>> > __attribute__(clang_declare_simd_variant(“vector_foo_01", 
>> > simdlen(2), notinbranch, isa("simd"));
>> >
>> > // Advanced SIMD version
>> > float64x2_t vector_foo_01(float64x2_t VectorInput); ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 = 
>> > {vector-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}
>> > ```
>> >
>> > ## Example 2
>> >
>> > Exposing an Advanced SIMD vector function when targeting Advanced 
>> > SIMD in AArch64, but with the wrong signature. The user specifies a 
>> > masked version of the function in the clauses of the attribute, the 
>> > compiler throws an error suggesting the signature expected for 
>> > ``vector_foo_02.``
>> >
>> > ```
>> > double foo_02(double Input) 
>> > __attribute__(clang_declare_simd_variant(“vector_foo_02", 
>> > simdlen(2), inbranch, isa("simd"));
>> >
>> > // Advanced SIMD version
>> > float64x2_t vector_foo_02(float64x2_t VectorInput);
>> > // (suggested) compiler error ->                      ^ Missing mask parameter of type `uint64x2_t`.
>> > ```
>> >
>> > ## Example 3
>> >
>> > Targeting `sincos`-like signatures.
>> >
>> > ```
>> > void foo_03(double Input, double * Output)
> __attribute__(clang_declare_simd_variant(“vector_foo_03", simdlen(2), 
> notinbranch, linear(Output, 1), isa("simd"));
>> >
>> > // Advanced SIMD version
>> > void vector_foo_03(float64x2_t VectorInput, double * Output); ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 = 
>> > {vector-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}
>> > ```
>> > ## Example 4
>> >
>> > Scalable vectorization targeting SVE
>> >
>> > ```
>> > double foo_04(double Input)
> __attribute__(clang_declare_simd_variant(“vector_foo_04",
> simdlen("scalable"), notinbranch, isa("sve"));
>> >
>> > // SVE version
>> > svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask); 
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 = 
>> > {vector-abi-variant="_ZGVsM2v_foo_04(vector_foo_04)"}
>> > ```
>> >
>> > ## Example 5
>> >
>> > Fixed length vectorization targeting SVE
>> >
>> > ```
>> > double foo_05(double Input) 
>> > __attribute__(clang_declare_simd_variant(“vector_foo_05", 
>> > simdlen(4), inbranch, isa("sve"));
>> >
>> > // Fixed-length SVE version
>> > svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask); 
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 = 
>> > {vector-abi-variant="_ZGVsM2v_foo_04(vector_foo_04)"}
>> > ```
>> >
>> > ## Example 06
>> >
>> > This is an x86 example, equivalent to the one provided by Andrei 
>> > Elovikow in 
>> > http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. 
>> > Godbolt rendering with ICC at https://godbolt.org/z/Of1NxZ> > ``` 
>> > float MyAdd(float* a, int b)
> __attribute__(clang_declare_simd_variant(“MyAddVec", simdlen(8), 
> notinbranch, arch("core_2nd_gen_avx"))
>> > {
>> >    return *a + b;
>> > }
>> >
>> >
>> > __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2); ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 = {vector-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}
>> > ```
>> >
>> > ## Example showing interaction with `declare simd`
>> >
>> > ```
>> > #pragma omp declare simd linear(a) notinbranch float foo_06(float 
>> > *a, int x)
> __attribute__(clang_declare_simd_variant(“vector_foo_06", simdlen(4), 
> linear(a), notinbranch, arch("armv8.2-a+simd")) {
>> >      return *a + x;
>> > }
>> >
>> > // Advanced SIMD version
>> > float32x4_t vector_foo_06(float *a, int32x4_t vx) { // Custom 
>> > implementation.
>> > }
>> > ```
>> >
>> > The resulting IR attribute is made of three symbols:
>> >
>> > 1. `_ZGVnN2l4v_foo_06` and `_ZGVnN4l4v_foo_06`, which represent the
>> >     ones the compiler builds by auto-vectorizing `foo_06` according to
>> >     the rule defined in the Vector Function ABI specifications for
>> >     AArch64.
>> > 2. `_ZGVnN4l4v_foo_06(vector_foo_06)`, which represents the
>> >     user-defined redirection of the 4-lane version of `foo_06` to the
>> >     custom implementation provided by the user when targeting Advanced
>> >     SIMD for version 8.2 of the A64 instruction set.
>> >
>> > ```
>> > attribute #0 = 
>> > {vector-function-abi-variant="_ZGVnN2l4v_foo_06,_ZGVnN4l4v_foo_06,_
>> > ZGVnN4l4v_foo_06(vector_foo_06)"}
>> > ```
>> >
>> --
>> 
>> Simon Moll
>> Researcher / PhD Student
>> 
>> Compiler Design Lab (Prof. Hack)
>> Saarland University, Computer Science Building E1.3, Room 4.31
>> 
>> Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de Fax. +49 (0)681 
>> 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev