[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
Francesco Petrogalli via llvm-dev
llvm-dev at lists.llvm.org
Wed Nov 6 14:22:46 PST 2019
Dear all,
I have a WIP patch that require your attention, as I am in the process of replacing the TLI (TargetLibraryInfo) mappings with the IR attribute that we agreed to introduce here.
I’d appreciate if you could look at the patch at https://reviews.llvm.org/D67572
Thank you,
Francesco
> On Jun 28, 2019, at 3:16 PM, Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Dear all,
>
> I have updated the proposal with the changes that are required to be
> able to generate the vector function signature in the front-end instead
> of the back-end.
>
> I have updated the example, showcasing the use of the
> `llvm.compiler.used` intrinsics.
>
> I have also mentioned that the `SVFS` should be wrapped in an analysis
> pass. I haven't proposed a brand new pass because I suspect that there
> is already one that could handle the information of the SVFS. Please
> point me at such pass if it exists.
>
> I have also CCed Sumedh, who is working on the implementation of the
> SVFS described here.
>
> Kind regards,
>
> Francesco
>
> *** DRAFT OF THE PROPOSAL ***
>
> SCOPE OF THE RFC : Interface user provided vector functions with the vectorizer.
> ================================================================================
>
> Because the users care about portability (across compilers, libraries
> and systems), I believe we have to base sour solution on a standard that
> describes the mapping from the scalar function to the vector function.
>
> Because OpenMP is standard and widely used, we should base our solution
> on the mechanisms that the standard provides, via the directives
> `declare simd` and `declare variant`, the latter when used in with the
> `simd` trait in the `construct` set.
>
> Please notice that:
>
> 1. The scope of the proposal is not implementing full support for
> `pragma omp declare variant`.
> 2. The scope of the proposal is not enabling the vectorizer to do new
> kind of vectorizations (e.g. RV-like vectorization described by
> Simon).
> 3. The proposal aims to be extendible wrt 1. and 2.
> 4. The IR attribute introduced in this proposal is equivalent to the
> one needed for the VecClone pass under development in
> https://reviews.llvm.org/D22792
>
> CLANG COMPONENTS
> ================
>
> A C function attribute, `clang_declare_simd_variant`, to attach to the
> scalar version. The attribute provides enough information to the
> compiler about the vector shape of the user defined function. The vector
> shapes handled by the attribute are those handled by the OpenMP standard
> via `declare simd` (and no more than that).
>
> 1. The function attribute handling in clang is crafted with the
> requirement that it will be possible to re-use the same components
> for the info generated by `declare variant` when used with a `simd`
> traits in the `construct` set.
> 2. The attribute allows orthogonality with the vectorization that is
> done via OpenMP: the user vector function is still exposed for
> vectorization when not using `-fopenmp-[simd]` once the
> `declare simd` and `declare variant` directive of OpenMP will be
> available in the front-end.
>
> C function attribute: `clang_declare_simd_variant`
> --------------------------------------------------
>
> The definition of this attribute has been crafted to match the semantics
> of `declare variant` for a `simd` construct described in OpenMP 5.0. I
> have added only the traits of the `device` set, `isa` and `arch`, which
> I believe are enough to cover for the use case of this proposal. If that
> is not the case, please provide an example, extending the attribute will
> be easy even once the current one is implemented.
>
> clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, <context selector clauses>})
>
> <variant-func-id>:= The name of a function variant that is a base language identifier, or,
> for C++, a template-id.
>
> <simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}
>
> <simdlen> := simdlen(<positive number>) | simdlen("scalable")
>
> <mask> := inbranch | notinbranch
>
> <optional simd clauses> := <linear clause>
> | <uniform clause>
> | <align clause> | {,<optional simd clauses>}
>
> <linear clause> := linear_ref(<var>,<step>)
> | linear_var(<var>, <step>)
> | linear_uval(<var>, <step>)
> | linear(<var>, <step>)
>
> <step> := <var> | <non zero number>
>
> <uniform clause> := uniform(<var>)
>
> <align clause> := align(<var>, <positive number>)
>
> <var> := Name of a parameter in the scalar function declaration/definition
>
> <non zero number> := ... | -2 | -1 | 1 | 2 | ...
>
> <positive number> := 1 | 2 | 3 | ...
>
> <context selector clauses> := {<isa>}{,} {<arch>}
>
> <isa> := isa(target-specific-value)
>
> <arch> := arch(target-specific-value)
>
> LLVM COMPONENTS:
> ================
>
> VectorFunctionShape class
> -------------------------
>
> The object `VectorFunctionShape` contains the information about the kind
> of vectorization available for an `llvm::CallInst`.
>
> The object `VectorFunctionShape` must contain the following information:
>
> 1. Vectorization Factor (or number or concurrent lanes executed by the
> SIMD version of the function). Encoded by unsigned integer.
> 2. Whether the vector function is requested for scalable vectorization,
> encoded by a boolean.
> 3. Information about masking / no masking, encoded by a boolean.
> 4. Information about the parameters, encoded in a container that
> carries objects of type `ParamaterType`, to describe features like
> `linear` and `uniform`. This parameter type can be extended to
> represent concepts that are not handled by OpenMP.
> 5. Vector ISA, used in the implementation of the vector function.
>
> The `VectorFunctionShape` class can be extended in the future to include
> new vectorization kinds (for example the RV-like vectorization of the
> Region Vectorizer), or to add more context information that might come
> from other uses of OpenMP `declare variant`, or to add new Vector
> Function ABIs not based on OpenMP. Such information can be retrieved by
> attributes that will be added to describe the `llvm::CallInst` instance.
>
> IR Attribute
> ------------
>
> We define a `vector-function-abi-variant` attribute that lists the
> mangled names produced via the mangling function of the Vector Function
> ABI rules.
>
> vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection),..."
>
> 1. Because we use only OpenMP `declare simd` vectorization, and because
> we require a vector Function ABI, we make this explicit in the name
> of the attribute.
> 2. Because the Vector Function ABIs encode all the information needed
> to know the vectorization shape of the vector function in the
> mangled names, we provide the mangled name via the attribute.
> 3. Function names redirection is specified by enclosing the name of the
> redirection in parenthesis, as in
> `abi_mangled_name_02(user_redirection)`.
>
> The IR attribute is used in conjunction with the vector function
> declarations or definitions that are available in the module. Each
> mangled name in the `vector-function-abi-attribute` is be associated to
> a correspondent declaration/definition in the module. Such definition is
> provided by the front-end. The vector function declaration or definition
> is passed as an argument to the `llvm.compiler.used` intrinsic to
> prevent the compiler from removing it from the module (for example when
> the OpenMP mapping mechanism is used via C header file).
>
> We decided to make the vector function signature explicit in IR by
> creating it with the front-end, because we have found some cases for
> which it is impossible to use the backend to reconstruct the vector
> function signature out of the Vector Function ABI mangled name and the
> signature of the scalar function. This is due to the fact that the
> layout of some C types is lost in the C-to-IR process.
>
> As an example, the following three types can not be distinguished at IR
> level, because all cases are mapped to `i64` in the signature of the
> function `foo`. In fact, according to the rules of the Vector Function
> ABI for AArch64, the three types, for a 2-lane vectorization factor,
> will map respectively to `<4 x int>`, `<2 x (pointer_to_the_struct)>`,
> and `<2 x i64>`.
>
> // Type 1
> typedef _Complex int S;
>
> // Type 2
> typedef struct x{
> int a;
> int b;
> } S;
>
> // Type 3
> typedef uint64_t S;
>
> S foo(S a, S b) {
> return ...;
> }
>
> On of the problems that was raised during the discussion around these
> three types was how we could make sure that the vectorizer is able to
> determine how to map the values used in the scalar functions invocation
> to the values that can be used in the vecgtor signature.
>
> I was thiking to store the information needed for this in the parameter
> attributes of the function, but I realised that just the size of the
> scalar parameter might be enough, therefore I don't think we need to add
> new attributes to handle this.
>
> I illustrate my reasoning with an example, in which we want to vectorize
> the "flattened" signature `i64 foo(i64, i64)` to a 2-lane vector
> function. All exmaples are done for Advanced SIMD, with no mask
> parameter.
>
> I won't discuss `Type 3` becasue the mapping from scalar parameters from
> vector parameters is trivial.
>
> In case of `Type 1`, we will see a 2-lane vector function associated to
> `foo` with signature `<4 x i32>(<4 x i32>, <4 x i32>)` (the knowledge of
> being a 2-lane vector function comes from the `<vlen>` token in the
> magled name, which is always present even in case of a used defined
> custom name).
>
> The size of the scalar parameter is 8, the size of the vector parameter
> is 16, therefore the fact that we are doing a 2-lane vectorization is
> enough to tell the vectorizer that the two instances of `i64` values
> needs to be mapped to the high and low half of the `<4 x i32>` type.
>
> In case of `Type 2`, what we have is a situation in which two objects of
> type `i64` (the scalar values) need to be mapped to two pointers, which
> are pointing to instances of the same size of the scalar size. This is
> enough information for the vectorizer to be able to generate the code
> that can do this properly. The case of `Type 2` is distinguishable from
> `Type 3` because of the use of pointers, and it is distinguishable from
> the `linear` case that use references (in which vectors of pointers are
> needed) because the token in the mangled name is different (`v` is used
> for vector parameters that the vectorizer must to pass by value, while
> the linear references use different tokens for vector parameters.).
>
> Query interface: Search Vector Function System (SVFS)
> -----------------------------------------------------
>
> An interface that can be queried by the LLVM components to understand
> whether or not a scalar function can be vectorized, and that retrieves
> the vector function to be used if such vector shape is available.
>
> 1. This component is going to be unrelated to OpenMP.
> 2. This component will use internally the IR attribute defined in the
> previous section, but it will not expose any aspect of the Vector
> Function ABI via its interface.
>
> The interface provides two methods.
>
> std::vector<VectorFunctionShape> SVFS::isFunctionVectorizable(llvm::CallInst * Call);
>
> llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call, VectorFunctionShape Info);
>
> The first method is used to list all the vector shapes that available
> and attached to a scalar function. An empty results means that no vector
> versions are available.
>
> The second method retrieves the information needed to build a call to a
> vector function with a specific `VectorFunctionShape` info.
>
> The SVFS is wrapped in an analysis pass that can be retrieved in other
> passes.
>
> (SELF) ASSESSMENT ON EXTENDIBILITY
> ==================================
>
> 1. Extending the C function attribute `clang_declare_simd_variant` to
> new Vector Function ABIs that use OpenMP will be straightforward
> because the attribute is tight to such ABIs and OpenMP.
> 2. The C attribute `clang_declare_simd_variant` and the
> `declare variant` directive used for the `simd` trait will be
> sharing the internals in clang, so adding the OpenMP functionality
> for `simd` traits will be mostly handling the directive in the
> OpenMP parser. How this should be done is described in
> https://clang.llvm.org/docs/InternalsManual.html\#how-to-add-an-attribute
> 3. The IR attribute `vector-function-abi-variant` is not to be extended
> to represent other kind of vectorization other than those handled by
> `declare simd` and that are handled with a Vector Function ABI.
> 4. The IR attribute `vector-function-abi-variant` is not defined to be
> extended to represent the information of `declare variant` in its
> totality.
> 5. The IR attribute will not need to change when we will introduce non
> vector function ABI vectorization (RV-like, reductions...) or when
> we will decide to fully support `declare variant`. The information
> it carries will not need to be invalidated, but just extended with
> new attributes that will need to be handled by the
> `VectorFunctionShape` class, in a similar way the
> `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
> operates on individual attributes to describe an overall
> functionality.
> 6. The IR attribute is to be used also to provide vector function
> information via the `declare simd` directive of OpenMP (see Example
> 7 below).
>
> Examples
> ========
>
> Example 1
> ---------
>
> Exposing an Advanced SIMD vector function when targeting Advanced SIMD
> in AArch64.
>
> double foo_01(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_01", simdlen(2), notinbranch, isa("simd"));
>
> // Advanced SIMD version provided by the user via an external module
> float64x2_t vector_foo_01(float64x2_t VectorInput);
>
> // ... loop ...
> x[i] = foo_01(y[i])
>
> The resulting IR is:
>
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x double> (<2 x double>)* @_ZGVnN2v_foo_01 to i8*)], section "llvm.metadata"
>
> declare double @foo_01(double %in) #0
>
> declare <2 x double> @_ZGVnN2v_foo_01(<2 x double>)
>
> // ... loop ...
> %xi = call double @foo_01(double %yi) #0
>
> attribute #0 = {vector-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}
>
> Example 2
> ---------
>
> Exposing an Advanced SIMD vector function when targeting Advanced SIMD
> in AArch64, but with the wrong signature. The user specifies a masked
> version of the function in the clauses of the attribute, the compiler
> throws an error suggesting the signature expected for `vector_foo_02.`
>
> double foo_02(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_02", simdlen(2), inbranch, isa("simd"));
>
> // Advanced SIMD version
> float64x2_t vector_foo_02(float64x2_t VectorInput);
> // (suggested) compiler error -> ^ Missing mask parameter of type `uint64x2_t`.
>
> Example 3
> ---------
>
> Targeting `sincos`-like signatures.
>
> void foo_03(double Input, double * Output) __attribute__(clang_declare_simd_variant(“vector_foo_03", simdlen(2), notinbranch, linear(Output, 1), isa("simd"));
>
> // Advanced SIMD version
> void vector_foo_03(float64x2_t VectorInput, double * Output);
>
> // ... loop ...
> foo_03(x[i], y + i)
>
> The resulting IR is:
>
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (void (<2 x double>, double *)* @_ZGVnN2vl8_foo_03 to i8*)], section "llvm.metadata"
>
> declare void @foo_03(double, double *) #0
>
> declare void @_ZGVnN2vl8_foo_03(<2 x double>, double *)
>
> ;; ... loop ...
> call void @foo_03(double %xi, double * %yiptr) #0
>
> attribute #0 = {vector-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}
>
> Example 4
> ---------
>
> Scalable vectorization targeting SVE
>
> double foo_04(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_04", simdlen("scalable"), notinbranch, isa("sve"));
>
> // SVE version
> svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);
>
> // ... loop ...
> x[i] = foo_04(y[i])
>
> The IR generated is:
>
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<vscale 2 x double> (<vscale 2 x double>)* @_ZGVsMxv_foo_04 to i8*)], section "llvm.metadata"
>
> declare double @foo_04(double %in) #0
>
> declare <vscale 2 x double> @_ZGVnNxv_foo_04(<vscale 2 x double>)
>
> // ... loop ...
> %xi = call double @foo_04(double %yi) #0
>
> attribute #0 = {vector-abi-variant="_ZGVsMxv_foo_04(vector_foo_04)"}
>
> Example 5
> ---------
>
> Fixed length vectorization targeting SVE
>
> double foo_05(double Input) __attribute__(clang_declare_simd_variant(“vector_foo_05", simdlen(4), inbranch, isa("sve"));
>
> // Fixed-length SVE version
> svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);
>
> The resulting IR is:
>
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<4 x double> (<4 x double>)* @_ZGVsM4v_foo_05 to i8*)], section "llvm.metadata"
>
> declare double @foo_05(double %in) #0
>
> declare <4 x double> @_ZGVnNxv_foo_05(<4 x double>)
>
> ;; ... loop ...
> %xi = call double @foo_05(double %yi) #0
>
> attribute #0 = {vector-abi-variant="_ZGVsM4v_foo_04(vector_foo_04)"}
>
> Example 6
> ---------
>
> This is an x86 example, equivalent to the one provided by Andrei
> Elovikow in
> http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt
> rendering with ICC at https://godbolt.org/z/Of1NxZ
>
> float MyAdd(float* a, int b) __attribute__(clang_declare_simd_variant(“MyAddVec", simdlen(8), notinbranch, linear(a), arch("core_2nd_gen_avx"))
> {
> return *a + b;
> }
>
>
> __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);
>
> // ... loop ...
>
> x[i] = MyAdd(a+i, b[i]);
>
> The resulting IR is:
>
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<8 x float> (float *, <2 x i64>, <2 x i64>)* @_ZGVbN8l4v_MyAdd to i8*)], section "llvm.metadata"
>
> define float @MyAdd(float %a, i32 %b) {
> ;; return *a + b :)
> }
>
> define <8 x float> @_ZGVbN8l4v_MyAdd(float *, <2 x i64>, <2 x i64>)
>
> ;; ... loop ...
> %xi = call float @MyAdd(float * %aiptr, i32 ) #0
>
> attribute #0 = {vector-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}
>
> Note: the signature of `MyAddVec` uses `<2 x i64>` instead of
> `<4 x i32>`, as shown in https://godbolt.org/z/T4T8s3 (line 11). If we
> would have asked the back end to generate the signature of `MyAddVec` by
> looking at the signature of the scalar function and the `<vlen>=8` token
> in the mangled name in the attribute, we would have end up using
> `<8 x i32>` instead of two instanced of `<2 x i64>`, which would have
> been wrong.
>
> This is another example that demonstrate that we need to generate the
> vector function signatures in the front-end and not in the backend.
>
> Example 7: showing interaction with `declare simd`
> --------------------------------------------------
>
> #pragma omp declare simd linear(a) notinbranch
> float foo_07(float *a, int x) __attribute__(clang_declare_simd_variant(“vector_foo_07", simdlen(4), linear(a), notinbranch, arch("armv8.2-a+simd")) {
> return *a + x;
> }
>
> // Advanced SIMD version
> float32x4_t vector_foo_07(float *a, int32x4_t vx) {
> // Custom implementation.
> }
>
> // ... loop ...
>
> x[i] = foo_07(a+i, b[i]);
>
> The resulting IR attribute is made of three symbols:
>
> 1. `_ZGVnN2l4v_foo_07` and `_ZGVnN4l4v_foo_07`, which represent the
> ones the compiler builds by auto-vectorizing `foo_07` according to
> the rule defined in the Vector Function ABI specifications for
> AArch64.
> 2. `_ZGVnN4l4v_foo_07(vector_foo_07)`, which represents the
> user-defined redirection of the 4-lane version of `foo_07` to the
> custom implementation provided by the user when targeting Advanced
> SIMD for version 8.2 of the A64 instruction set.
>
> <!-- -->
>
> @llvm.compiler.used = appending global [2 x i8*] [i8* bitcast (<4 x float> (float *, <4 x i32>)* @_ZGVnN4l4v_foo_07 to i8*), i8* bitcast (<2 x float> (float *, <2 x i32>)* @_ZGVnN2l4v_foo_07 to i8*) ], section "llvm.metadata"
>
> define <4 x float> @_ZGVnN4l4v_foo_07(float *, <4 x i32>) {
> ;; Compiler auto-vectorized version (via the VecClone pass)
> }
>
> define <2 x float> @_ZGVnN2l4v_foo_07(float *, <2 x i32>) {
> ;; Compiler auto-vectorized version (via the VecClone pass)
> }
>
> define <4 x float> @vector_foo_07(float *, <4 x i32>) {
> ;; user provided vector version
> }
>
> define float @foo_07(float %a, i32 %b) {
> ;; return *a + b :)
> }
>
> // ... loop ...
> %xi = call float @foo_07(float * %aiptr, i32 %bi) #0
>
> attribute #0 = {vector-function-abi-variant="_ZGVnN2l4v_foo_07,_ZGVnN4l4v_foo_07,_ZGVnN4l4v_foo_07(vector_foo_07)"}
>
> In this case, the body of the functions `_ZGVnN4l4v_foo_07` and
> `_ZGVnN2l4v_foo_07` is auto-generated by the compiler, therefore we
> might as well avoid adding them to the `@llvm.compiler.used` intrinsics.
> I have left it there for consistency, let me know if you think that
> there is no real reasons for requiring it, I will remove it.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
More information about the llvm-dev
mailing list