[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

Thu May 30 09:05:27 PDT 2019

On 05/29, Finkel, Hal J. via cfe-dev wrote:
> On 5/29/19 1:52 PM, Philip Reames wrote:
> > On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>> I generally like the idea of having support in IR for vectorization of
> >>> custom functions.  I have several use cases which would benefit from this.
> >>>
> >>> I'd suggest a couple of reframings to the IR representation though.
> >>>
> >>> First, this should probably be specified as metadata/attribute on a
> >>> function declaration.  Allowing the callsite variant is fine, but it
> >>> should primarily be a property of the called function, not of the call
> >>> site.  Being able to specify it once per declaration is much cleaner.
> >> I agree. We should support this both on the function declaration and on
> >> the call sites.
> >>
> >>
> >>> Second, I really don't like the mangling use here.  We need a better way
> >>> to specify the properties of the function than its mangled name.  One
> >>> thought to explore is to directly use the Value of the function
> >>> declaration (since this is metadata and we can do that), and then tie
> >>> the properties to the function declaration in some way?  Sorry, I don't
> >>> really have a specific suggestion here.
> >> Is the problem the mangling or the fact that the mangling is
> >> ABI/target-specific? One option is to use LLVM's mangling scheme (the
> >> one we use for intrinsics) and then provide some backend infrastructure
> >> to translate later.
> > Well, both honestly.  But mangling with a non-target specific scheme is
> > a lot better, so I might be okay with that.   Good idea.
> 
> 
> I liked your idea of directly encoding the signature in the metadata, 
> but I think that we want to continue to use attributes, and not 
> metadata, and the options for attributes seem more limited - unless we 
> allow attributes to take metadata arguments - maybe that's an 
> enhancement worth considering.

I recently talked to people in the OpenMP language committee meeting
about this and, thinking forward to the actual implementation/use of the
OpenMP 5.x declare variant feature, I'd say:

  - We will need a mangling scheme if we want to allow variants on
    declarations that are defined elsewhere.
  - We will need a (OpenMP) standardized mangling scheme if we want
    interoperability between compilers.

I assume we want both so I think we will need both.

That said, I think this should allow us to avoid attributes/metadata
which seems to me like a good thing right now.

Cheers,
  Johannes

> >>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> >>>> Dear all,
> >>>>
> >>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
> >>>>
> >>>> The proposal is a modification of an RFC that I sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
> >>>>
> >>>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of changes that are needed in both clang and LLVM.
> >>>>
> >>>> Please let me know what you think.
> >>>>
> >>>> Kind regards,
> >>>>
> >>>> Francesco
> >>>>
> >>>>
> >>>> =================================================================================
> >>>>
> >>>> Introduction
> >>>> ============
> >>>>
> >>>> This RFC encompasses the proposal of informing the vectorizer about the
> >>>> availability of vector functions provided by the user. The mechanism is
> >>>> based on the use of the directive `declare variant` introduced in OpenMP
> >>>> 5.0 [^1].
> >>>>
> >>>> The mechanism proposed has the following properties:
> >>>>
> >>>> 1.  Decouples the compiler front-end that knows about the availability
> >>>>       of vectorized routines, from the back-end that knows how to make use
> >>>>       of them.
> >>>> 2.  Enables support for a developer's own vector libraries without
> >>>>       requiring changes to the compiler.
> >>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
> >>>>       mappings as relevant for their own runtime libraries, etc.
> >>>>
> >>>> The implementation consists of two separate sets of changes.
> >>>>
> >>>> The first set is a set of changes in `llvm`, and consists of:
> >>>>
> >>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
> >>>>       availability of user-defined vector functions via metadata attached
> >>>>       to an `llvm::CallInst`.
> >>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
> >>>>       information about the available vector functions associated with a
> >>>>       `llvm::CallInst`.
> >>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
> >>>>       metadata.
> >>>>
> >>>> The second set consists of the [changes in clang](#clang) that are
> >>>> needed to recognize the `#pragma clang declare variant` directive.
> >>>>
> >>>> Proposed changes
> >>>> ================
> >>>>
> >>>> We propose an implementation that uses `#pragma clang declare variant`
> >>>> to inform the backend components about the availability of vector
> >>>> versions of scalar functions found in IR. The mechanism relies on storing
> >>>> such information in IR metadata, and therefore makes the
> >>>> auto-vectorization of function calls a mid-end (`opt`) process that is
> >>>> independent of the front-end that generated such IR metadata.
> >>>>
> >>>> This implementation provides a generic mechanism that the users of the
> >>>> LLVM compiler will be able to use for interfacing their own vector
> >>>> routines for generic code.
> >>>>
> >>>> The implementation can also expose vectorization-specific descriptors --
> >>>> for example, like the `linear` and `uniform` clauses of the OpenMP
> >>>> `declare simd` directive -- that could be used to finely tune the
> >>>> automatic vectorization of some functions (think for example of the
> >>>> vectorization of `void sincos(double, double *, double *)`, where
> >>>> `linear` can be used to give extra information about the memory layout
> >>>> of the two pointer parameters in the vector version).
> >>>>
> >>>> The directive `#pragma clang declare variant` follows the syntax of the
> >>>> `#pragma omp declare variant` directive of OpenMP.
> >>>>
> >>>> We define the new directive in the `clang` namespace instead of using
> >>>> the `omp` one of OpenMP to allow the compiler to perform
> >>>> auto-vectorization outside of an OpenMP SIMD context.
> >>>>
> >>>> The mechanism is based on OpenMP to provide a uniform user experience
> >>>> across the two mechanisms, and to maximise the number of shared
> >>>> components of the infrastructure needed in the compiler frontend to
> >>>> enable the feature.
> >>>>
> >>>> Changes in LLVM IR {#llvmIR}
> >>>> ------------------
> >>>>
> >>>> The IR is enriched with metadata that details the availability of vector
> >>>> versions of an associated scalar function. This metadata is attached to
> >>>> the call site of the scalar function.
> >>>>
> >>>> The metadata takes the form of an attribute containing a comma separated
> >>>> list of vector function mappings. Each entry has a unique name that
> >>>> follows the Vector Function ABI[^2] and a real name that is used when
> >>>> generating calls to this vector function.
> >>>>
> >>>>       vfunc_name1(real_name1), vfunc_name2(real_name2)
> >>>>
> >>>> The Vector Function ABI name describes the signature of the vector
> >>>> function so that properties like vectorisation factor can be queried
> >>>> during compilation.
> >>>>
> >>>> The `(real name)` token is optional and assumed to match the Vector
> >>>> Function ABI name when omitted.
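
A minimal sketch of how a consumer could split such an attribute string into its mappings, including the default for an omitted real name. The names `VariantMapping` and `parseVectorVariant` are hypothetical and not part of the proposal; the sketch uses plain C++ strings rather than LLVM types and does not diagnose malformed entries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry of the comma separated list (hypothetical type).
struct VariantMapping {
  std::string ABIName;   // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName;  // name used when emitting the call
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n\r";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split "vfunc1(real1), vfunc2" into mappings. An entry without a
// "(real_name)" part maps the ABI name to itself.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    VariantMapping M;
    std::size_t Open = Entry.find('(');
    if (Open != std::string::npos && Entry.back() == ')') {
      M.ABIName = trim(Entry.substr(0, Open));
      M.RealName = trim(Entry.substr(Open + 1, Entry.size() - Open - 2));
    } else {
      M.ABIName = Entry;  // no redirection given
    }
    if (M.RealName.empty())
      M.RealName = M.ABIName;
    Result.push_back(M);
  }
  return Result;
}
```

Calling `parseVectorVariant("_ZGVcN2v_sin(__svml_sin2), _ZGVbN2v_cos")` gives two mappings; the second one has `RealName` equal to its `ABIName`.
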
> >>>>
> >>>> For example, the availability of a 2-lane double precision `sin`
> >>>> function via SVML when targeting AVX on x86 is provided by the following
> >>>> IR.
> >>>>
> >>>>       // ...
> >>>>       ... = call double @sin(double) #0
> >>>>       // ...
> >>>>
> >>>>       #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> >>>>                                 _ZGVdN4v_sin(__svml_sin4),
> >>>>                                 ..."} }
> >>>>
> >>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
> >>>> attribute provides information on the shape of the vector function via
> >>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
> >>>> for Intel, and remaps the standard Vector Function ABI name to the
> >>>> non-standard name `__svml_sin2`.
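
To illustrate how the shape can be read back from the mangled name, here is a rough sketch that decodes the common `_ZGV<isa><mask><vlen><parameters>_<scalar name>` layout. The type `VFABIShape` and the function `decodeVFABIName` are hypothetical. The sketch keeps the parameter tokens as a raw string and does not handle scalable vector lengths (the `x` marker of the AArch64 ABI).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Decoded pieces of a Vector Function ABI name (simplified, hypothetical).
struct VFABIShape {
  char ISA;                // e.g. 'c' = AVX, 'd' = AVX2, 'n' = Advanced SIMD
  bool IsMasked;           // 'M' = masked, 'N' = unmasked
  unsigned VF;             // number of lanes
  std::string Params;      // raw parameter tokens, e.g. "v"
  std::string ScalarName;  // e.g. "sin"
};

// Returns true and fills Out when Name follows the simplified layout.
bool decodeVFABIName(const std::string &Name, VFABIShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Name.size() < Pos + 3)
    return false;
  Out.ISA = Name[Pos++];
  char Mask = Name[Pos++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  Out.IsMasked = (Mask == 'M');
  // Fixed-width vector length as a decimal number.
  std::size_t Start = Pos;
  unsigned VF = 0;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos]))) {
    VF = VF * 10 + static_cast<unsigned>(Name[Pos] - '0');
    ++Pos;
  }
  if (Pos == Start || VF == 0)
    return false;
  Out.VF = VF;
  // Parameter tokens run up to the first underscore; the rest is the
  // scalar name, which may itself contain underscores.
  std::size_t Under = Name.find('_', Pos);
  if (Under == std::string::npos || Under == Pos || Under + 1 >= Name.size())
    return false;
  Out.Params = Name.substr(Pos, Under - Pos);
  Out.ScalarName = Name.substr(Under + 1);
  return true;
}
```

For `_ZGVcN2v_sin` this yields ISA `c`, unmasked, two lanes, one vector parameter (`v`), and the scalar name `sin`.
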
> >>>>
> >>>> This metadata is compatible with the proposal "Proposal for function
> >>>> vectorization and loop vectorization with function calls",[^3] that uses
> >>>> Vector Function ABI mangled names to inform the vectorizer about the
> >>>> availability of vector functions. The proposal extends the original by
> >>>> allowing the explicit mapping of the Vector Function ABI mangled name to
> >>>> a non-standard name, which allows the use of existing vector libraries.
> >>>>
> >>>> The `vector-variant` attribute needs to be attached on a per-call basis
> >>>> to avoid conflicts when merging modules with different vector variants.
> >>>>
> >>>> The query infrastructure: SVFS {#infrastructure}
> >>>> ------------------------------
> >>>>
> >>>> The Search Vector Function System (SVFS) is constructed from an
> >>>> `llvm::Module` instance so it can create function definitions. The SVFS
> >>>> exposes an API with two methods.
> >>>>
> >>>> ### `SVFS::isFunctionVectorizable`
> >>>>
> >>>> This method queries the availability of a vectorized version of a
> >>>> function. The signature of the method is as follows.
> >>>>
> >>>>       bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
> >>>>
> >>>> The method determines the availability of a vector version of the function
> >>>> invoked by the `Call` parameter by looking at the `vector-variant`
> >>>> metadata.
> >>>>
> >>>> The `Params` argument is a map that associates the position of a
> >>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
> >>>> `ParameterType` descriptor holds information about the shape of the
> >>>> corresponding parameter in the signature of the vector function. This
> >>>> `ParameterType` is used to query the SVFS about the availability of
> >>>> vector versions that have `linear`, `uniform` or `align` parameters (in
> >>>> the sense of OpenMP 4.0 and onwards).
> >>>>
> >>>> The method `isFunctionVectorizable`, when invoked with an empty
> >>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
> >>>> `isFunctionVectorizable(StringRef Name)`.
> >>>>
> >>>> ### `SVFS::getVectorizedFunction`
> >>>>
> >>>> This method returns the vector function declaration that corresponds to
> >>>> the needs of the vectorization technique that is being run.
> >>>>
> >>>> The signature of the function is as follows.
> >>>>
> >>>>       std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> >>>>         llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
> >>>>
> >>>> The `Call` parameter is the call instance that is being vectorized, the
> >>>> `VF` parameter represents the vectorization factor (how many lanes), the
> >>>> `IsMasked` parameter decides whether or not the signature of the vector
> >>>> function is required to have a mask parameter, the `Params` parameter
> >>>> describes the shape of the vector function as in the
> >>>> `isFunctionVectorizable` method.
> >>>>
> >>>> The method uses the `vector-variant` metadata and returns the function
> >>>> signature and the name of the function based on the input parameters.
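
The lookup described above can be pictured with the following sketch, which only models the `VF` and `IsMasked` part of the query. `VariantInfo` and `selectVariant` are hypothetical names; the parameter-shape matching (`Params`) and the construction of the function type are left out. Returning an empty string for a miss is an assumption borrowed from the way the TLI query reports that no vector function exists.

```cpp
#include <string>
#include <vector>

// Already-decoded entry of the vector-variant list (hypothetical type).
struct VariantInfo {
  unsigned VF;           // number of lanes
  bool IsMasked;         // whether the variant takes a mask
  std::string RealName;  // name to call
};

// Return the real name of the first variant with the requested
// vectorization factor and masking, or an empty string if none matches.
std::string selectVariant(const std::vector<VariantInfo> &Variants,
                          unsigned VF, bool IsMasked) {
  for (const VariantInfo &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::string();
}
```
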
> >>>>
> >>>> The SVFS can add new function definitions, in the same module as the
> >>>> `Call`, to provide vector functions that are not present within the
> >>>> vector-variant metadata. For example, if a library provides a vector
> >>>> version of a function with a vectorization factor of 2, but the
> >>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
> >>>> allowed to create a definition that calls the 2-lane version twice. This
> >>>> capability applies similarly for providing masked and unmasked versions
> >>>> when the request does not match what is available in the library.
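
As a rough picture of the 2-lane to 4-lane case, the sketch below builds a 4-lane wrapper out of two calls to a 2-lane function. It uses `std::array` in place of real vector types, and `lib_sin2` is a hypothetical stand-in for a library routine; in the proposal the SVFS would emit the equivalent definition as IR.

```cpp
#include <array>
#include <cmath>

using Vec2d = std::array<double, 2>;
using Vec4d = std::array<double, 4>;

// Stand-in for a library-provided 2-lane vector function (hypothetical).
Vec2d lib_sin2(Vec2d X) {
  return {std::sin(X[0]), std::sin(X[1])};
}

// 4-lane wrapper: split the input into two halves, call the 2-lane
// version on each half, and concatenate the results.
Vec4d wrapper_sin4(Vec4d X) {
  Vec2d Lo = lib_sin2({X[0], X[1]});
  Vec2d Hi = lib_sin2({X[2], X[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}
```

Lane order is preserved, so lane `i` of the result is the scalar function applied to lane `i` of the input.
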
> >>>>
> >>>> This method is equivalent to the TLI method
> >>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
> >>>>
> >>>> Notice that to fully support OpenMP vectorization we need to think about
> >>>> a fuzzy matching mechanism that is able to select a candidate in the
> >>>> calling context. However, this proposal is intended for scalar-to-vector
> >>>> mappings of math-like functions that are most likely to associate a
> >>>> unique vector candidate in most contexts. Therefore, extending this
> >>>> behavior to a generic one is an aspect of the implementation that will
> >>>> be treated in a separate RFC about the vectorization pass.
> >>>>
> >>>> ### Scalable vectorization
> >>>>
> >>>> Both methods of the SVFS API will be extended with a boolean parameter
> >>>> to specify whether scalable signatures are needed by the user of the
> >>>> SVFS.
> >>>>
> >>>> Changes in clang {#clang}
> >>>> ----------------
> >>>>
> >>>> We use clang to generate the metadata described above.
> >>>>
> >>>> In the compilation unit, the vector function definition or declaration
> >>>> must be visible and associated with the scalar version via the
> >>>> `#pragma clang declare variant` according to the rules defined by the
> >>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
> >>>> the following example.
> >>>>
> >>>>       #pragma clang declare variant(vector_sinf) \
> >>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> >>>>       extern float sinf(float);
> >>>>
> >>>>       float32x4_t vector_sinf(float32x4_t x);
> >>>>
> >>>> The `construct` set in the directive, together with the `device` set, is
> >>>> used to generate the vector mangled name to be used in the
> >>>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
> >>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
> >>>> the scalar function in the vector name are defined in the Vector
> >>>> Function ABI specification of the target.
> >>>>
> >>>> The part of the vector-variant attribute that redirects the call to
> >>>> `vector_sinf` is derived from the `variant-id` specified in the
> >>>> `variant` clause.
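
A sketch of how the front-end could assemble one `vector-variant` entry from the pieces of the directive. The struct `SimdVariant` and the function `buildVectorVariantEntry` are hypothetical, and the sketch makes two simplifying assumptions: every parameter is a plain vector parameter (`v`), and the ISA letter has already been chosen from the target's Vector Function ABI (`n` for AArch64 Advanced SIMD).

```cpp
#include <string>

// Simplified view of one `declare variant` directive (hypothetical type).
struct SimdVariant {
  char ISA;                // ISA letter of the target ABI, e.g. 'n'
  bool NotInBranch;        // notinbranch -> unmasked 'N', else masked 'M'
  unsigned SimdLen;        // value of simdlen(...)
  unsigned NumParams;      // all assumed to be vector parameters ('v')
  std::string ScalarName;  // e.g. "sinf"
  std::string VariantId;   // e.g. "vector_sinf"
};

// Build "<Vector Function ABI name>(<variant-id>)".
std::string buildVectorVariantEntry(const SimdVariant &V) {
  std::string Name = "_ZGV";
  Name += V.ISA;
  Name += V.NotInBranch ? 'N' : 'M';
  Name += std::to_string(V.SimdLen);
  Name.append(V.NumParams, 'v');
  Name += '_';
  Name += V.ScalarName;
  return Name + "(" + V.VariantId + ")";
}
```

For the directive shown above (`simdlen(4)`, `notinbranch`, scalar function `sinf`, variant `vector_sinf`) and Advanced SIMD, this produces `_ZGVnN4v_sinf(vector_sinf)`.
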
> >>>>
> >>>> Summary
> >>>> =======
> >>>>
> >>>> New `clang` directive in clang
> >>>> ------------------------------
> >>>>
> >>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
> >>>> restricted to the `simd` context selector, from OpenMP 5.0+.
> >>>>
> >>>> Option behavior, and interaction with OpenMP
> >>>> --------------------------------------------
> >>>>
> >>>> The behavior described below makes sure that
> >>>> `#pragma clang declare variant` function vectorization and OpenMP
> >>>> function vectorization are orthogonal.
> >>>>
> >>>> `-fclang-declare-variant`
> >>>>
> >>>> :   The `#pragma clang declare variant` directives are parsed and used
> >>>>       to populate the `vector-variant` attribute.
> >>>>
> >>>> `-fopenmp[-simd]`
> >>>>
> >>>> :   The `#pragma omp declare variant` directives are parsed and used to
> >>>>       populate the `vector-variant` attribute.
> >>>>
> >>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> >>>>
> >>>> :   The directive `#pragma omp declare variant` is used to populate the
> >>>>       `vector-variant` attribute in IR. The directive
> >>>>       `#pragma clang declare variant` is ignored.
> >>>>
> >>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> >>>>
> >>>> [^2]: Vector Function ABI for x86:
> >>>>       <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> >>>>       Vector Function ABI for AArch64:
> >>>>       <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> >>>>
> >>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> >>>>
> >>>> _______________________________________________
> >>>> LLVM Developers mailing list
> >>>> llvm-dev at lists.llvm.org
> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>> _______________________________________________
> >>> cfe-dev mailing list
> >>> cfe-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov