[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP

Tue Dec 11 19:47:10 PST 2018

Hi all, I have been asked to include the RFC in the email message.

Here it goes.

Kind regards,

Francesco

—————————————————————————————————————————————————————————

Introduction
============

This RFC proposes replacing the current `TargetLibraryInfo` (TLI) based
implementation of the command line option `-fveclib` with an OpenMP
based one.

With this change, `-fveclib` will maintain its current behavior in terms
of user experience, but the new implementation will additionally:

1.  Decouple the compiler front-end that knows about the availability
    of vectorized routines, from the back-end that knows how to make use
    of them.
2.  Enable support for a developer's own vector libraries without
    requiring changes to the compiler, via the new `-fveclib-include`
    command line option.
3.  Enable other frontends and languages to add scalar-to-vector
    function mappings as relevant for their own runtime libraries, etc.

The implementation of the proposal will consist of the following
components:

1.  [Changes in LLVM IR](#llvmIR) to provide information about the
    availability of vector math functions via metadata attached to an
    `llvm::CallInst`.
2.  [An infrastructure](#infrastructure) that can be queried to retrieve
    information about the available vector functions associated with a
    `llvm::CallInst`.
3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
    metadata.
4.  [Changes in clang](#mathdoth) to add the metadata in the IR via two
    mechanisms:

    1.  A custom `math.h` header file shipped with the compiler.
    2.  A user header file distributed with the library, to be used with
        the command line option `-fveclib-include`.

5.  [Changes in the clang driver](#driver) to translate `-fveclib` into a
    combination of flags that enable the generation of the
    library-specific flags needed to select the list of available vector
    functions specified in any of the header files.

Current status of `-fveclib`
============================

User interface
--------------

At the moment, a user can invoke `-fveclib` to generate vector calls
from two libraries, SVML and Accelerate, as follows:

    $> clang -fveclib=[SVML|Accelerate]

Interface with the loop vectorizer
----------------------------------

The TLI exposes an interface that enables querying the list of available
mappings by scalar name and number of lanes needed. The TLI interface is
currently used by the InnerLoopVectorizer to plant vector calls in
auto-vectorized loops.

Extending `-fveclib`
--------------------

Adding a new library requires listing its mappings in
`<llvm>/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang
front-end to handle the new value for the option - see for example the
two patches to add SLEEF (<http://sleef.org>) as a target library for
AArch64: <https://reviews.llvm.org/D53927> (LLVM code-base) and
<https://reviews.llvm.org/D53928> (clang code-base).

Limitations of the current implementation
-----------------------------------------

The mapping from the scalar version of a function to its vector version
is defined by the backend, specifically within the TLI. For this reason
the frontend's `-fveclib` option is tied to the backend's support for
the library, which is often language dependent. In particular, an IR
file generated with a version of clang that knows about the availability
of library `X` must be processed by a backend that also knows about the
availability of library `X`.

Proposed changes
================

We propose an implementation of `-fveclib` that makes use of a *veclib
specific* pragma, based on the OpenMP `declare simd` and
`declare variant` mechanisms, to inform the backend components about the
availability of vector versions of scalar functions found in IR. The
mechanism relies on storing such information in IR metadata, and
therefore makes the auto-vectorization of function calls a mid-end
(`opt`) process that is independent of the front-end that generated such
IR metadata.

Moreover, this implementation enhances the extensibility and portability
of `-fveclib` to other libraries and front-ends, and it provides a
generic mechanism that the users of the LLVM compiler will be able to
use for interfacing their own vector routines for generic code.

The proposed implementation can also be used to expose
vectorization-specific descriptors, such as the `linear` and `uniform`
clauses of the OpenMP `declare simd` directive, that could be used to
finely tune the automatic vectorization of some functions. Think, for
example, of the vectorization of
`void sincos(double, double *, double *)`, where `linear` can be used to
give extra information about the memory layout of the two pointer
parameters in the vector version.
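
As a rough illustration of the `linear` clause mentioned above, the
sketch below uses the standard `omp` spelling of the directive, since
the `veclib` spelling is only proposed here. The names `sincos_scalar`
and `sincos_array` are hypothetical. The clause `linear(s : 1)` states
that in consecutive lanes the pointer `s` advances by one element, so a
vector version could use contiguous stores instead of scatters. When
OpenMP is not enabled the pragma is ignored and the code stays scalar.

```cpp
#include <cmath>

// Hypothetical scalar routine with a sincos-like signature.
// linear(s : 1) and linear(c : 1) say that each output pointer moves
// forward by one element from one SIMD lane to the next.
#pragma omp declare simd notinbranch linear(s : 1) linear(c : 1)
void sincos_scalar(double x, double *s, double *c) {
  *s = std::sin(x);
  *c = std::cos(x);
}

// A loop whose calls match the access pattern promised by the clauses:
// iteration i writes to s + i and c + i.
void sincos_array(const double *x, double *s, double *c, int n) {
  for (int i = 0; i < n; ++i)
    sincos_scalar(x[i], &s[i], &c[i]);
}
```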

The newly proposed `#pragma` directives are:

1.  `#pragma veclib declare simd`.
2.  `#pragma veclib declare variant`.

Both directives follow the syntax of the `declare simd` and the
`declare variant` directives of OpenMP, with the exception that
`declare variant` is used only for the `simd` context.

We define a new `veclib`-only directive instead of using the `omp` ones
of OpenMP for the following reasons:

1.  Allow the compiler to perform auto-vectorization outside of an
    OpenMP SIMD context.
2.  Allow library vendors to provide a standard mechanism, based on
    OpenMP, to inform the compiler about the availability of vector
    functions that can be used for auto-vectorization.

A new compiler option, `-fparse-veclib`, is added to clang to enable
parsing of the `veclib` directive outside an OpenMP context.

OpenMP compatibility
--------------------

Note that the `veclib` pragma can be converted to the standard OpenMP
one by the following pre-processor test.

    #ifdef _OPENMP
    #define veclib omp
    #endif

Notice also that the `veclib declare simd` and `veclib declare variant`
directives can be parsed with the same infrastructure used for their
OpenMP counterparts.

In the rest of this RFC, we describe how the compiler behaves when
parsing a `veclib` pragma. The same behavior is obtained when parsing
the OpenMP based one when the compiler is invoked with the command line
options that enable OpenMP (`-fopenmp[-simd]`).

Changes in LLVM IR {#llvmIR}
------------------

The IR is enriched with metadata that details the availability of vector
versions of an associated scalar function. This metadata is attached to
the call site of the scalar function.

The metadata takes the form of an attribute containing a comma separated
list of vector function mappings. Each entry has a unique name that
follows the Vector Function ABI[^1] and a real name that is used when
generating calls to this vector function.

    vfunc_name1(real_name1), vfunc_name2(real_name2)

The Vector Function ABI name describes the signature of the vector
function so that properties like vectorisation factor can be queried
during compilation.

The real name is optional and assumed to match the vector function ABI
name when omitted.
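
The entry format described above can be sketched as a small parser. This
is only an illustration written against `std::string`: the names
`VariantEntry` and `parseVectorVariants` are hypothetical and are not
part of LLVM or of this proposal. It splits the attribute string on
commas and defaults the real name to the ABI name when the parenthesized
part is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one entry of the `vector-variant` attribute.
struct VariantEntry {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol to call; equals AbiName when omitted
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string &S) {
  const char *WS = " \t\r\n";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split "abi1(real1), abi2(real2), abi3" into entries.
std::vector<VariantEntry> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantEntry> Out;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Item = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Item.empty())
      continue; // tolerate empty items such as a trailing comma
    VariantEntry E;
    std::size_t Open = Item.find('(');
    std::size_t Close = Item.rfind(')');
    if (Open != std::string::npos && Close != std::string::npos &&
        Close > Open) {
      E.AbiName = trim(Item.substr(0, Open));
      E.RealName = trim(Item.substr(Open + 1, Close - Open - 1));
    } else {
      // No "(real_name)" part: the real name is the ABI name itself.
      E.AbiName = Item;
      E.RealName = Item;
    }
    Out.push_back(E);
  }
  return Out;
}
```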

For example, the availability of a 2-lane double precision `sin`
function via SVML when targeting AVX on x86 is provided by the following
IR.

    // ...
    ... = call double @sin(double) #0
    // ...

    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
                              _ZGVdN4v_sin(__svml_sin4),
                              ..."} }

The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
attribute provides information on the shape of the vector function via
the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
for Intel, and remaps the standard Vector Function ABI name to the
non-standard name `__svml_sin2`.
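
To make concrete how the shape can be read back from such a name, here
is a minimal decoder sketch. It assumes the layout used by the x86
Vector Function ABI: the prefix `_ZGV`, one ISA letter, `N` (unmasked)
or `M` (masked), the number of lanes, one token per parameter, an
underscore, and the scalar name. The type `VFShape` and the function
`decodeVFABIName` are hypothetical; the sketch handles fixed-length
vectors only and does not validate the ISA letter or the parameter
tokens.

```cpp
#include <cctype>
#include <string>

// Hypothetical decoded view of a Vector Function ABI mangled name.
struct VFShape {
  char ISA;               // ISA letter as found in the name, e.g. 'c'
  bool IsMasked;          // true for 'M', false for 'N'
  unsigned VF;            // number of lanes
  std::string Params;     // raw parameter tokens, e.g. "v" or "vl8l8"
  std::string ScalarName; // name of the scalar function, e.g. "sin"
};

// Returns true and fills Out when Name has the expected layout.
bool decodeVFABIName(const std::string &Name, VFShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t I = Prefix.size();
  if (Name.size() < I + 3) // need at least ISA, mask and one digit
    return false;
  Out.ISA = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  Out.IsMasked = (Mask == 'M');
  // Read the decimal lane count.
  std::size_t DigitsStart = I;
  Out.VF = 0;
  while (I < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[I]))) {
    Out.VF = Out.VF * 10 + static_cast<unsigned>(Name[I] - '0');
    ++I;
  }
  if (I == DigitsStart)
    return false;
  // Parameter tokens run up to the first underscore.
  std::size_t Under = Name.find('_', I);
  if (Under == std::string::npos || Under + 1 >= Name.size())
    return false;
  Out.Params = Name.substr(I, Under - I);
  Out.ScalarName = Name.substr(Under + 1);
  return true;
}
```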

This metadata is compatible with the proposal "Proposal for function
vectorization and loop vectorization with function calls",[^2] that uses
Vector Function ABI mangled names to inform the vectorizer about the
availability of vector functions. This RFC extends the original proposal
by allowing the explicit mapping of the Vector Function ABI mangled name
to a non-standard name, which allows the use of existing vector
libraries.

The `vector-variant` attribute needs to be attached on a per-call basis
to avoid conflicts when merging modules with different vector variants.

The query infrastructure: SVFS {#infrastructure}
------------------------------

The Search Vector Function System (SVFS) is constructed from an
`llvm::Module` instance so it can create function definitions. The SVFS
exposes an API with two methods.

### `SVFS::isFunctionVectorizable`

This method queries the availability of a vectorized version of a
function. The signature of the method is as follows.

    bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);

The method determines the availability of a vector version of the
function invoked by the `Call` parameter by looking at the
`vector-variant` metadata.

The `Params` argument is a set mapping the position of a parameter in
the `CallInst` to its `ParameterType` descriptor. The `ParameterType`
descriptor holds information about the shape of the corresponding
parameter in the signature of the vector function. This `ParameterType`
is used to query the SVFS about the availability of vector versions that
have `linear` or `uniform` parameters (in the sense of OpenMP 4.0 and
onwards).

The method we propose, when invoked with an empty `ParTypeSet`, is
equivalent to the `TargetLibraryInfo` method
`isFunctionVectorizable(StringRef Name)`.

### `SVFS::getVectorizedFunction`

This method returns the vector function declaration that corresponds to
the needs of the vectorization technique that is being run.

The signature of the function is as follows.

    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
      llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);

The `Call` parameter is the call instance that is being vectorized. The
`VF` parameter represents the vectorization factor (how many lanes). The
`IsMasked` parameter decides whether or not the signature of the vector
function is required to have a mask parameter. The `Params` parameter
describes the shape of the vector function, as in the
`isFunctionVectorizable` method.

The method uses the `vector-variant` metadata and returns the function
signature and the name of the function based on the input parameters.
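
The selection step can be pictured with the sketch below. It works on a
plain list of already-decoded variants rather than on a
`llvm::CallInst`, and it ignores `Params`; the names `Variant` and
`findVariant` are hypothetical and only stand in for whatever the SVFS
would do internally.

```cpp
#include <string>
#include <vector>

// Hypothetical, already-decoded view of one `vector-variant` entry.
struct Variant {
  unsigned VF;          // number of lanes
  bool IsMasked;        // whether the vector function takes a mask
  std::string RealName; // symbol to call
};

// Return the symbol of the first variant with the requested shape,
// or an empty string when no entry matches.
std::string findVariant(const std::vector<Variant> &Available, unsigned VF,
                        bool IsMasked) {
  for (const Variant &V : Available)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::string();
}
```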

The SVFS can add new function definitions, in the same module as the
`Call`, to provide vector functions that are not present within the
vector-variant metadata. For example, if a library provides a vector
version of a function with a vectorization factor of 2, but the
vectorizer is requesting a vectorization factor of 4, the SVFS is
allowed to create a definition that calls the 2-lane version (provided
by the library) twice. This capability applies similarly for providing
masked and unmasked versions when the request doesn't match what is
available in the library.
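
The 2-lane to 4-lane case can be written out in source form as follows.
This is only a sketch of the idea: the SVFS would emit an IR definition,
not C++, and `lib_sin2` is a hypothetical stand-in for a 2-lane library
routine, implemented here with `std::sin` so that the example runs.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane vector routine provided by a library.
Vec2 lib_sin2(const Vec2 &X) { return {std::sin(X[0]), std::sin(X[1])}; }

// 4-lane wrapper: split the input in halves, call the 2-lane routine
// on each half, and concatenate the two results.
Vec4 wrapper_sin4(const Vec4 &X) {
  Vec2 Lo = lib_sin2({X[0], X[1]});
  Vec2 Hi = lib_sin2({X[2], X[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}
```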

This method is equivalent to the TLI method
`StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.

Notice that to fully support OpenMP vectorization we need to think about
a fuzzy matching mechanism that is able to select a candidate in the
calling context. However, this is not needed for `-fveclib` because the
scalar-to-vector mappings of `-fveclib` are such that for every scalar
function there is only one possible vector function associated.
Therefore, extending this behavior to a generic one is an aspect of the
implementation that will be treated in a separate RFC about the
vectorization pass.

### Scalable vectorization

Both methods of the SVFS API will be extended with a boolean parameter
to specify whether scalable signatures are needed by the user of the
SVFS.

Changes in the LoopVectorizer {#LV}
-----------------------------

The LoopVectorizer and the related analysis passes will have to replace
the TLI version of `isFunctionVectorizable` and `getVectorizedFunction`
with the SVFS ones.

Changes in clang: shipping `math.h` with the compiler {#mathdoth}
-----------------------------------------------------

We use clang to generate the metadata described above. The functions
available in library `X` are listed in a custom `math.h` file that is
shipped with the compiler in `<clang>/lib/Headers/math.h`. The header
file is implemented by including "once" the system `math.h` file,
followed by `#ifdef` guarded re-declarations of the functions enriched
with `#pragma veclib declare simd` directives.

    #include_once <math.h>

    // ... cpp extern "C" guards omitted

    #ifdef _CLANG_USE_LIBRARY_X
    #pragma veclib declare simd simdlen(4) notinbranch
    extern double sin(double);
    #endif

This generates the Vector Function ABI mangled name to be used in the
`vector-variant` attribute, for example `_ZGVcN4v_sin`, when targeting
AVX code generation.

The part of the vector-variant attribute that redirects the call to an
SVML routine such as `__svml_sin4` is also added via the header file
`math.h`, by using the OpenMP 5.0 directive `declare variant`,[^3]
guarded by SVML specific preprocessor macros:

    #ifdef _CLANG_USE_SVML
    #pragma veclib declare simd simdlen(4) notinbranch
    extern double sin(double);

    #pragma veclib declare variant(double sin(double)) \
    match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
    __m256d __svml_sin4(__m256d x);
    #endif

Note that the list of `#ifdef`-guarded function declarations does not
need to live in the same `math.h` file, but can be included in `math.h`
from library-specific header files.

Changes in the clang driver {#driver}
---------------------------

To enable the information provided via `math.h`, the clang driver will
translate the `-fveclib=X` option into `-D_CLANG_USE_LIBRARY_X -lX` to
turn on the correct section of the header file and the flag for the
linker.

Note that the `veclib` directives are loaded even when *not* compiling
for an OpenMP target.

Extending auto-vectorization capabilities of LLVM
=================================================

When compared to the TLI-based auto-vectorization mechanism, the
OpenMP-based mechanism has the advantage of enabling users to provide
their own vector routines (not just the math ones) by adding
`veclib declare simd` and `veclib declare variant` definitions in their
source.

For this specific functionality, the following command line option is
added to clang:

    -fveclib-include=path/to/header/file.h

This option enables clang to recognize the `veclib declare simd` and
`veclib declare variant` directives listed in the header file of the
library.

Summary
=======

New `veclib` directives in clang
--------------------------------

1.  `#pragma veclib declare simd [clause, ]`, same as
    `#pragma omp declare simd` from OpenMP 4.0+.
2.  `#pragma veclib declare variant`, same as `#pragma omp declare variant`
    restricted to the `simd` context selector, from OpenMP 5.0+.

New `math.h` header file
------------------------

Shipped in `<clang>/lib/Headers/math.h`, contains all the declarations of
the functions available in the vector library `X`, `ifdef` guarded by
the macro `__CLANG_ENABLE_LIBRARY_X`.

Option behavior, and interaction with OpenMP
--------------------------------------------

The behavior described below makes sure that `-fveclib` function
vectorization and OpenMP function vectorization are orthogonal.

No options

:   No function vectorization via vector library, neither user-provided
    nor shipped via an internal `math.h`.

`-fveclib=X`

:   The driver transforms this into
    `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`. This is used only
    for users that want to vectorize `math.h` functions.

`-fveclib-include=path/to/user/provided/header/file.h`

:   The driver transforms this into
    `-fparse-veclib -include=path/to/user/provided/header/file.h`. The
    user has to provide the correct linker flag for both the scalar
    version and the vector version of whatever function they have
    defined in the header file. The header file must use the `veclib`
    directive to inform the compiler about the available vector
    functions.

`-fopenmp[-simd]`

:   No vectorization happens other than for those functions that are
    marked with OpenMP `declare simd`. The header `math.h` is loaded,
    but the `veclib` decorated declarations are invisible to the
    compiler instance because they are hidden behind the
    `__CLANG_ENABLE_LIBRARY_X` macros, which are not defined.

`-fopenmp[-simd] -fveclib=X` or
`-fopenmp[-simd] -fveclib-include=path/to/user/provided/header/file.h`

: Same behavior as without the `-fopenmp[-simd]` option.
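
The two driver translations listed above can be summarized with the
sketch below. It is not clang driver code: the function
`expandVeclibOption` is hypothetical, and the emitted strings are copied
from the descriptions in this section, with `X` substituted verbatim.

```cpp
#include <string>
#include <vector>

// True when S begins with Prefix.
static bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

// Expand one command line argument as described in this section.
// Arguments that are not veclib options are passed through unchanged.
std::vector<std::string> expandVeclibOption(const std::string &Arg) {
  const std::string Lib = "-fveclib=";
  const std::string Inc = "-fveclib-include=";
  if (startsWith(Arg, Lib)) {
    std::string X = Arg.substr(Lib.size());
    return {"-fparse-veclib", "-D__CLANG_ENABLE_LIBRARY_" + X + "=1",
            "-l" + X};
  }
  if (startsWith(Arg, Inc)) {
    // The user supplies the linker flags for this case.
    return {"-fparse-veclib", "-include=" + Arg.substr(Inc.size())};
  }
  return {Arg};
}
```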

[^1]: Vector Function ABI for x86:
    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
    Vector Function ABI for AArch64:
    <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>.

[^2]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>

[^3]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>

> On Nov 29, 2018, at 11:26 PM, Francesco Petrogalli <Francesco.Petrogalli at arm.com> wrote:
> 
> Hi all,
> 
> I am submitting the following RFC [1] to re-implement -fveclib via OpenMP constructs. The RFC was discussed during a round table at the last LLVM developer meeting, and presented during the BoF [2].
> 
> The proposal is published on Phabricator, for the purpose of keeping track of the comments, and it is now ready for a review from a wider audience after being polished by Hal Finkel and Hideki Saito (thank you!).
> 
> Kind regards,
> 
> Francesco
> 
> [1] https://reviews.llvm.org/D54412
> [2] https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7