<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<div>

<div dir="auto" style="direction:ltr; margin:0; padding:0; font-family:sans-serif; font-size:11pt; color:black">

I think that a standardized naming scheme is needed and that it solves the problem motivating the RFC without the need for attributes or metadata.<br>

<br>

</div>

<div dir="auto" style="direction:ltr; margin:0; padding:0; font-family:sans-serif; font-size:11pt; color:black">

If we want to use a vectorized version at a call site, we know what the symbol is supposed to look like and can check whether it's available.<br>

<br>

</div>
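<div dir="auto" style="direction:ltr; margin:0; padding:0; font-family:sans-serif; font-size:11pt; color:black">
As a rough illustration of that lookup, here is a minimal C sketch, not code from LLVM. It assumes the Vector Function ABI layout of `_ZGV`, ISA letter, mask letter (`N` or `M`), lane count, parameter tokens, `_`, and the scalar name. A plain list of names stands in for a real symbol-table query, and `vfabi_mangle`, `symbol_available` and `has_vector_variant` are hypothetical helpers.<br>
<pre>
```c
/* Append one character if it fits, always leaving room for the NUL. */
unsigned put_char(char *out, unsigned cap, unsigned n, char c)
{
    if (n + 1 < cap)
        out[n] = c;
    return n + 1;
}

unsigned put_str(char *out, unsigned cap, unsigned n, const char *s)
{
    while (*s != '\0')
        n = put_char(out, cap, n, *s++);
    return n;
}

/* Build "_ZGV" + isa + mask + vlen + params + "_" + scalar name.
   Returns 1 on success, 0 if the buffer is too small. */
int vfabi_mangle(char *out, unsigned cap, char isa, int masked,
                 unsigned vlen, const char *params, const char *scalar)
{
    char digits[12];
    unsigned nd = 0, n = 0;

    if (cap == 0)
        return 0;
    n = put_str(out, cap, n, "_ZGV");
    n = put_char(out, cap, n, isa);
    n = put_char(out, cap, n, masked ? 'M' : 'N');
    do {                                /* decimal digits, reversed */
        digits[nd++] = (char)('0' + vlen % 10);
        vlen /= 10;
    } while (vlen != 0);
    while (nd != 0)
        n = put_char(out, cap, n, digits[--nd]);
    n = put_str(out, cap, n, params);
    n = put_char(out, cap, n, '_');
    n = put_str(out, cap, n, scalar);
    if (n >= cap) {                     /* name was truncated */
        out[0] = '\0';
        return 0;
    }
    out[n] = '\0';
    return 1;
}

int str_equal(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Assumption: the available symbols are given as a 0-terminated list. */
int symbol_available(const char *const *symbols, const char *name)
{
    for (; *symbols != 0; symbols++)
        if (str_equal(*symbols, name))
            return 1;
    return 0;
}

/* Derive the expected name at the call site and check for it. */
int has_vector_variant(const char *const *symbols, char isa, int masked,
                       unsigned vlen, const char *params, const char *scalar)
{
    char name[128];

    return vfabi_mangle(name, sizeof name, isa, masked, vlen, params, scalar)
        && symbol_available(symbols, name);
}
```
</pre>
For instance, with `_ZGVcN2v_sin` in the list, `has_vector_variant(symbols, 'c', 0, 2, "v", "sin")` returns 1.<br>
<br>
</div>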

<div dir="auto" style="direction:ltr; margin:0; padding:0; font-family:sans-serif; font-size:11pt; color:black">

Maybe I misunderstood the problem people want to solve here, but the way I see it, the above is all we need.<br>

<br>

</div>

<div dir="auto" style="direction:ltr; margin:0; padding:0; font-family:sans-serif; font-size:11pt; color:black">


</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Philip Reames &lt;listmail@philipreames.com&gt;<br>

<b>Sent:</b> Thursday, May 30, 2019 12:53:02 PM<br>

<b>To:</b> Doerfert, Johannes; Finkel, Hal J.<br>

<b>Cc:</b> Francesco Petrogalli; LLVM Development List; nd; Hideki Saito; Clang Dev; scogland1@llnl.gov<br>

<b>Subject:</b> Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:11pt;">

<div class="PlainText"><br>

On 5/30/19 9:05 AM, Doerfert, Johannes wrote:<br>

> On 05/29, Finkel, Hal J. via cfe-dev wrote:<br>

>> On 5/29/19 1:52 PM, Philip Reames wrote:<br>

>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:<br>

>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:<br>

>>>>> I generally like the idea of having support in IR for vectorization of<br>

>>>>> custom functions.  I have several use cases which would benefit from this.<br>

>>>>><br>

>>>>> I'd suggest a couple of reframings to the IR representation though.<br>

>>>>><br>

>>>>> First, this should probably be specified as metadata/attribute on a<br>

>>>>> function declaration.  Allowing the callsite variant is fine, but it<br>

>>>>> should primarily be a property of the called function, not of the call<br>

>>>>> site.  Being able to specify it once per declaration is much cleaner.<br>

>>>> I agree. We should support this both on the function declaration and on<br>

>>>> the call sites.<br>

>>>><br>

>>>><br>

>>>>> Second, I really don't like the mangling use here.  We need a better way<br>

>>>>> to specify the properties of the function than its mangled name.  One<br>

>>>>> thought to explore is to directly use the Value of the function<br>

>>>>> declaration (since this is metadata and we can do that), and then tie<br>

>>>>> the properties to the function declaration in some way?  Sorry, I don't<br>

>>>>> really have a specific suggestion here.<br>

>>>> Is the problem the mangling or the fact that the mangling is<br>

>>>> ABI/target-specific? One option is to use LLVM's mangling scheme (the<br>

>>>> one we use for intrinsics) and then provide some backend infrastructure<br>

>>>> to translate later.<br>

>>> Well, both honestly.  But mangling with a non-target specific scheme is<br>

>>> a lot better, so I might be okay with that.   Good idea.<br>

>><br>

>> I liked your idea of directly encoding the signature in the metadata, <br>

>> but I think that we want to continue to use attributes, and not <br>

>> metadata, and the options for attributes seem more limited - unless we <br>

>> allow attributes to take metadata arguments - maybe that's an <br>

>> enhancement worth considering.<br>

> I recently talked to people in the OpenMP language committee meeting<br>

> about this and, thinking forward to the actual implementation/use of the<br>

> OpenMP 5.x declare variant feature, I'd say:<br>

><br>

>   - We will need a mangling scheme if we want to allow variants on<br>

>     declarations that are defined elsewhere.<br>

>   - We will need a (OpenMP) standardized mangling scheme if we want<br>

>     interoperability between compilers.<br>

><br>

> I assume we want both so I think we will need both.<br>

If I'm reading this correctly, this describes a need for the frontend to<br>

have a mangling scheme.  Nothing in here would seem to prevent the<br>

frontend from generating a declaration for a mangled external symbol and<br>

then referencing that declaration.  Am I missing something?<br>

><br>

> That said, I think this should allow us to avoid attributes/metadata<br>

> which seems to me like a good thing right now.<br>

><br>

> Cheers,<br>

>   Johannes<br>

><br>

><br>

>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:<br>

>>>>>> Dear all,<br>

>>>>>><br>

>>>>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.<br>

>>>>>><br>

>>>>>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see

<a href="http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html">http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html</a>). The previous RFC is to be considered abandoned.<br>

>>>>>><br>

>>>>>> The original RFC proposed re-implementing the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization. This narrower

 scope limits the impact of changes that are needed in both clang and LLVM.<br>

>>>>>><br>

>>>>>> Please let me know what you think.<br>

>>>>>><br>

>>>>>> Kind regards,<br>

>>>>>><br>

>>>>>> Francesco<br>

>>>>>><br>

>>>>>><br>

>>>>>> =================================================================================<br>

>>>>>><br>

>>>>>> Introduction<br>

>>>>>> ============<br>

>>>>>><br>

>>>>>> This RFC encompasses the proposal of informing the vectorizer about the<br>

>>>>>> availability of vector functions provided by the user. The mechanism is<br>

>>>>>> based on the use of the directive `declare variant` introduced in OpenMP<br>

>>>>>> 5.0 [^1].<br>

>>>>>><br>

>>>>>> The mechanism proposed has the following properties:<br>

>>>>>><br>

>>>>>> 1.  Decouples the compiler front-end that knows about the availability<br>

>>>>>>       of vectorized routines, from the back-end that knows how to make use<br>

>>>>>>       of them.<br>

>>>>>> 2.  Enables support for a developer's own vector libraries without<br>

>>>>>>       requiring changes to the compiler.<br>

>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function<br>

>>>>>>       mappings as relevant for their own runtime libraries, etc.<br>

>>>>>><br>

>>>>>> The implementation consists of two separate sets of changes.<br>

>>>>>><br>

>>>>>> The first set is a set of changes in `llvm`, and consists of:<br>

>>>>>><br>

>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the<br>

>>>>>>       availability of user-defined vector functions via metadata attached<br>

>>>>>>       to an `llvm::CallInst`.<br>

>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve<br>

>>>>>>       information about the available vector functions associated with a<br>

>>>>>>       `llvm::CallInst`.<br>

>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the<br>

>>>>>>       metadata.<br>

>>>>>><br>

>>>>>> The second set consists of the [changes in clang](#clang) that<br>

>>>>>> are needed to recognize the `#pragma clang declare variant`<br>

>>>>>> directive.<br>

>>>>>><br>

>>>>>> Proposed changes<br>

>>>>>> ================<br>

>>>>>><br>

>>>>>> We propose an implementation that uses `#pragma clang declare variant`<br>

>>>>>> to inform the backend components about the availability of vector<br>

>>>>>> versions of scalar functions found in IR. The mechanism relies on storing<br>

>>>>>> such information in IR metadata, and therefore makes the<br>

>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is<br>

>>>>>> independent of the front-end that generated such IR metadata.<br>

>>>>>><br>

>>>>>> This implementation provides a generic mechanism that the users of the<br>

>>>>>> LLVM compiler will be able to use for interfacing their own vector<br>

>>>>>> routines for generic code.<br>

>>>>>><br>

>>>>>> The implementation can also expose vectorization-specific descriptors --<br>

>>>>>> for example, like the `linear` and `uniform` clauses of the OpenMP<br>

>>>>>> `declare simd` directive -- that could be used to finely tune the<br>

>>>>>> automatic vectorization of some functions (think for example the<br>

>>>>>> vectorization of `double sincos(double , double *, double *)`, where<br>

>>>>>> `linear` can be used to give extra information about the memory layout<br>

>>>>>> of the two pointer parameters in the vector version).<br>

>>>>>><br>

>>>>>> The directive `#pragma clang declare variant` follows the syntax of the<br>

>>>>>> `#pragma omp declare variant` directive of OpenMP.<br>

>>>>>><br>

>>>>>> We define the new directive in the `clang` namespace instead of using<br>

>>>>>> the `omp` one of OpenMP to allow the compiler to perform<br>

>>>>>> auto-vectorization outside of an OpenMP SIMD context.<br>

>>>>>><br>

>>>>>> The mechanism is based on OpenMP to provide a uniform user experience<br>

>>>>>> across the two mechanisms, and to maximise the number of shared<br>

>>>>>> components of the infrastructure needed in the compiler frontend to<br>

>>>>>> enable the feature.<br>

>>>>>><br>

>>>>>> Changes in LLVM IR {#llvmIR}<br>

>>>>>> ------------------<br>

>>>>>><br>

>>>>>> The IR is enriched with metadata that details the availability of vector<br>

>>>>>> versions of an associated scalar function. This metadata is attached to<br>

>>>>>> the call site of the scalar function.<br>

>>>>>><br>

>>>>>> The metadata takes the form of an attribute containing a comma separated<br>

>>>>>> list of vector function mappings. Each entry has a unique name that<br>

>>>>>> follows the Vector Function ABI[^2] and a real name that is used when<br>

>>>>>> generating calls to this vector function.<br>

>>>>>><br>

>>>>>>       vfunc_name1(real_name1), vfunc_name2(real_name2)<br>

>>>>>><br>

>>>>>> The Vector Function ABI name describes the signature of the vector<br>

>>>>>> function so that properties like vectorisation factor can be queried<br>

>>>>>> during compilation.<br>

>>>>>><br>

>>>>>> The `(real name)` token is optional and assumed to match the Vector<br>

>>>>>> Function ABI name when omitted.<br>
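<br>
The entry format above can be illustrated with a minimal C sketch, which is not the RFC's implementation. The names `struct variant_entry`, `split_variant_entry` and `slice_equals` are hypothetical. The sketch splits a single `vfunc_name(real_name)` entry and falls back to the ABI name when the parenthesised token is omitted; splitting the comma separated list into entries is left out.<br>
<pre>
```c
/* One parsed entry; both names are slices into the input string. */
struct variant_entry {
    const char *abi_name;
    unsigned abi_len;
    const char *real_name;
    unsigned real_len;
};

/* Split "abi_name(real_name)" or plain "abi_name".
   Returns 1 on success, 0 if the entry is malformed. */
int split_variant_entry(const char *entry, struct variant_entry *out)
{
    unsigned i = 0, open;

    while (entry[i] != '\0' && entry[i] != '(')
        i++;
    if (i == 0)
        return 0;                       /* empty ABI name */
    out->abi_name = entry;
    out->abi_len = i;
    if (entry[i] == '\0') {             /* no "(real_name)" token */
        out->real_name = entry;
        out->real_len = i;
        return 1;
    }
    open = i + 1;
    i = open;
    while (entry[i] != '\0' && entry[i] != ')')
        i++;
    if (entry[i] != ')' || i == open || entry[i + 1] != '\0')
        return 0;                       /* unterminated, empty, or trailing text */
    out->real_name = entry + open;
    out->real_len = i - open;
    return 1;
}

/* Compare a slice of length len against a NUL-terminated literal. */
int slice_equals(const char *p, unsigned len, const char *lit)
{
    unsigned i;

    for (i = 0; i != len; i++)
        if (lit[i] == '\0' || p[i] != lit[i])
            return 0;
    return lit[len] == '\0';
}
```
</pre>
With the entry `_ZGVcN2v_sin(__svml_sin2)` this yields the ABI name `_ZGVcN2v_sin` and the real name `__svml_sin2`; with plain `_ZGVcN2v_sin` both names are `_ZGVcN2v_sin`.<br>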

>>>>>><br>

>>>>>> For example, the availability of a 2-lane double precision `sin`<br>

>>>>>> function via SVML when targeting AVX on x86 is provided by the following<br>

>>>>>> IR.<br>

>>>>>><br>

>>>>>>       // ...<br>

>>>>>>       ... = call double @sin(double) #0<br>

>>>>>>       // ...<br>

>>>>>><br>

>>>>>>       #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),<br>

>>>>>>                                 _ZGVdN4v_sin(__svml_sin4),<br>

>>>>>>                                 ..."} }<br>

>>>>>><br>

>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant<br>

>>>>>> attribute provides information on the shape of the vector function via<br>

>>>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI<br>

>>>>>> for Intel, and remaps the standard Vector Function ABI name to the<br>

>>>>>> non-standard name `__svml_sin2`.<br>
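<br>
To make the "shape" encoded in such a name concrete, here is a minimal C sketch of a decoder. It is an illustration under assumptions, not the parser proposed in the RFC: `struct vfabi_shape` and `vfabi_decode` are hypothetical, only numeric lane counts are handled, and the parameter tokens are assumed to contain no underscore, so they are returned as an uninterpreted slice.<br>
<pre>
```c
/* Shape information recovered from a Vector Function ABI name. */
struct vfabi_shape {
    char isa;                 /* ISA letter, e.g. 'c' in "_ZGVcN2v_sin" */
    int masked;               /* 1 for 'M', 0 for 'N' */
    unsigned vlen;            /* number of lanes */
    const char *params;       /* parameter tokens, e.g. "v" */
    unsigned params_len;
    const char *scalar_name;  /* NUL-terminated scalar function name */
};

/* Decode "_ZGV" isa mask vlen params "_" scalar_name.
   Returns 1 on success, 0 if the name does not have that layout. */
int vfabi_decode(const char *name, struct vfabi_shape *out)
{
    const char *prefix = "_ZGV";
    unsigned i = 0, start;

    for (; prefix[i] != '\0'; i++)
        if (name[i] != prefix[i])
            return 0;
    if (name[i] == '\0')
        return 0;
    out->isa = name[i++];
    if (name[i] != 'N' && name[i] != 'M')
        return 0;
    out->masked = (name[i++] == 'M');
    out->vlen = 0;
    start = i;
    while (name[i] >= '0' && name[i] <= '9')
        out->vlen = out->vlen * 10 + (unsigned)(name[i++] - '0');
    if (i == start)
        return 0;             /* no numeric lane count */
    start = i;
    while (name[i] != '\0' && name[i] != '_')
        i++;
    if (name[i] != '_' || i == start || name[i + 1] == '\0')
        return 0;             /* missing tokens, separator, or scalar name */
    out->params = name + start;
    out->params_len = i - start;
    out->scalar_name = name + i + 1;
    return 1;
}
```
</pre>
Decoding `_ZGVcN2v_sin` gives ISA letter `c`, unmasked, 2 lanes, one parameter token `v`, and the scalar name `sin`.<br>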

>>>>>><br>

>>>>>> This metadata is compatible with the proposal "Proposal for function<br>

>>>>>> vectorization and loop vectorization with function calls",[^3] that uses<br>

>>>>>> Vector Function ABI mangled names to inform the vectorizer about the<br>

>>>>>> availability of vector functions. The proposal extends the original by<br>

>>>>>> allowing the explicit mapping of the Vector Function ABI mangled name to<br>

>>>>>> a non-standard name, which allows the use of existing vector libraries.<br>

>>>>>><br>

>>>>>> The `vector-variant` attribute needs to be attached on a per-call basis<br>

>>>>>> to avoid conflicts when merging modules with different vector variants.<br>

>>>>>><br>

>>>>>> The query infrastructure: SVFS {#infrastructure}<br>

>>>>>> ------------------------------<br>

>>>>>><br>

>>>>>> The Search Vector Function System (SVFS) is constructed from an<br>

>>>>>> `llvm::Module` instance so it can create function definitions. The SVFS<br>

>>>>>> exposes an API with two methods.<br>

>>>>>><br>

>>>>>> ### `SVFS::isFunctionVectorizable`<br>

>>>>>><br>

>>>>>> This method queries the availability of a vectorized version of a<br>

>>>>>> function. The signature of the method is as follows.<br>

>>>>>><br>

>>>>>>       bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);<br>

>>>>>><br>

>>>>>> The method determines the availability of a vector version of the function<br>

>>>>>> invoked by the `Call` parameter by looking at the `vector-variant`<br>

>>>>>> metadata.<br>

>>>>>><br>

>>>>>> The `Params` argument is a map that associates the position of a<br>

>>>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The<br>

>>>>>> `ParameterType` descriptor holds information about the shape of the<br>

>>>>>> corresponding parameter in the signature of the vector function. This<br>

>>>>>> `ParameterType` is used to query the SVFS about the availability of<br>

>>>>>> vector versions that have `linear`, `uniform` or `align` parameters (in<br>

>>>>>> the sense of OpenMP 4.0 and onwards).<br>

>>>>>><br>

>>>>>> The method `isFunctionVectorizable`, when invoked with an empty<br>

>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method<br>

>>>>>> `isFunctionVectorizable(StringRef Name)`.<br>

>>>>>><br>

>>>>>> ### `SVFS::getVectorizedFunction`<br>

>>>>>><br>

>>>>>> This method returns the vector function declaration that corresponds to<br>

>>>>>> the needs of the vectorization technique that is being run.<br>

>>>>>><br>

>>>>>> The signature of the function is as follows.<br>

>>>>>><br>

>>>>>>       std::pair&lt;llvm::FunctionType *, std::string&gt; getVectorizedFunction(<br>

>>>>>>         llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);<br>

>>>>>><br>

>>>>>> The `Call` parameter is the call instance that is being vectorized, the<br>

>>>>>> `VF` parameter represents the vectorization factor (how many lanes), the<br>

>>>>>> `IsMasked` parameter decides whether or not the signature of the vector<br>

>>>>>> function is required to have a mask parameter, the `Params` parameter<br>

>>>>>> describes the shape of the vector function as in the<br>

>>>>>> `isFunctionVectorizable` method.<br>

>>>>>><br>

>>>>>> The method uses the `vector-variant` metadata and returns the function<br>

>>>>>> signature and the name of the function based on the input parameters.<br>

>>>>>><br>

>>>>>> The SVFS can add new function definitions, in the same module as the<br>

>>>>>> `Call`, to provide vector functions that are not present within the<br>

>>>>>> vector-variant metadata. For example, if a library provides a vector<br>

>>>>>> version of a function with a vectorization factor of 2, but the<br>

>>>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is<br>

>>>>>> allowed to create a definition that calls the 2-lane version twice. This<br>

>>>>>> capability applies similarly for providing masked and unmasked versions<br>

>>>>>> when the request does not match what is available in the library.<br>
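<br>
The widening case described above (a library offering 2 lanes, the vectorizer asking for 4) can be sketched in plain C. This is only an illustration of the idea: arrays stand in for vector registers, and `vf2_fn`, `twice_vf2` and `call_widened` are hypothetical names, not part of the proposed SVFS API.<br>
<pre>
```c
/* A 2-lane routine; plain arrays stand in for vector registers here. */
typedef void (*vf2_fn)(const double *in, double *out);

/* Hypothetical library function that is only available with VF = 2. */
void twice_vf2(const double *in, double *out)
{
    out[0] = 2.0 * in[0];
    out[1] = 2.0 * in[1];
}

/* Serve a wider request by calling the 2-lane routine once per pair of
   lanes. Returns 1 on success, 0 if requested_vf is not a non-zero
   multiple of 2. */
int call_widened(vf2_fn f, unsigned requested_vf,
                 const double *in, double *out)
{
    unsigned lane;

    if (requested_vf == 0 || requested_vf % 2 != 0)
        return 0;
    for (lane = 0; lane != requested_vf; lane += 2)
        f(in + lane, out + lane);
    return 1;
}
```
</pre>
A request for 4 lanes results in two calls to the 2-lane routine, one for lanes 0 and 1 and one for lanes 2 and 3.<br>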

>>>>>><br>

>>>>>> This method is equivalent to the TLI method<br>

>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.<br>

>>>>>><br>

>>>>>> Notice that to fully support OpenMP vectorization we need to think about<br>

>>>>>> a fuzzy matching mechanism that is able to select a candidate in the<br>

>>>>>> calling context. However, this proposal is intended for scalar-to-vector<br>

>>>>>> mappings of math-like functions that are most likely to associate a<br>

>>>>>> unique vector candidate in most contexts. Therefore, extending this<br>

>>>>>> behavior to a generic one is an aspect of the implementation that will<br>

>>>>>> be treated in a separate RFC about the vectorization pass.<br>

>>>>>><br>

>>>>>> ### Scalable vectorization<br>

>>>>>><br>

>>>>>> Both methods of the SVFS API will be extended with a boolean parameter<br>

>>>>>> to specify whether scalable signatures are needed by the user of the<br>

>>>>>> SVFS.<br>

>>>>>><br>

>>>>>> Changes in clang {#clang}<br>

>>>>>> ----------------<br>

>>>>>><br>

>>>>>> We use clang to generate the metadata described above.<br>

>>>>>><br>

>>>>>> In the compilation unit, the vector function definition or declaration<br>

>>>>>> must be visible and associated with the scalar version via the<br>

>>>>>> `#pragma clang declare variant` according to the rule defined by the<br>

>>>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in<br>

>>>>>> the following example.<br>

>>>>>><br>

>>>>>>       #pragma clang declare variant(vector_sinf) \<br>

>>>>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})<br>

>>>>>>       extern float sinf(float);<br>

>>>>>><br>

>>>>>>       float32x4_t vector_sinf(float32x4_t x);<br>

>>>>>><br>

>>>>>> The `construct` set in the directive, together with the `device` set, is<br>

>>>>>> used to generate the vector mangled name to be used in the<br>

>>>>>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting<br>

>>>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of<br>

>>>>>> the scalar function in the vector name are defined in the Vector<br>

>>>>>> Function ABI specification of the target.<br>

>>>>>><br>

>>>>>> The part of the vector-variant attribute that redirects the call to<br>

>>>>>> `vector_sinf` is derived from the `variant-id` specified in the<br>

>>>>>> `variant` clause.<br>

>>>>>><br>

>>>>>> Summary<br>

>>>>>> =======<br>

>>>>>><br>

>>>>>> New `clang` directive in clang<br>

>>>>>> ------------------------------<br>

>>>>>><br>

>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`<br>

>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.<br>

>>>>>><br>

>>>>>> Option behavior, and interaction with OpenMP<br>

>>>>>> --------------------------------------------<br>

>>>>>><br>

>>>>>> The behavior described below makes sure that<br>

>>>>>> `#pragma clang declare variant` function vectorization and OpenMP<br>

>>>>>> function vectorization are orthogonal.<br>

>>>>>><br>

>>>>>> `-fclang-declare-variant`<br>

>>>>>><br>

>>>>>> :   The `#pragma clang declare variant` directives are parsed and used<br>

>>>>>>       to populate the `vector-variant` attribute.<br>

>>>>>><br>

>>>>>> `-fopenmp[-simd]`<br>

>>>>>><br>

>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to<br>

>>>>>>       populate the `vector-variant` attribute.<br>

>>>>>><br>

>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`<br>

>>>>>><br>

>>>>>> :   The directive `#pragma omp declare variant` is used to populate the<br>

>>>>>>       `vector-variant` attribute in IR. The<br>

>>>>>>       `#pragma clang declare variant` directives are ignored.<br>

>>>>>><br>

>>>>>> [^1]: <<a href="https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf">https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf</a>><br>

>>>>>><br>

>>>>>> [^2]: Vector Function ABI for x86:<br>

>>>>>>       <<a href="https://software.intel.com/en-us/articles/vector-simd-function-abi">https://software.intel.com/en-us/articles/vector-simd-function-abi</a>>.<br>

>>>>>>       Vector Function ABI for AArch64:<br>

>>>>>>       <a href="https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi">

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi</a><br>

>>>>>><br>

>>>>>> [^3]: <<a href="http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html">http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html</a>><br>

>>>>>><br>

>>>>>> _______________________________________________<br>

>>>>>> LLVM Developers mailing list<br>

>>>>>> llvm-dev@lists.llvm.org<br>

>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

>>>>> _______________________________________________<br>

>>>>> cfe-dev mailing list<br>

>>>>> cfe-dev@lists.llvm.org<br>

>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

>> -- <br>

>> Hal Finkel<br>

>> Lead, Compiler Technology and Programming Languages<br>

>> Leadership Computing Facility<br>

>> Argonne National Laboratory<br>

>><br>


</div>

</span></font>

</body>

</html>