[cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.

Fri May 31 16:43:37 PDT 2019

>Is this also the case if the user did require lock-step semantic for the code to be correct?

Certainly not, but that part is actually beyond the OpenMP specification.  I suggest looking up the description of ICC's "#pragma simd assert" to see whether the assert feature is something you would like to see as an extension in LLVM's implementation of OpenMP (declare) simd. Otherwise, the vectorization report will tell you whether it was vectorized or not.

>How does OpenCL/SYCL play in this now?

Not right now, while we are working to get the OpenMP stuff going. That said, I don't think we will need to change the design (e.g., the function attribute, the VecClone direction, etc.) in the future for those or similar languages.

-----Original Message-----
From: Doerfert, Johannes [mailto:jdoerfert at anl.gov]
Sent: Friday, May 31, 2019 4:16 PM
To: Saito, Hideki <hideki.saito at intel.com>
Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>; Philip Reames <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>; LLVM Development List <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Clang Dev <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 05/31, Saito, Hideki wrote:
> 
> >VectorClone does more than just mapping a scalar version to a vector one. It also builds the vector version definition by auto-vectorizing the body of the scalar function.
> [...]
> The code is still fully functional w/o LoopVectorize vectorizing that loop.

Is this also the case if the user did require lock-step semantic for the code to be correct?

> >I don’t know if the patches related to VecClone are also intended to use the `vector-variant` attribute for function declarations with a #pragma omp declare simd.
> 
> VecClone predated #pragma omp declare variant, so those patches don’t 
> know about declare variant. VecClone was written for handling #pragma omp declare simd, as described above. An OpenCL/SYCL kernel is similar enough to OpenMP declare simd that most of the code can be reused.

How does OpenCL/SYCL play in this now?

> -----Original Message-----
> From: Francesco Petrogalli [mailto:Francesco.Petrogalli at arm.com]
> Sent: Friday, May 31, 2019 3:06 PM
> To: Doerfert, Johannes <jdoerfert at anl.gov>
> Cc: Philip Reames <listmail at philipreames.com>; Finkel, Hal J. 
> <hfinkel at anl.gov>; LLVM Development List <llvm-dev at lists.llvm.org>; nd 
> <nd at arm.com>; Saito, Hideki <hideki.saito at intel.com>; Clang Dev 
> <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.
> 
> 
> 
> > On May 31, 2019, at 2:56 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > 
> > I think I did misunderstand what you want to do with attributes. 
> > This is my bad. Let me try to explain:
> > 
> > It seems you want the "vector-variants" attributes (which I could 
> > not find with this name in trunk, correct?) to "remember" what 
> > vector versions can be created (wrt. validity), assuming a 
> > definition is available? Correct?
> 
> Yes.
> 
> > What I was concerned with is the example I sketched somewhere below 
> > which motivates the need for a generalized/standardized name 
> > mangling for OpenMP. I thought you wanted to avoid that somehow but 
> > if you don't I misunderstood you. I basically removed the part where 
> > the vector versions have to be created first but I assumed them to 
> > be existent (in the module or somewhere else). That is, I assumed a 
> > call to foo and various symbols available that are specializations 
> > of foo. When we then vectorize foo (or otherwise specialize at some 
> > point in the future), you would scan the module and pick the best 
> > match based on the context of the call.
> > 
> 
> Yes, although the syntax you use below is wrong. Declare variant is attached to the scalar definition, and points to a vector definition (the variant) that is declared/defined in the same compilation unit where the scalar version is visible.
> 
> 
> > Now I don't know if I understood your proposal by now but let me ask 
> > a question anyway:
> > 
> > VecClone.cpp:276-278 mentions that the vectorizer is supposed to 
> > look at the vector-variants functions. This works for variants that 
> > are created from definitions in the module but what about #omp 
> > declare simd declarations?
> > 
> 
> VectorClone does more than just mapping a scalar version to a vector one. It also builds the vector version definition by auto-vectorizing the body of the scalar function.
> 
> I don’t know if the patches related to VecClone are also intended to use the `vector-variant` attribute for function declarations with a #pragma omp declare simd. On aarch64, in the Arm compiler for HPC, we do that to support vector math libraries. It works in principle, but `declare variant` allows more context selection (and custom names instead of vector ABI names, which are easier for users).
> 
> 
> > 
> > On 05/31, Francesco Petrogalli wrote:
> >>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> >>> 
> >>> I think we should split this discussion:
> >>> TOPIC 1 & 2 & 4: How do we implement all use cases and OpenMP 5.X
> >>>                  features, including compatibility with other
> >>>                  compilers and cross module support.
> >> 
> >> Yes, and we have to carefully make this as standard and compatible as possible.
> > 
> > Agreed.
> > 
> > 
> >>> TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> >>>                declares)
> >> 
> >> 
> >> I think that Alexey's explanation of how the directives are handled 
> >> internally in the frontend makes us lean towards the attribute.
> > 
> > How things are handled right now, especially given that declare 
> > variant is not handled at all, should not limit our design space. If 
> > the argument is that we cannot reasonably implement a solution, that 
> > is a different story.
> > 
> > 
> >>> TOPIC 3a & 3c: floating point issues?
> >>> 
> >> 
> >> I believe there is no issue there. I have quoted the OpenMP standard in my reply to Renato:
> >> 
> >> See https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 118, lines 23-24:
> >> 
> >> "The execution of the function or subroutine cannot have any side 
> >> effects that would alter its execution for concurrent iterations of 
> >> a SIMD chunk."
> > 
> > Great.
> > 
> > 
> >>> I inlined comments for Topic 1 below.
> >>> 
> >>> I hope that we do not have to discuss topic 2 if we agree neither 
> >>> attributes nor metadata is necessary, or better, will solve the 
> >>> actual problem at hand. I don't have a strong feeling on topic 4 but 
> >>> I have the feeling this will become less problematic once we figure out topic 1.
> >>> 
> >>> Thanks,
> >>> Johannes
> >>> 
> >>> 
> >>> On 05/31, Francesco Petrogalli wrote:
> >>>> # TOPIC 1: concerns about name mangling
> >>>> 
> >>>> I understand that there are concerns in using the mangling scheme 
> >>>> I proposed, and that it would be preferred to have a mangling 
> >>>> scheme that is based on (and standardized by) OpenMP.
> >>> 
> >>> I still think it will be required to have a standardized one, not 
> >>> only preferred.
> >>> 
> >>> 
> >> 
> >> I am all with you in standardizing. x86 and aarch64 have their own 
> >> vector function ABIs, which, although “private”, are to be 
> >> considered standard. Open-source and commercial compilers are using 
> >> them, therefore we have to deal with this mangling scheme, whether 
> >> or not OpenMP comes up with a standard mangling scheme.
> > 
> > I don't get the point you are trying to make here. What do you mean 
> > by "we have to deal with"? (I do not suggest to get rid of them.)
> > 
> 
> That we cannot ignore the fact that the name scheme is already standardized by the vendors, so let’s first deal with what we have, and think about the OpenMP mangling scheme only once there is one available.
> 
> > 
> >>>> I hear the argument on having some common ground here. In fact, 
> >>>> there is already common ground between the x86 and aarch64 
> >>>> backends, which have based their respective Vector Function ABI specifications on OpenMP.
> >>>> 
> >>>> In fact, the mangled name grammar can be summarized as follows:
> >>>> 
> >>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> >>>> 
> >>>> Across vector extensions the only <token> that will differ is the 
> >>>> <isa> token.
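
A minimal sketch of how a name following the grammar above could be split into its tokens. It assumes a short decimal <VLEN> (no scalable vectors) and that the first `_` after the parameter tokens separates them from the scalar name. `VectorName` and `parseVectorName` are hypothetical names used only for illustration; they are not part of clang or LLVM.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical record (not an LLVM type) holding the tokens of
// _ZGV<isa><masking><VLEN><parameter type>_<scalar name>.
struct VectorName {
  char isa = 0;            // e.g. 'b', 'c', 'd', 'e' on x86; 'n' on AArch64
  bool masked = false;     // 'M' = masked, 'N' = unmasked
  unsigned vlen = 0;       // number of lanes
  std::string parameters;  // parameter tokens, e.g. "v" or "vv"
  std::string scalarName;  // name of the scalar function
};

// Fills `out` and returns true when `mangled` follows the grammar above.
// On failure `out` may be partially written and should be ignored.
bool parseVectorName(const std::string &mangled, VectorName &out) {
  const std::string prefix = "_ZGV";
  if (mangled.compare(0, prefix.size(), prefix) != 0)
    return false;
  std::size_t pos = prefix.size();
  if (mangled.size() < pos + 2)
    return false;

  out.isa = mangled[pos++];
  const char mask = mangled[pos++];
  if (mask != 'M' && mask != 'N')
    return false;
  out.masked = (mask == 'M');

  // <VLEN>: assumed to be between one and four decimal digits.
  const std::size_t digitsBegin = pos;
  while (pos < mangled.size() &&
         std::isdigit(static_cast<unsigned char>(mangled[pos])))
    ++pos;
  const std::size_t digits = pos - digitsBegin;
  if (digits == 0 || digits > 4)
    return false;
  out.vlen =
      static_cast<unsigned>(std::stoul(mangled.substr(digitsBegin, digits)));

  // <parameter type>: everything up to the first '_' separator.
  const std::size_t sep = mangled.find('_', pos);
  if (sep == std::string::npos || sep == pos || sep + 1 == mangled.size())
    return false;
  out.parameters = mangled.substr(pos, sep - pos);
  out.scalarName = mangled.substr(sep + 1);
  assert(!out.scalarName.empty());
  return true;
}
```

With this sketch, `_ZGVcN2v_sin` decodes to isa `c`, unmasked, 2 lanes, one `v` parameter and scalar name `sin`; only the isa token would change for another vector extension.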
> >>>> 
> >>>> This might lead people to think that we could drop the _ZGV<isa> 
> >>>> prefix and consider the <masking><VLEN><parameter type>_<scalar name>
> >>>> part as a sort of unofficial OpenMP mangling scheme: in fact,
> >>>> the signature of an “unmasked 2-lane vector version of `sin`” will 
> >>>> always be `<2 x double>(<2 x double>)`.
> >>>> 
> >>>> The problem with this choice is that the number of vector versions 
> >>>> available for a target is not unique.
> >>> 
> >>> For me, this simply means this mangling scheme is not sufficient.
> >>> 
> >> 
> >> Can you explain more why you think the mangling scheme is not 
> >> sufficient? The mangling scheme is shaped to provide all the 
> >> information that the OpenMP directive describes.
> > 
> > I don't know if it is insufficient but I thought you hinted towards that.
> 
> I didn’t mean that; the tokens in the vector function ABI mangling schemes are sufficient.
> 
> > If we can handle/decode everything we need for declare variants then 
> > I do not object at all. If not, we require a respective extension such 
> > that we can. The result should be a superset of the current SIMD 
> > encoding and compatible with the current one.
> > 
> > 
> 
> We can handle/decode everything for a SIMD context. :)
> 
> 
> > 
> >> The fact that x86 and aarch64 realize such information in different 
> >> ways (multiple signatures/vector extensions) is something that cannot 
> >> be avoided, because it is related to architectural aspects that are 
> >> specific to the vector extension and transparent to the OpenMP 
> >> standard.
> > 
> > I don't think that is a problem (that's why I "failed to see the 
> > problem" in the comment below). I look at it this way: If #declare 
> > simd, or similar, results in N variants, it should at the end of the 
> > day not be different from declaring these N variants explicitly with 
> > the respective declare variant match clause.
> > 
> 
> That’s not the case. #declare simd should create all the versions that are optimal for the target. We carefully thought about that when writing the vector function ABI. Most of the constraints derive from the fact that each target has a specific register size. 
> 
> Example:
> 
> #pragma omp declare simd
> float foo(float);
> 
> x86: -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see https://godbolt.org/z/m1BUVt
> Arm NEON: -> 4 versions {2, 4 lanes} x {masking, no masking}
> Arm SVE: -> 1 version
> 
> Therefore, the outcome of declare simd is not target independent. Your expectations are met only within one target.
> 
> 
> > 
> >>>> In particular, the following declaration generates multiple 
> >>>> vector versions, depending on the target:
> >>>> 
> >>>> #pragma omp declare simd simdlen(2) notinbranch
> >>>> double foo(double) {…};
> >>>> 
> >>>> On x86, this generates at least 4 symbols (one for SSE, one for 
> >>>> AVX, one for AVX2, and one for AVX512:
> >>>> https://godbolt.org/z/TLYXPi)
> >>>> 
> >>>> On aarch64, the same declaration generates a unique symbol, as 
> >>>> specified in the Vector Function ABI.
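
A rough sketch of why the same declaration yields a different number of symbols per target: the tokens after the isa are fixed by the directive, and the target's ABI decides which isa tokens to emit. The helper names are hypothetical, and the isa letters (`b`, `c`, `d`, `e` for SSE, AVX, AVX2, AVX512 on x86; `n` for Advanced SIMD on AArch64) are taken from the respective Vector Function ABIs as I understand them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one name following
// _ZGV<isa><masking><VLEN><parameter type>_<scalar name>.
std::string mangleVectorName(char isa, bool masked, unsigned vlen,
                             const std::string &params,
                             const std::string &scalarName) {
  assert(vlen > 0 && "a vector version needs at least one lane");
  return "_ZGV" + std::string(1, isa) + (masked ? "M" : "N") +
         std::to_string(vlen) + params + "_" + scalarName;
}

// Emits one name per <isa> token requested for the target.
// `isaTokens` is a hypothetical stand-in for what the target's ABI dictates.
std::vector<std::string> namesForIsas(const std::string &isaTokens,
                                      bool masked, unsigned vlen,
                                      const std::string &params,
                                      const std::string &scalarName) {
  std::vector<std::string> names;
  for (char isa : isaTokens)
    names.push_back(mangleVectorName(isa, masked, vlen, params, scalarName));
  return names;
}
```

For the declaration above, `namesForIsas("bcde", false, 2, "v", "foo")` gives the four x86 names `_ZGVbN2v_foo`, `_ZGVcN2v_foo`, `_ZGVdN2v_foo` and `_ZGVeN2v_foo`, while `namesForIsas("n", false, 2, "v", "foo")` gives the single AArch64 name `_ZGVnN2v_foo`.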
> >>> 
> >>> I fail to see the problem. We generate X symbols for X different 
> >>> contexts. Once we get to the point where we vectorize, we 
> >>> determine which context fits best and choose the corresponding symbol version.
> >>> 
> >> 
> >> Yes, this is exactly what we need to do, under the constraint that 
> >> the rules for generating "X symbols for X different contexts" are 
> >> decided by the Vector Function ABI of the target.
> > 
> > Sounds good. The vector ABI is used to determine what contexts 
> > exist and what symbols should be created. I would assume the 
> > encoding should be the same as if we specified the versions
> > (/contexts) ourselves via #declare variant.
> > 
> 
> Oh yes, vector functions listed in a declare variant should obey the vector function ABI rules (other than the function name).
> 
> > 
> >>> Maybe my view is too naive here, please feel free to correct me.
> >>> 
> >>> 
> >>>> This means that the attribute (or metadata) that carries the 
> >>>> information on the available vector version needs to deal also 
> >>>> with things that are not usually visible at IR level, but that 
> >>>> might still need to be provided to be able to decide which 
> >>>> particular instruction set/ vector extension needs to be targeted.
> >>> 
> >>> The symbol names should carry all the information we need. If they 
> >>> do not, we need to improve the mangling scheme such that they do.
> >>> There is no attributes/metadata we could use at library boundaries.
> >>> 
> >> Hum, I am not sure what you mean by "There is no 
> >> attributes/metadata we could use at library boundaries."
> > 
> > (This seems to be part of the misunderstanding, I leave my comment 
> > here anyway:)
> > 
> > The simd-related stuff works because it is a uniform mangling scheme 
> > used by all compilers. Take the situation below in which I think we 
> > want to call foo_CTX in the library. If so, we need a name for it.
> > 
> 
> In the situation below, the mangled name is going to be the same for both compilers, as long as they adhere to the vector function ABI.
> 
> > 
> > a.c:  // Compiled by gcc into a library
> > #omp declare variant (foo) match(CTX)
> > void foo_CTX(...) {...}
> > 
> > b.c:  // Compiled by clang linked against the library above.
> > #omp declare variant (foo) match(CTX)
> > void foo_CTX(...);
> > 
> > void bar(...) {
> >  #pragma omp CTX
> >  foo();   // <- What function (symbol) do we call if a.c was compiled
> >           //    by gcc and b.c with clang?
> > }
> > 
> 
> Please notice that `declare variant` needs to be attached to the scalar function, not the vector one.
> 
> ```
> #pragma omp declare variant(foo_CTX) match (context=simd…
> double foo(double) {…}
> 
> vector_double_ty foo_CTX(vector_double_ty) {…}
> ```
> 
> In vectorizing foo in bar, the compiler will not care where foo_CTX would come from (of course, as long as the scalar+declare variant declarations are visible).
> 
> >> In our downstream compiler (Arm compiler for HPC, based on LLVM), 
> >> we use `declare simd` to provide vector math functions via a custom 
> >> header file. It works brilliantly, if not for specific aspects that 
> >> would be perfectly covered by `declare variant`, which might be 
> >> one of the reasons why the OpenMP committee decided to introduce 
> >> `declare variant`.
> > 
> > But you (assume that you) control the mangling scheme across the 
> > entire infrastructure. Given that the simd mangling is de-facto 
> > standardized, that works.
> > 
> > Side note:
> > Declare variant, as of 5.0, is not flexible enough for a sensible 
> > inclusion of target specific headers. That will change in 5.1.
> > 
> 
> Could you point me at the discussion in 5.1 on this specific aspect?
> 
> 
> > 
> >> If your concern is that adding an attribute that somehow 
> >> represents something available in an external library is not 
> >> enough to guarantee that the symbol is available in the library… 
> >> not even C code can guarantee that. If the linker is not pointing 
> >> to the right library, there is nothing that can prevent it from 
> >> failing if the symbol is not present.
> > 
> > I don't follow the example you describe. I don't want to change 
> > anything in how symbols are looked up or what happens if they are missing.
> > 
> > 
> 
> I don’t want to change that either :). I think we are misunderstanding each other here...
> 
> >>>> I used an example based on `declare simd` instead of `declare 
> >>>> variant` because the attribute/metadata needed for `declare 
> >>>> variant` is a modification of the one needed for `declare simd`, 
> >>>> which has already been agreed in a previous RFC proposed by Intel 
> >>>> [1], and for which Intel has already provided an implementation 
> >>>> [2]. The changes proposed in this RFC are fully compatible with 
> >>>> the work that is being done for the VecClone pass in [2].
> >>>> 
> >>>> [1]
> >>>> http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> >>>> [2] VecClone pass: https://reviews.llvm.org/D22792
> >>> 
> >>> Having an agreed upon mangling for the older feature is not 
> >>> necessarily important here. We will need more functionality for 
> >>> variants and keeping the old scheme around with some metadata is 
> >>> not an extensible long-term solution. So, I would not try to fit 
> >>> variants into the existing simd-scheme but instead do it the other 
> >>> way around. We define what we need for variants and implement simd in that scheme.
> >>> 
> >> 
> >> I kinda think that having agreed on something is important. It 
> >> allows us to build other things on top of what has been agreed 
> >> without breaking compatibility.
> >> 
> >> Specifically, which new functionalities needed for the 
> >> variants would make the current metadata (attributes) for 
> >> declare simd non-extensible?
> > 
> > See first comment.
> > 
> >>>> The good news is that as far as AArch64 and x86 are concerned, the only thing that will differ in the mangled name is the “<isa>” token. As far as I can tell, the mangling scheme of the rest of the vector name is the same, therefore a lot of infrastructure in terms of mangling and demangling can be reused. In fact, the `mangleVectorParameters` function in https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could already be shared among x86 and aarch64.
> >>>> 
> >>>> TOPIC 2: metadata vs attribute
> >>>> 
> >>>> From a functionality point of view, I don’t care whether we use metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the following:
> >>>> 
> >>>> attributes #0 = { nounwind uwtable 
> >>>> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
> >>>> 
> >>>> This is an attribute (I thought it was metadata?), I am happy to reword the RFC using the right terminology (sorry for messing this up).
> >>>> 
> >>>> Also, @Renato expressed concern that metadata might be dropped by optimization passes - would using attributes prevent that?
> >>>> 
> >>>> TOPIC 3: "There is no way to notify the backend how conformant the SIMD versions are."
> >>>> 
> >>>> @Shawn, I am afraid I don’t understand what you mean by “conformant” here. Can you elaborate with an example?
> >>>> 
> >>>> TOPIC 3: interaction of the `omp declare variant` with `clang 
> >>>> declare variant`
> >>>> 
> >>>> I believe this is described in the `Option behavior, and interaction with OpenMP`. The option `-fclang-declare-variant` is there to make the OpenMP based one orthogonal. Of course, we might decide to make -fclang-declare-variant on/off by default, and have default behavior when interacting with -fopenmp-simd. For the sake of compatibility with other compilers, we might need to require -fno-clang-declare-variant when targeting -fopenmp-[simd].
> >>>> 
> >>>> TOPIC 3: "there are no special arguments / flags / status regs that are used / changed in the vector version that the compiler will have to 'just know'"
> >>>> 
> >>>> I believe that this concern is raised by the problem of handling FP exceptions? If that’s the case, the compiler is not allowed to make any assumption about that for the vector function, and must treat it with the same knowledge as any other function, depending on the visibility it has in the compilation unit. @Renato, does this answer your question?
> >>>> 
> >>>> TOPIC 4: attribute in function declaration vs attribute function 
> >>>> call site
> >>>> 
> >>>> We discussed this in the previous version of the proposal. Having it at the call sites guarantees that incompatible vector versions are not used when merging modules compiled for different targets. I don’t have a use case for this; if I remember correctly, this was asked for by @Hideki Saito. Hideki, any comment on this?
> >>>> 
> >>>> TOPIC 5: overriding system header (the discussion on #pragma omp/clang/system variants initiated by @Hal Finkel).
> >>>> 
> >>>> I thought that the split between #pragma clang declare variant and #pragma omp declare variant was already providing the orthogonality between system headers and user headers. Meaning that a user should always prefer the omp version (for portability to other compilers) instead of the #pragma clang one, which would be relegated to system headers and headers provided by the compiler. Am I missing something? If so, I am happy to add a “system” version of the directive, as it would be quite easy to do given most of the parsing infrastructure will be shared.
> >>>> 
> >>>> 
> >>>>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at philipreames.com> wrote:
> >>>>> 
> >>>>> 
> >>>>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> >>>>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> >>>>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
> >>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >>>>>>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>>>>>>>>> I generally like the idea of having support in IR for 
> >>>>>>>>>> vectorization of custom functions.  I have several use cases which would benefit from this.
> >>>>>>>>>> 
> >>>>>>>>>> I'd suggest a couple of reframings to the IR representation though.
> >>>>>>>>>> 
> >>>>>>>>>> First, this should probably be specified as 
> >>>>>>>>>> metadata/attribute on a function declaration.  Allowing the 
> >>>>>>>>>> callsite variant is fine, but it should primarily be a 
> >>>>>>>>>> property of the called function, not of the call site.  Being able to specify it once per declaration is much cleaner.
> >>>>>>>>> I agree. We should support this both on the function 
> >>>>>>>>> declaration and on the call sites.
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> Second, I really don't like the mangling use here.  We need 
> >>>>>>>>>> a better way to specify the properties of the function than 
> >>>>>>>>>> its mangled name.  One thought to explore is to directly 
> >>>>>>>>>> use the Value of the function declaration (since this is 
> >>>>>>>>>> metadata and we can do that), and then tie the properties 
> >>>>>>>>>> to the function declaration in some way?  Sorry, I don't really have a specific suggestion here.
> >>>>>>>>> Is the problem the mangling or the fact that the mangling is 
> >>>>>>>>> ABI/target-specific? One option is to use LLVM's mangling 
> >>>>>>>>> scheme (the one we use for intrinsics) and then provide some 
> >>>>>>>>> backend infrastructure to translate later.
> >>>>>>>> Well, both honestly.  But mangling with a non-target specific scheme is
> >>>>>>>> a lot better, so I might be okay with that.   Good idea.
> >>>>>>> 
> >>>>>>> I liked your idea of directly encoding the signature in the 
> >>>>>>> metadata, but I think that we want to continue to use 
> >>>>>>> attributes, and not metadata, and the options for attributes 
> >>>>>>> seem more limited - unless we allow attributes to take 
> >>>>>>> metadata arguments - maybe that's an enhancement worth considering.
> >>>>>> I recently talked to people in the OpenMP language committee 
> >>>>>> meeting about this and, thinking forward to the actual 
> >>>>>> implementation/use of the OpenMP 5.x declare variant feature, I'd say:
> >>>>>> 
> >>>>>> - We will need a mangling scheme if we want to allow variants 
> >>>>>> on declarations that are defined elsewhere.
> >>>>>> - We will need a (OpenMP) standardized mangling scheme if we 
> >>>>>> want interoperability between compilers.
> >>>>>> 
> >>>>>> I assume we want both so I think we will need both.
> >>>>> If I'm reading this correctly, this describes a need for the 
> >>>>> frontend to have a mangling scheme.  Nothing in here would seem 
> >>>>> to prevent the frontend from generating a declaration for a 
> >>>>> mangled external symbol and then referencing that declaration.  Am I missing something?
> >>>>>> 
> >>>>>> That said, I think this should allow us to avoid 
> >>>>>> attributes/metadata which seems to me like a good thing right now.
> >>>>>> 
> >>>>>> Cheers,
> >>>>>> Johannes
> >>>>>> 
> >>>>>> 
> >>>>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> >>>>>>>>>>> Dear all,
> >>>>>>>>>>> 
> >>>>>>>>>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
> >>>>>>>>>>> 
> >>>>>>>>>>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
> >>>>>>>>>>> 
> >>>>>>>>>>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector function in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of changes that are needed in both clang and LLVM.
> >>>>>>>>>>> 
> >>>>>>>>>>> Please let me know what you think.
> >>>>>>>>>>> 
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> 
> >>>>>>>>>>> Francesco
> >>>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> =================================================================================
> >>>>>>>>>>> 
> >>>>>>>>>>> Introduction
> >>>>>>>>>>> ============
> >>>>>>>>>>> 
> >>>>>>>>>>> This RFC encompasses the proposal of informing the 
> >>>>>>>>>>> vectorizer about the availability of vector functions 
> >>>>>>>>>>> provided by the user. The mechanism is based on the use of 
> >>>>>>>>>>> the directive `declare variant` introduced in OpenMP
> >>>>>>>>>>> 5.0 [^1].
> >>>>>>>>>>> 
> >>>>>>>>>>> The mechanism proposed has the following properties:
> >>>>>>>>>>> 
> >>>>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
> >>>>>>>>>>>    of vectorized routines, from the back-end that knows how to make use
> >>>>>>>>>>>    of them.
> >>>>>>>>>>> 2.  Enables support for a developer's own vector libraries without
> >>>>>>>>>>>    requiring changes to the compiler.
> >>>>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
> >>>>>>>>>>>    mappings as relevant for their own runtime libraries, etc.
> >>>>>>>>>>> 
> >>>>>>>>>>> The implementation consists of two separate sets of changes.
> >>>>>>>>>>> 
> >>>>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
> >>>>>>>>>>> 
> >>>>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
> >>>>>>>>>>>    availability of user-defined vector functions via metadata attached
> >>>>>>>>>>>    to an `llvm::CallInst`.
> >>>>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
> >>>>>>>>>>>    information about the available vector functions associated with a
> >>>>>>>>>>>    `llvm::CallInst`.
> >>>>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
> >>>>>>>>>>>    metadata.
> >>>>>>>>>>> 
> >>>>>>>>>>> The second set consists of the [changes in
> >>>>>>>>>>> clang](#clang) that are needed to recognize the 
> >>>>>>>>>>> `#pragma clang declare variant` directive.
> >>>>>>>>>>> 
> >>>>>>>>>>> Proposed changes
> >>>>>>>>>>> ================
> >>>>>>>>>>> 
> >>>>>>>>>>> We propose an implementation that uses `#pragma clang 
> >>>>>>>>>>> declare variant` to inform the backend components about 
> >>>>>>>>>>> the availability of vector version of scalar functions 
> >>>>>>>>>>> found in IR. The mechanism relies on storing such 
> >>>>>>>>>>> information in IR metadata, and therefore makes the 
> >>>>>>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is independent of the front-end that generated such IR metadata.
> >>>>>>>>>>> 
> >>>>>>>>>>> This implementation provides a generic mechanism that the 
> >>>>>>>>>>> users of the LLVM compiler will be able to use for 
> >>>>>>>>>>> interfacing their own vector routines for generic code.
> >>>>>>>>>>> 
> >>>>>>>>>>> The implementation can also expose vectorization-specific 
> >>>>>>>>>>> descriptors -- for example, like the `linear` and 
> >>>>>>>>>>> `uniform` clauses of the OpenMP `declare simd` directive
> >>>>>>>>>>> -- that could be used to finely tune the automatic 
> >>>>>>>>>>> vectorization of some functions (think for example the 
> >>>>>>>>>>> vectorization of `double sincos(double , double *, double 
> >>>>>>>>>>> *)`, where `linear` can be used to give extra information about the memory layout of the 2 pointers parameters in the vector version).
> >>>>>>>>>>> 
> >>>>>>>>>>> The directive `#pragma clang declare variant` follows the 
> >>>>>>>>>>> syntax of the `#pragma omp declare variant` directive of OpenMP.
> >>>>>>>>>>> 
> >>>>>>>>>>> We define the new directive in the `clang` namespace 
> >>>>>>>>>>> instead of using the `omp` one of OpenMP to allow the 
> >>>>>>>>>>> compiler to perform auto-vectorization outside of an OpenMP SIMD context.
> >>>>>>>>>>> 
> >>>>>>>>>>> The mechanism is based on OpenMP to provide a uniform user 
> >>>>>>>>>>> experience across the two mechanisms, and to maximise the 
> >>>>>>>>>>> number of shared components of the infrastructure needed 
> >>>>>>>>>>> in the compiler frontend to enable the feature.
> >>>>>>>>>>> 
> >>>>>>>>>>> Changes in LLVM IR {#llvmIR}
> >>>>>>>>>>> ------------------
> >>>>>>>>>>> 
> >>>>>>>>>>> The IR is enriched with metadata that details the 
> >>>>>>>>>>> availability of vector versions of an associated scalar 
> >>>>>>>>>>> function. This metadata is attached to the call site of the scalar function.
> >>>>>>>>>>> 
> >>>>>>>>>>> The metadata takes the form of an attribute containing a 
> >>>>>>>>>>> comma separated list of vector function mappings. Each 
> >>>>>>>>>>> entry has a unique name that follows the Vector Function 
> >>>>>>>>>>> ABI[^2] and a real name that is used when generating calls to this vector function.
> >>>>>>>>>>> 
> >>>>>>>>>>>    vfunc_name1(real_name1), vfunc_name2(real_name2)
> >>>>>>>>>>> 
> >>>>>>>>>>> The Vector Function ABI name describes the signature of 
> >>>>>>>>>>> the vector function so that properties like vectorisation 
> >>>>>>>>>>> factor can be queried during compilation.
> >>>>>>>>>>> 
> >>>>>>>>>>> The `(real name)` token is optional and assumed to match 
> >>>>>>>>>>> the Vector Function ABI name when omitted.
> >>>>>>>>>>> 
> >>>>>>>>>>> For example, the availability of a 2-lane double precision 
> >>>>>>>>>>> `sin` function via SVML when targeting AVX on x86 is 
> >>>>>>>>>>> provided by the following IR.
> >>>>>>>>>>> 
> >>>>>>>>>>>    // ...
> >>>>>>>>>>>    ... = call double @sin(double) #0
> >>>>>>>>>>>    // ...
> >>>>>>>>>>> 
> >>>>>>>>>>>    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> >>>>>>>>>>>                              _ZGVdN4v_sin(__svml_sin4),
> >>>>>>>>>>>                              ..."} }
> >>>>>>>>>>> 
> >>>>>>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this 
> >>>>>>>>>>> vector-variant attribute provides information on the shape 
> >>>>>>>>>>> of the vector function via the string `_ZGVcN2v_sin`, 
> >>>>>>>>>>> mangled according to the Vector Function ABI for Intel, 
> >>>>>>>>>>> and remaps the standard Vector Function ABI name to the non-standard name `__svml_sin2`.
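
A minimal sketch of how such a comma-separated mapping list could be split into (Vector Function ABI name, real name) pairs, with the real name defaulting to the ABI name when the parenthesised part is omitted. `VariantMapping` and `parseVectorVariants` are hypothetical names for illustration and not an existing LLVM API; the exact string syntax is assumed from the examples above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pair: the ABI name describing the shape of the vector
// function, and the symbol that is actually called.
struct VariantMapping {
  std::string abiName;
  std::string realName;
};

// Strips leading and trailing spaces, tabs and newlines.
static std::string trim(const std::string &s) {
  const char *ws = " \t\n";
  const std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos)
    return "";
  const std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Parses "vfunc_name1(real_name1), vfunc_name2(real_name2), ...".
// Entries without an ABI name are skipped as malformed.
std::vector<VariantMapping> parseVectorVariants(const std::string &attr) {
  std::vector<VariantMapping> result;
  std::size_t begin = 0;
  while (begin <= attr.size()) {
    std::size_t end = attr.find(',', begin);
    if (end == std::string::npos)
      end = attr.size();
    const std::string entry = trim(attr.substr(begin, end - begin));
    begin = end + 1;
    if (entry.empty())
      continue;

    VariantMapping m;
    const std::size_t open = entry.find('(');
    if (open != std::string::npos && entry.back() == ')') {
      m.abiName = trim(entry.substr(0, open));
      m.realName = trim(entry.substr(open + 1, entry.size() - open - 2));
    } else {
      m.abiName = entry;
    }
    if (m.abiName.empty())
      continue;
    if (m.realName.empty())
      m.realName = m.abiName;  // real name omitted: same as the ABI name
    assert(!m.realName.empty());
    result.push_back(m);
  }
  return result;
}
```

Applied to `"_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)"`, the sketch returns two entries whose real names are `__svml_sin2` and `__svml_sin4`; a consumer would decode the ABI name for the shape and emit a call to the real name.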
> >>>>>>>>>>> 
> >>>>>>>>>>> This metadata is compatible with the proposal "Proposal 
> >>>>>>>>>>> for function vectorization and loop vectorization with 
> >>>>>>>>>>> function calls",[^3] that uses Vector Function ABI mangled 
> >>>>>>>>>>> names to inform the vectorizer about the availability of 
> >>>>>>>>>>> vector functions. This proposal extends the original by 
> >>>>>>>>>>> allowing the explicit mapping of the Vector Function ABI mangled name to a non-standard name, which allows the use of existing vector libraries.
> >>>>>>>>>>> 
> >>>>>>>>>>> The `vector-variant` attribute needs to be attached on a 
> >>>>>>>>>>> per-call basis to avoid conflicts when merging modules with different vector variants.
> >>>>>>>>>>> 
> >>>>>>>>>>> The query infrastructure: SVFS {#infrastructure}
> >>>>>>>>>>> ------------------------------
> >>>>>>>>>>> 
> >>>>>>>>>>> The Search Vector Function System (SVFS) is constructed 
> >>>>>>>>>>> from an `llvm::Module` instance so it can create function 
> >>>>>>>>>>> definitions. The SVFS exposes an API with two methods.
> >>>>>>>>>>> 
> >>>>>>>>>>> ### `SVFS::isFunctionVectorizable`
> >>>>>>>>>>> 
> >>>>>>>>>>> This method queries the availability of a vectorized 
> >>>>>>>>>>> version of a function. The signature of the method is as follows.
> >>>>>>>>>>> 
> >>>>>>>>>>>    bool isFunctionVectorizable(llvm::CallInst * Call,
> >>>>>>>>>>>                                ParTypeMap Params);
> >>>>>>>>>>> 
> >>>>>>>>>>> The method determines the availability of a vector version of 
> >>>>>>>>>>> the function invoked by the `Call` parameter by looking at 
> >>>>>>>>>>> the `vector-variant` metadata.
> >>>>>>>>>>> 
> >>>>>>>>>>> The `Params` argument is a map that associates the 
> >>>>>>>>>>> position of a parameter in the `CallInst` to its 
> >>>>>>>>>>> `ParameterType` descriptor. The `ParameterType` descriptor 
> >>>>>>>>>>> holds information about the shape of the corresponding 
> >>>>>>>>>>> parameter in the signature of the vector function. This 
> >>>>>>>>>>> `ParameterType` is used to query the SVFS about the 
> >>>>>>>>>>> availability of vector versions that have `linear`, `uniform` or `aligned` parameters (in the sense of OpenMP 4.0 and onwards).
> >>>>>>>>>>> 
> >>>>>>>>>>> The method `isFunctionVectorizable`, when invoked with an 
> >>>>>>>>>>> empty `ParTypeMap`, is equivalent to the 
> >>>>>>>>>>> `TargetLibraryInfo` method `isFunctionVectorizable(StringRef Name)`.
> >>>>>>>>>>> 
> >>>>>>>>>>> ### `SVFS::getVectorizedFunction`
> >>>>>>>>>>> 
> >>>>>>>>>>> This method returns the vector function declaration that 
> >>>>>>>>>>> corresponds to the needs of the vectorization technique that is being run.
> >>>>>>>>>>> 
> >>>>>>>>>>> The signature of the function is as follows.
> >>>>>>>>>>> 
> >>>>>>>>>>>    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> >>>>>>>>>>>      llvm::CallInst * Call, unsigned VF, bool IsMasked,
> >>>>>>>>>>>      ParTypeSet Params);
> >>>>>>>>>>> 
> >>>>>>>>>>> The `Call` parameter is the call instance that is being 
> >>>>>>>>>>> vectorized, the `VF` parameter represents the vectorization 
> >>>>>>>>>>> factor (how many lanes), the `IsMasked` parameter decides 
> >>>>>>>>>>> whether or not the signature of the vector function is 
> >>>>>>>>>>> required to have a mask parameter, and the `Params` parameter 
> >>>>>>>>>>> describes the shape of the vector function as in the `isFunctionVectorizable` method.
> >>>>>>>>>>> 
> >>>>>>>>>>> The method uses the `vector-variant` metadata and returns 
> >>>>>>>>>>> the function signature and the name of the function based on the input parameters.
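
The selection step described above can be pictured, in a much simplified form, as a lookup over the variants recorded for a call. The sketch below matches only on vectorization factor and masking and ignores parameter shapes entirely; `VariantEntry` and `selectVariant` are hypothetical names invented for this example and do not reflect the actual SVFS interface.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one entry of the `vector-variant` metadata,
// already decoded from its Vector Function ABI name.
struct VariantEntry {
  unsigned VF;          // number of lanes
  bool IsMasked;        // whether the variant takes a mask parameter
  std::string RealName; // symbol to call
};

// Hypothetical lookup: returns the real name of the first variant matching
// the requested vectorization factor and masking, or an empty string if
// none matches. Parameter shapes are not considered in this sketch.
std::string selectVariant(const std::vector<VariantEntry> &Variants,
                          unsigned VF, bool IsMasked) {
  for (const VariantEntry &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::string();
}
```

With the two SVML entries from the IR example, a request for 4 unmasked lanes would return `__svml_sin4`, while a request for 8 lanes would return an empty string.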
> >>>>>>>>>>> 
> >>>>>>>>>>> The SVFS can add new function definitions, in the same 
> >>>>>>>>>>> module as the `Call`, to provide vector functions that are 
> >>>>>>>>>>> not present within the vector-variant metadata. For 
> >>>>>>>>>>> example, if a library provides a vector version of a 
> >>>>>>>>>>> function with a vectorization factor of 2, but the 
> >>>>>>>>>>> vectorizer is requesting a vectorization factor of 4, the 
> >>>>>>>>>>> SVFS is allowed to create a definition that calls the 
> >>>>>>>>>>> 2-lane version twice. This capability applies similarly for providing masked and unmasked versions when the request does not match what is available in the library.
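
The widening case described above (a 4-lane request served by a 2-lane library routine) can be sketched in plain C++ as follows. The names `sin2` and `sin4FromSin2` are stand-ins chosen for this example: `sin2` models a library-provided 2-lane function, and `sin4FromSin2` models the kind of wrapper definition the SVFS would be allowed to create. A real implementation would emit LLVM IR on vector types instead.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane vector routine provided by a library.
Vec2 sin2(const Vec2 &X) {
  return Vec2{{std::sin(X[0]), std::sin(X[1])}};
}

// Sketch of a synthesized 4-lane wrapper: split the input into two halves,
// call the 2-lane routine on each half, and concatenate the results.
Vec4 sin4FromSin2(const Vec4 &X) {
  Vec2 Lo = sin2(Vec2{{X[0], X[1]}});
  Vec2 Hi = sin2(Vec2{{X[2], X[3]}});
  return Vec4{{Lo[0], Lo[1], Hi[0], Hi[1]}};
}
```

The same pattern would apply in the other direction for masking: an unmasked request could be served by a masked routine called with an all-true mask.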
> >>>>>>>>>>> 
> >>>>>>>>>>> This method is equivalent to the TLI method `StringRef 
> >>>>>>>>>>> getVectorizedFunction(StringRef F, unsigned VF) const;`.
> >>>>>>>>>>> 
> >>>>>>>>>>> Notice that to fully support OpenMP vectorization we need 
> >>>>>>>>>>> to think about a fuzzy matching mechanism that is able to 
> >>>>>>>>>>> select a candidate in the calling context. However, this 
> >>>>>>>>>>> proposal is intended for scalar-to-vector mappings of 
> >>>>>>>>>>> math-like functions that are most likely to associate a 
> >>>>>>>>>>> unique vector candidate in most contexts. Therefore, 
> >>>>>>>>>>> extending this behavior to a generic one is an aspect of the implementation that will be treated in a separate RFC about the vectorization pass.
> >>>>>>>>>>> 
> >>>>>>>>>>> ### Scalable vectorization
> >>>>>>>>>>> 
> >>>>>>>>>>> Both methods of the SVFS API will be extended with a 
> >>>>>>>>>>> boolean parameter to specify whether scalable signatures 
> >>>>>>>>>>> are needed by the user of the SVFS.
> >>>>>>>>>>> 
> >>>>>>>>>>> Changes in clang {#clang}
> >>>>>>>>>>> ----------------
> >>>>>>>>>>> 
> >>>>>>>>>>> We use clang to generate the metadata described above.
> >>>>>>>>>>> 
> >>>>>>>>>>> In the compilation unit, the vector function definition or 
> >>>>>>>>>>> declaration must be visible and associated with the scalar 
> >>>>>>>>>>> version via the `#pragma clang declare variant` according 
> >>>>>>>>>>> to the rules defined by the corresponding `#pragma omp 
> >>>>>>>>>>> declare variant` in OpenMP 5.0, as in the following example.
> >>>>>>>>>>> 
> >>>>>>>>>>>    #pragma clang declare variant(vector_sinf) \
> >>>>>>>>>>>    match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> >>>>>>>>>>>    extern float sinf(float);
> >>>>>>>>>>> 
> >>>>>>>>>>>    float32x4_t vector_sinf(float32x4_t x);
> >>>>>>>>>>> 
> >>>>>>>>>>> The `construct` set in the directive, together with the 
> >>>>>>>>>>> `device` set, is used to generate the vector mangled name 
> >>>>>>>>>>> to be used in the `vector-variant` attribute, for example 
> >>>>>>>>>>> `_ZGVnN4v_sinf`, when targeting AArch64 Advanced SIMD code 
> >>>>>>>>>>> generation. The rules for mangling the name of the scalar 
> >>>>>>>>>>> function in the vector name are defined in the Vector Function ABI specification of the target.
> >>>>>>>>>>> 
> >>>>>>>>>>> The part of the vector-variant attribute that redirects 
> >>>>>>>>>>> the call to `vector_sinf` is derived from the `variant-id` 
> >>>>>>>>>>> specified in the `variant` clause.
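
As a rough illustration of how one `vector-variant` entry could be assembled from the pieces of the directive, the sketch below concatenates the ISA letter, the mask letter (`N` for `notinbranch`, `M` for `inbranch`), the `simdlen` value, the parameter kinds and the scalar name, and then appends the `variant-id` in parentheses. The function `makeVariantEntry` and its parameters are hypothetical; the actual mangling rules are those of the target's Vector Function ABI.

```cpp
#include <string>

// Hypothetical helper (not a clang API): build one `vector-variant` entry of
// the form "<mangled name>(<variant-id>)".
std::string makeVariantEntry(char ISA, bool InBranch, unsigned SimdLen,
                             const std::string &ParamKinds,
                             const std::string &ScalarName,
                             const std::string &VariantId) {
  std::string Mangled = "_ZGV";
  Mangled += ISA;                      // e.g. 'n' for AArch64 Advanced SIMD
  Mangled += InBranch ? 'M' : 'N';     // masked vs unmasked
  Mangled += std::to_string(SimdLen);  // number of lanes
  Mangled += ParamKinds;               // e.g. "v" for one vector parameter
  Mangled += '_';
  Mangled += ScalarName;
  return Mangled + "(" + VariantId + ")";
}
```

For the `sinf` declaration in the example, `makeVariantEntry('n', false, 4, "v", "sinf", "vector_sinf")` produces `_ZGVnN4v_sinf(vector_sinf)`.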
> >>>>>>>>>>> 
> >>>>>>>>>>> Summary
> >>>>>>>>>>> =======
> >>>>>>>>>>> 
> >>>>>>>>>>> New `clang` directive in clang
> >>>>>>>>>>> ------------------------------
> >>>>>>>>>>> 
> >>>>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp 
> >>>>>>>>>>> declare variant` restricted to the `simd` context selector, from OpenMP 5.0+.
> >>>>>>>>>>> 
> >>>>>>>>>>> Option behavior, and interaction with OpenMP
> >>>>>>>>>>> --------------------------------------------
> >>>>>>>>>>> 
> >>>>>>>>>>> The behavior described below makes sure that `#pragma 
> >>>>>>>>>>> clang declare variant` function vectorization and OpenMP 
> >>>>>>>>>>> function vectorization are orthogonal.
> >>>>>>>>>>> 
> >>>>>>>>>>> `-fclang-declare-variant`
> >>>>>>>>>>> 
> >>>>>>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
> >>>>>>>>>>>    to populate the `vector-variant` attribute.
> >>>>>>>>>>> 
> >>>>>>>>>>> `-fopenmp[-simd]`
> >>>>>>>>>>> 
> >>>>>>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
> >>>>>>>>>>>    populate the `vector-variant` attribute.
> >>>>>>>>>>> 
> >>>>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> >>>>>>>>>>> 
> >>>>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
> >>>>>>>>>>>    `vector-variant` attribute in IR. The
> >>>>>>>>>>>    `#pragma clang declare variant` directives are ignored.
> >>>>>>>>>>> 
> >>>>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> >>>>>>>>>>> 
> >>>>>>>>>>> [^2]: Vector Function ABI for x86:
> >>>>>>>>>>>    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> >>>>>>>>>>>    Vector Function ABI for AArch64:
> >>>>>>>>>>>    <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> >>>>>>>>>>> 
> >>>>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> >>>>>>>>>>> 
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> LLVM Developers mailing list llvm-dev at lists.llvm.org 
> >>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> cfe-dev mailing list
> >>>>>>>>>> cfe-dev at lists.llvm.org
> >>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>>>>>> --
> >>>>>>> Hal Finkel
> >>>>>>> Lead, Compiler Technology and Programming Languages Leadership 
> >>>>>>> Computing Facility Argonne National Laboratory
> >>>>>>> 
> >>>> 
> >>> 
> >>> --
> >>> 
> >>> Johannes Doerfert
> >>> Researcher
> >>> 
> >>> Argonne National Laboratory
> >>> Lemont, IL 60439, USA
> >>> 
> >>> jdoerfert at anl.gov
> >> 
> > 
> 

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov