[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

Fri May 31 16:09:24 PDT 2019

On 05/31, Francesco Petrogalli wrote:
> 
> 
> > On May 31, 2019, at 2:56 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > 
> > I think I did misunderstand what you want to do with attributes. This is
> > my bad. Let me try to explain:
> > 
> > It seems you want the "vector-variants" attributes (which I could not
> > find with this name in trunk, correct?) to "remember" what vector
> > versions can be created (wrt. validity), assuming a definition is
> > available? Correct?
> 
> Yes.
> 
> > What I was concerned with is the example I sketched somewhere below
> > which motivates the need for a generalized/standardized name mangling
> > for OpenMP. I thought you wanted to avoid that somehow but if you don't I
> > misunderstood you. I basically removed the part where the vector
> > versions have to be created first but I assumed them to be existent (in
> > the module or somewhere else). That is, I assumed a call to foo and
> > various symbols available that are specializations of foo. When we then
> > vectorize foo (or otherwise specialize at some point in the future), you
> > would scan the module and pick the best match based on the context of
> > the call.
> > 
> 
> Yes, although the syntax you use below is wrong. Declare variant is
> attached to the scalar definition, and points to a vector definition
> (the variant) that is declared/defined in the same compilation unit
> where the scalar version is visible.

Yeah, I do it all the time. They changed that last minute in 5.0,... I'm
emotionally attached to the old one ;)

> > Now I don't know if I understood your proposal by now but let me ask a
> > question anyway:
> > 
> > VecClone.cpp:276-278 mentions that the vectorizer is supposed to look at
> > the vector-variants functions. This works for variants that are created
> > from definitions in the module but what about #omp declare simd
> > declarations?
> > 
> 
> VectorClone does more than just mapping a scalar version to a vector
> one. It builds also the vector version definition by auto-vectorizing
> the body of the scalar function.

I get that.

> I don’t know if the patches related to VecClone also are intended to
> use the `vector-variant` attribute for function declaration with a
> #pragma omp declare simd. On aarch64, in Arm compiler for HPC, we do
> that to support vector math libraries. It works in principle, but
> `vector variant` allows more context selection (and custom names
> instead of vector ABI names, which are easier for users).

This seems to be very interesting. What declarations are considered
"vector-variants" in the first place? I could see all of the below shown
versions to be reasonable choices.

#pragma omp declare simd (foo)

#pragma omp declare variant match(simd)
void bar(void);

#pragma omp declare variant match(parallel, simd)
void baz(void);

<2 x double> sin(<2 x double>) {}
<4 x float> sin(<4 x float>) { ... }

The ticket I mentioned earlier (#940, see link below), is proposing a
begin/end version of declare variant that would declare the enclosing
definitions as variants of functions with the same prototype. This goes
in the same direction as the sin definitions above but makes it more
explicit. One then might want to write something like:

#pragma omp begin declare variant match(nvptx, simd)
<4 x float> sin(<4 x float>) { ... }
<2 x double> sin(<2 x double>) { ... }
#pragma omp end declare variant

and expect these methods to be used if we have `sin` in a vectorized
environment. If we would only go by the mangled names available in the
module, this should at least be possible.

> > On 05/31, Francesco Petrogalli wrote:
> >>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> >>> 
> >>> I think we should split this discussion:
> >>> TOPIC 1 & 2 & 4: How do we implement all use cases and OpenMP 5.X
> >>>                  features, including compatibility with other
> >>>                  compilers and cross module support.
> >> 
> >> Yes, and we have to carefully make this as standard and compatible as possible.
> > 
> > Agreed.
> > 
> > 
> >>> TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> >>>                declares)
> >> 
> >> 
> >> I think that Alexey's explanation of how the directives are handled
> >> internally in the frontend makes us lean towards the attribute.
> > 
> > How things are handled right now, especially given that declare variant
> > is not handled at all, should not limit our design space. If the
> > argument is that we cannot reasonably implement a solution, that is a
> > different story.
> > 
> > 
> >>> TOPIC 3a & 3c: floating point issues?
> >>> 
> >> 
> >> I believe there is no issue there. I have quoted the OpenMP standard in reply to Renato:
> >> 
> >> See https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 118, lines 23-24:
> >> 
> >> “The execution of the function or subroutine cannot have any side
> >> effects that would alter its execution for concurrent iterations of a
> >> SIMD chunk.”
> > 
> > Great.
> > 
> > 
> >>> I inlined comments for Topic 1 below.
> >>> 
> >>> I hope that we do not have to discuss topic 2 if we agree neither
> >>> attributes nor metadata is necessary, or better, will solve the actual
> >>> problem at hand. I don't have strong feelings on topic 4 but I have the
> >>> feeling this will become less problematic once we figure out topic 1.
> >>> 
> >>> Thanks,
> >>> Johannes
> >>> 
> >>> 
> >>> On 05/31, Francesco Petrogalli wrote:
> >>>> # TOPIC 1: concerns about name mangling
> >>>> 
> >>>> I understand that there are concerns in using the mangling scheme I
> >>>> proposed, and that it would be preferred to have a mangling scheme
> >>>> that is based on (and standardized by) OpenMP. 
> >>> 
> >>> I still think it will be required to have a standardized one, not
> >>> only preferred.
> >>> 
> >>> 
> >> 
> >> I am all with you in standardizing. x86 and aarch64 have their own
> >> vector function ABI, which, although “private”, are to be considered
> >> standard. Open-source and commercial compilers are using them,
> >> therefore we have to deal with this mangling scheme, whether or not
> >> OpenMP comes up with a standard mangling scheme.
> > 
> > I don't get the point you are trying to make here. What do you mean by
> > "we have to deal with"? (I do not suggest to get rid of them.)
> > 
> 
> That we cannot ignore the fact that the name scheme is already
> standardized by the vendors, so let’s first deal with what we have,
> and think about the OpenMP mangling scheme only once there is one
> available.

Again, I do not want to get rid of what we have or even replace it if it
is not necessary. I try to determine if the scheme is sufficient for 5.0
and later extensions that are undoubtedly coming.

Another example is:
If we would have target and host code in the same IR module, could we
distinguish a vector version for a target from one for the host? If not,
that would be a problem we should not ignore.

> >>>> I hear the argument on having some common ground here. In fact, there
> >>>> is already common ground between the x86 and aarch64 backends, which have
> >>>> based their respective Vector Function ABI specifications on OpenMP.
> >>>> 
> >>>> In fact, the mangled name grammar can be summarized as follows:
> >>>> 
> >>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> >>>> 
> >>>> Across vector extensions the only <token> that will differ is the
> >>>> <isa> token.
> >>>> 
> >>>> This might lead people to think that we could drop the _ZGV<isa>
> >>>> prefix and consider the <masking><VLEN><parameter type>_<scalar name>
> >>>> part as a sort of unofficial OpenMP mangling scheme: in fact, the
> >>>> signature of an “unmasked 2-lane vector version of `sin`” will always
> >>>> be `<2 x double>(<2 x double>)`.
> >>>> 
> >>>> The problem with this choice is that the number of vector versions
> >>>> available for a target is not unique.
> >>> 
> >>> For me, this simply means this mangling scheme is not sufficient.
> >>> 
> >> 
> >> Can you explain more why you think the mangling scheme is not
> >> sufficient? The mangling scheme is shaped to provide all the
> >> information that the OpenMP directive describes.
> > 
> > I don't know if it is insufficient but I thought you hinted towards that.
> 
> I didn’t mean that; the tokens in the vector function ABI mangling schemes are sufficient.
> 
> > If we can handle/decode everything we need for declare variants then I
> > do not object at all. If not, we require respective extensions such that
> > we can. The result should be a superset of the current SIMD encoding and
> > compatible with the current one.
> > 
> > 
> 
> We can handle/decode everything for a SIMD context. :)

What about a context that combines SIMD and something else, e.g.,
parallel?

> >> The fact that x86 and aarch64 realize such information in different
> >> ways (multiple signature/vector extensions) is something that cannot be
> >> avoided, because it is related to architectural aspects that are
> >> specific to the vector extension and transparent to the OpenMP
> >> standard.
> > 
> > I don't think that is a problem (that's why I "failed to see the
> > problem" in the comment below). I look at it this way: If #declare simd,
> > or similar, results in N variants, it should at the end of the day not
> > be different from declaring these N variants explicitly with the
> > respective declare variant match clause.
> > 
> 
> That’s not the case. #declare simd should create all the versions that
> are optimal for the target. We carefully thought about that when writing
> the vector function ABI. Most of the constraints derive from the fact
> that each target has a specific register size.
> 
> Example:
> 
> #pragma omp declare simd
> float foo(float);
> 
> x86 -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see https://godbolt.org/z/m1BUVt
> Arm NEON: -> 4 versions {2, 4 lanes} x {masking, no masking}
> Arm SVE: -> 1 version
> 
> Therefore, the outcome of declare simd is not target independent. Your
> expectations are met only inside one target.

The above outcome is fine. Which versions exist is target dependent. The
encoding and selection are the interesting part I guess.
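
To make the encoding part concrete, here is a minimal sketch in C of how
the tokens of the `_ZGV<isa><masking><VLEN><parameter type>_<scalar name>`
grammar quoted earlier in this thread could be pulled apart. The names
`vfabi_name` and `vfabi_decode` are hypothetical and not part of LLVM. The
sketch assumes the masking token is `N` or `M`, that `x` in place of a
number means a scalable VLEN (as I understand the AArch64 SVE names), and
that parameter tokens never contain an underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decoded tokens of _ZGV<isa><masking><VLEN><parameter type>_<scalar name>.
 * Hypothetical helper type, for illustration only. */
struct vfabi_name {
    char isa;            /* single ISA letter, target specific */
    int masked;          /* 1 for 'M', 0 for 'N' */
    unsigned vlen;       /* number of lanes; 0 stands for scalable ('x') */
    char params[32];     /* parameter tokens, e.g. "v" */
    const char *scalar;  /* scalar name, points into the input string */
};

/* Returns 1 on success, 0 if the name does not follow the grammar. */
static int vfabi_decode(const char *s, struct vfabi_name *out) {
    assert(s != NULL && out != NULL);
    if (strncmp(s, "_ZGV", 4) != 0)
        return 0;
    s += 4;
    if (!isalpha((unsigned char)*s))
        return 0;
    out->isa = *s++;
    if (*s != 'N' && *s != 'M')
        return 0;
    out->masked = (*s++ == 'M');
    if (*s == 'x') {                 /* assumed: scalable vector length */
        out->vlen = 0;
        s++;
    } else {
        char *end;
        if (!isdigit((unsigned char)*s))
            return 0;
        out->vlen = (unsigned)strtoul(s, &end, 10);
        s = end;
    }
    /* Everything up to the first '_' is the parameter token string. */
    const char *us = strchr(s, '_');
    if (us == NULL || us == s || us[1] == '\0')
        return 0;
    size_t n = (size_t)(us - s);
    if (n >= sizeof out->params)
        return 0;
    memcpy(out->params, s, n);
    out->params[n] = '\0';
    out->scalar = us + 1;
    return 1;
}
```

With this, `_ZGVbN2v_foo` decodes to isa `b`, unmasked, 2 lanes, parameters
`v`, scalar name `foo`. Note that nothing in the decoded tokens says
anything about a non-SIMD part of a context.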

> >>>> In particular, the following declaration generates multiple vector
> >>>> versions, depending on the target:
> >>>> 
> >>>> #pragma omp declare simd simdlen(2) notinbranch
> >>>> double foo(double) {…};
> >>>> 
> >>>> On x86, this generates at least 4 symbols (one for SSE, one for AVX,
> >>>> one for AVX2, and one for AVX512: https://godbolt.org/z/TLYXPi)
> >>>> 
> >>>> On aarch64, the same declaration generates a unique symbol, as
> >>>> specified in the Vector Function ABI.
> >>> 
> >>> I fail to see the problem. We generate X symbols for X different
> >>> contexts. Once we get to the point where we vectorize, we determine
> >>> which context fits best and choose the corresponding symbol version.
> >>> 
> >> 
> >> Yes, this is exactly what we need to do, under the constraint that
> >> the rules for generating "X symbols for X different contexts” are
> >> decided by the Vector Function ABI of the target.
> > 
> > Sounds good. The vector ABI is used to determine what contexts exist
> > and what symbols should be created. I would assume the encoding should
> > be the same as if we specified the versions (/contexts) ourselves via
> > #declare variant.
> > 
> 
> Oh yes, vector functions listed in a declare variant should obey the
> vector function ABI rules (other than the function name).
> 
> > 
> >>> Maybe my view is too naive here, please feel free to correct me.
> >>> 
> >>> 
> >>>> This means that the attribute (or metadata) that carries the
> >>>> information on the available vector version needs to deal also with
> >>>> things that are not usually visible at IR level, but that might still
> >>>> need to be provided to be able to decide which particular instruction
> >>>> set/ vector extension needs to be targeted.
> >>> 
> >>> The symbol names should carry all the information we need. If they do
> >>> not, we need to improve the mangling scheme such that they do. There is
> >>> no attributes/metadata we could use at library boundaries.
> >>> 
> >> Hum, I am not sure what you mean by "There is no attributes/metadata
> >> we could use at library boundaries."
> > 
> > (This seems to be part of the misunderstanding, I leave my comment here
> > anyway:)
> > 
> > The simd-related stuff works because it is a uniform mangling scheme
> > used by all compilers. Take the situation below in which I think we want
> > to call foo_CTX in the library. If so, we need a name for it.
> > 
> 
> In the situation below, the mangled name is going to be the same for
> both compilers, as long as they adhere to the vector function ABI.

Assuming CTX is only a SIMD context. What if it is more than that?

> > a.c:  // Compiled by gcc into a library
> > #omp declare variant (foo) match(CTX)
> > void foo_CTX(...) {...}
> > 
> > b.c:  // Compiled by clang linked against the library above.
> > #omp declare variant (foo) match(CTX)
> > void foo_CTX(...);
> > 
> > void bar(...) {
> >  #pragma omp CTX
> >  foo();   // <- What function (symbol) do we call if a.c was compiled
> >           //    by gcc and b.c with clang?
> > }
> > 
> 
> Please notice that `declare variant` needs to be attached to the scalar function, not the vector one.
> 
> ```
> #pragma omp declare variant(foo_CTX) match (context=simd…
> double foo (double) {…}
> 
> vector_double_ty foo_CTX(vector_double_ty) {…}
> ```
> 
> In vectorizing foo in bar, the compiler will not care where foo_CTX
> would come from (of course, as long as the scalar+declare variant
> declarations are visible).

OK. What happens if you merge two modules, one with foo and declare
variants that resulted in the "vector-version" attribute and one where
foo is just a declaration? (One may not originate in a C/C++ declaration
though as that would potentially be undefined in OpenMP*).

* OpenMP 5.0 Restriction: If the function has any declarations, then the
  declare simd construct for any declaration that has one must be
  equivalent to the one specified for the definition.  Otherwise, the
  result is unspecified.
(I'm not sure if that is supposed to be true across compilation units or
 not, I would have guessed it is not though.)

And what happens if CTX is more complex, e.g., target + SIMD, parallel +
SIMD, in which case we have to take the additional non-SIMD part into
account to select the correct version. Since the SIMD version is
selected late, we might have to remember the original syntactic scope of
the call to avoid picking the wrong version. (I would argue picking the
version late (after inlining), even if it is not the same as early
picking would have resulted in, is actually good but I think the
standard would require syntactic context evaluation.)
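
A rough sketch in C of what I mean by taking the non-SIMD part into
account: each variant records the set of context traits it requires, and
selection picks the most specific variant whose requirements are all
present at the call site. The trait enum, the `variant` struct, and
`select_variant` are all hypothetical, and "most required traits wins" is
only a stand-in for whatever scoring the standard actually mandates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context traits; not an LLVM or OpenMP runtime API. */
enum ctx_trait {
    CTX_SIMD = 1 << 0,
    CTX_PARALLEL = 1 << 1,
    CTX_TARGET = 1 << 2
};

struct variant {
    const char *symbol;  /* symbol to call if this variant is selected */
    unsigned required;   /* traits the variant's match clause requires */
};

/* Number of traits set in a mask. */
static unsigned count_traits(unsigned mask) {
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Returns the variant with the most required traits among those whose
 * required traits are a subset of call_ctx, or NULL if none applies.
 * Ties are resolved in favor of the earlier entry. */
static const struct variant *select_variant(const struct variant *v,
                                            size_t n, unsigned call_ctx) {
    assert(v != NULL || n == 0);
    const struct variant *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((v[i].required & ~call_ctx) != 0)
            continue;  /* requires a trait the call site does not have */
        if (best == NULL ||
            count_traits(v[i].required) > count_traits(best->required))
            best = &v[i];
    }
    return best;
}
```

If the call-site context passed in here is computed after inlining rather
than from the syntactic scope, the same table can yield a different
symbol, which is exactly the early vs. late picking question.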

> >> In our downstream compiler (Arm compiler for HPC, based on LLVM), we
> >> use `declare simd` to provide vector math functions via custom header
> >> file. It works brilliantly, if not for specific aspects that would be
> >> perfectly covered by the `declare variant`, which might be one of the
> >> reason why the OpenMP committee decided to introduce `declare
> >> variant`.
> > 
> > But you (assume that you) control the mangling scheme across the entire
> > infrastructure. Given that the simd mangling is de-facto standardized,
> > that works.
> > 
> > Side note:
> > Declare variant, as of 5.0, is not flexible enough for a sensible
> > inclusion of target specific headers. That will change in 5.1.
> > 
> 
> Could you point me at the discussion in 5.1 on this specific aspect?

A begin/end version of declare variant, described in more detail above:
http://trac.openmp.org/trac/OpenMP/ticket/940

> >> If your concern is that adding an attribute that somehow represents
> >> something that is available in an external library is not enough to
> >> guarantee that the symbol is available in the library… not even C
> >> code can guarantee that? If the linker is not pointing to the right
> >> library, there is nothing that can prevent it from failing if the
> >> symbol is not present?
> > 
> > I don't follow the example you describe. I don't want to change anything
> > in how symbols are looked up or what happens if they are missing.
> > 
> > 
> 
> I don’t want to change that either :). I think we are misunderstanding
> each other here...

Probably.