[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tue Oct 9 11:45:00 PDT 2018

[+CC Naoki Shibata (SLEEF), Xinmin Tian (Intel), Renato Golin (Linaro) ]

Hi All,

Apologies for jumping in so late in this thread.

The scalar-to-vector mapping mechanism works pretty well with the OpenMP directive `#pragma omp declare simd`. We have implemented it in Arm Compiler for HPC [1]. We didn’t have to hack the TargetLibraryInfo lists of vector functions, and the functionality is independent of the choice of the target library.

This is a commercial compiler, so the actual implementation of the functionality doesn’t conform 100% to the LLVM way of doing things [2], but Arm is working with Intel to provide a fully open source implementation of this mechanism that will work for all targets that specify a vector function ABI based on `#pragma omp declare simd`. The Intel and Arm work is available at [3]. We would like to hear your opinion on it, so feel free to join the review.

The functionality provided by the Vector Clone pass [3] goes as follows:

1. The vectorizer is informed, via an attribute, of the availability of a vector function associated with the scalar call in the scalar loop.
2. The name of the vector function carries the information generated from the `declare simd` directive associated with the scalar declaration/definition.
3. The vectorizer chooses which vector function to use based on the information in the vector-variant attribute associated with the original scalar function.
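
To make point 2 a bit more concrete, here is a minimal sketch of how such a name can be assembled, assuming the naming scheme of the x86 Vector Function ABI (`_ZGV<isa><mask><vlen><params>_<scalar name>`). The helper `vector_variant_name` is hypothetical and written only for illustration; it is not an LLVM or clang API, and it covers only the simplest parameter kinds.

```c
#include <stdio.h>
#include <string.h>

/* Simplified sketch of x86 Vector Function ABI name mangling:
 *   _ZGV<isa><mask><vlen><params>_<scalar name>
 * isa:    'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512
 * mask:   'N' = notinbranch (unmasked), 'M' = inbranch (masked)
 * params: one token per parameter, e.g. "v" = vector, "u" = uniform
 * vector_variant_name() is a hypothetical helper, not an LLVM API.
 * Returns 0 on success, -1 if the name does not fit in the buffer. */
int vector_variant_name(char *out, size_t size, char isa, int masked,
                        unsigned vlen, const char *params,
                        const char *scalar_name) {
    int n = snprintf(out, size, "_ZGV%c%c%u%s_%s", isa, masked ? 'M' : 'N',
                     vlen, params, scalar_name);
    /* snprintf reports the length it wanted to write; detect truncation. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Under these assumptions, `float sinf(float)` declared with `simdlen(4) notinbranch` and built for SSE gives `vector_variant_name(buf, sizeof buf, 'b', 0, 4, "v", "sinf")`, which writes `_ZGVbN4v_sinf` into `buf`.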

This mechanism is modular (clang and opt can be tested separately, as the vectorization information is stored in the IR via an attribute), and is therefore superior to the functionality in Arm Compiler for HPC. The two are equivalent in the case of function definition, which is the case we need in order to interface with external vector libraries, whether math libraries or any other kind of vector library.

As one can see from [1], the list of available vector functions is not hard-coded in the TLI, but simply provided via a header file in <clang>/lib/Headers/math.h, which is easy to maintain.

As it is, this mechanism cannot be used as a replacement for the VECLIB functionality, because external libraries like SVML or SLEEF have their own naming conventions. To this end, the new directive `declare variant` of the upcoming OpenMP 5.0 standard is, in my opinion, the way forward. This directive allows the name associated with a `declare simd` declaration/definition to be remapped to a new name chosen by the user or library vendor.

For example, in the case of __svml_sin4 on x86, the declaration could be the following [4] [5]:

```
#pragma omp declare simd simdlen(4) notinbranch
float sinf(float);

#ifdef USE_SVML
    #pragma omp declare variant (float sinf(float)) match(construct = {simd(notinbranch, simdlen(4))}, device={uarch(sse)})
    __m128d __svml_sin4(__m128d);
#endif
```

With this construct, it would be possible to choose the list of vector functions available in the library simply by tweaking the command line to select the correct portion of the header file shipped with the compiler [6]. There would be no need to maintain lists in the TLI source code, and the functionality would be completely split between frontend and backend, with no dependencies.
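
Conceptually, what `declare variant` gives us is a table from the name generated by `declare simd` to the name exported by the library. The sketch below is only an illustration of that idea, not how clang would implement it: `struct variant_map`, `svml_map` and `resolve_variant` are hypothetical names, and the two entries are examples rather than a vendor-provided list.

```c
#include <stddef.h>
#include <string.h>

/* One remapping: the Vector-ABI name produced by `declare simd`
 * and the name the vector library actually exports. */
struct variant_map {
    const char *abi_name;
    const char *library_name;
};

/* Illustrative entries only; a real list would come from the library
 * vendor's header, not from the compiler sources. */
static const struct variant_map svml_map[] = {
    {"_ZGVbN4v_sinf", "__svml_sinf4"},
    {"_ZGVbN2v_sin", "__svml_sin2"},
};

/* Return the library name registered for abi_name. When no variant is
 * registered, the default Vector-ABI name is kept unchanged. */
const char *resolve_variant(const char *abi_name) {
    for (size_t i = 0; i < sizeof svml_map / sizeof svml_map[0]; ++i)
        if (strcmp(svml_map[i].abi_name, abi_name) == 0)
            return svml_map[i].library_name;
    return abi_name;
}
```

In the header-based scheme the table would not exist as data at all: each entry would be one `declare variant` line, and a macro such as `USE_SVML` on the command line would decide which lines the frontend sees.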

Finally, for those interested, I wanted to point out the BoF [7] I will be running at the LLVM dev meeting in San Jose, where I would like to discuss these topics. Hopefully the meeting will help move these functionalities forward in clang/LLVM.

Kind regards,

Francesco

[1] https://developer.arm.com/products/software-development-tools/hpc/documentation/vector-math-routines
[2] Our implementation of declare simd requires clang and opt to be coupled (they cannot be tested separately, as there is no attribute in the IR that describes the availability of the vector function).
[3] VectorClone pass and related patches.
1. https://reviews.llvm.org/D40577 - clang patches to add the SIMD mangled names as “vector-variants” attribute (ib/CodeGen/CGOpenMPRuntime.cpp)
2. https://reviews.llvm.org/D40575 - loop vectorizer pass that interfaces with the vector clone pass
3. https://reviews.llvm.org/D22792 - vector clone pass
4. https://reviews.llvm.org/D52579 - Additional tests
[4] https://www.openmp.org/wp-content/uploads/openmp-TR7.pdf
[5] Disclaimer: I am not an expert in x86 vector extensions, so the code you see in the example might be broken. It is there for illustration only.
[6] This wouldn’t work with a Fortran frontend, as there is no equivalent of C header files in Fortran. In any case, this OpenMP 5.0-based solution is in my opinion better than the TLI-list-based one, as it allows the frontend and backend to be split completely. Yes, the Fortran frontend will have to list the equivalent of the C header file somewhere in its sources, but it wouldn’t be touching any code in the mid-end/back-end.
[7] https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7

> On Jul 9, 2018, at 12:36 PM, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>
> All,
>
> It looks like we are finally converging into
>
> 4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still end up using one call instead of two.
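
As a rough illustration of what "one call instead of two" means in option 4 above, here is a minimal sketch of a legalized call written in plain C rather than IR. Everything in it is an assumption made for the example: `vec4_fn` and `call8_as_two_calls` are hypothetical names, and `square4` is a stand-in for a real 4-lane library routine so that the code runs without any vector library.

```c
#include <stddef.h>

/* Signature of a hypothetical 4-lane library routine (for example a
 * 4-wide sin). */
typedef void (*vec4_fn)(const double in[4], double out[4]);

/* Stand-in for a real 4-lane routine so the sketch runs without SVML:
 * it squares each lane. */
void square4(const double in[4], double out[4]) {
    for (size_t i = 0; i < 4; ++i)
        out[i] = in[i] * in[i];
}

/* Legalized 8-lane call: run the 4-lane routine on the low half and on
 * the high half, writing the two results back to back. */
void call8_as_two_calls(vec4_fn f, const double in[8], double out[8]) {
    f(in, out);         /* lanes 0..3 */
    f(in + 4, out + 4); /* lanes 4..7 */
}
```

In the vectorizer the two halves would be extracted with shuffles and the results concatenated, which is where the extra shuffle cost discussed further down the thread comes from.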
>
> I was hoping to collectively come up with a better solution, but not at all surprised to see us settling down to this known-to-work practical approach.
>
> We need a more elaborate VECLIB setting, taking per-target function availability into account. Also, much of the "legalized call" mechanism should work for OpenMP declare simd --- and we should make that easier for reuse in case other FE/optimizers want to emit legalized calls.
>
> Simon, is the RV "legalized call emission" code easily reusable outside of RV? If yes, would you be able to restructure it so that it can reside, say, under Transforms/Utils?
>
> I think this RFC is ready to close at the end of this week. Thank you very much for all the lively discussions. If anybody have more inputs, please speak up soon.
>
> Thanks,
> Hideki
>
> ===========================================
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Wednesday, July 04, 2018 9:59 AM
> To: Robert Lougher <rob.lougher at gmail.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>
> Cc: Saito, Hideki <hideki.saito at intel.com>; Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com; llvm-dev at lists.llvm.org; dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> On 07/04/2018 07:50 AM, Robert Lougher wrote:
> Hi,
>
> On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>; Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com
> Cc: dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
>> __svml_sin8 (plus whatever shuffles are necessary).
>> The vectorizer should do this.
>> It should not generate calls to functions that don't exist.
>
> I'm not sure how vectorizer will do this, consider the case where "-vectorizer-maximize-bandwidth" option is enabled and vectorizer is forced to generate the wider VF, and hence it may generate a call to __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls i.e. __svml_sin_8 to two __svml_sin_4 calls ?
>
> Regards,
> Ashutosh
>
> If an accurate cost model was in place (which there isn't), then an "unsupported" vectorization factor should only be selected if it was forced.  However, in this case __svml_sin_8 is the same cost as __svml_sin_4, so the loop vectorizer will select a VF of 8, and generate a call to a function which effectively doesn't exist.
>
> Would it actually be the same, or would there be extra shuffle costs associated with the calls to __svml_sin_4?
>
>
>
> The simplest way to fix it, is to simply only populate the SVML vector library table with __svml_sin_8 when the target is AVX-512.
>
> I believe that this is exactly what we should do. When not targeting AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable ABI via which we can call it), and so it should not appear in the vectorizer's list of options at all.
>
>  -Hal
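
The per-target table suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: `enum target_feature`, `struct vec_entry`, `vec_table` and `find_vector_function` are hypothetical names and not the TLI API, and the feature attached to each entry reflects my understanding of the register width each routine needs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical target feature bits. */
enum target_feature {
    FEATURE_SSE2   = 1u << 0,
    FEATURE_AVX    = 1u << 1,
    FEATURE_AVX512 = 1u << 2
};

/* One vector variant of a scalar function, with the feature it needs. */
struct vec_entry {
    const char *scalar_name;
    const char *vector_name;
    unsigned vf;       /* vectorization factor (number of lanes) */
    unsigned required; /* feature bit the target must provide */
};

/* Illustrative entries only. */
static const struct vec_entry vec_table[] = {
    {"sin", "__svml_sin2", 2, FEATURE_SSE2},
    {"sin", "__svml_sin4", 4, FEATURE_AVX},
    {"sin", "__svml_sin8", 8, FEATURE_AVX512},
};

/* Return the vector function for (scalar_name, vf) if the target can
 * call it, or NULL so that the vectorizer never sees an unusable entry. */
const char *find_vector_function(const char *scalar_name, unsigned vf,
                                 unsigned target_features) {
    for (size_t i = 0; i < sizeof vec_table / sizeof vec_table[0]; ++i) {
        const struct vec_entry *e = &vec_table[i];
        if (e->vf == vf && (e->required & target_features) != 0 &&
            strcmp(e->scalar_name, scalar_name) == 0)
            return e->vector_name;
    }
    return NULL;
}
```

With this filtering, a query for `sin` at VF=8 on a target without AVX-512 returns NULL, so the cost model has no 8-lane library call to choose in the first place.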
>
>
>   Alternatively, TLI.isFunctionVectorizable() should check that the entry is available on the target (this is more difficult as the type is not encoded).
> I'm guessing that the cost model would then make VF=4 cheaper, so generating calls to __svml_sin_4 (I'm not in work so can't check).   If the vectorization factor was forced to 8, we'll either get a call to the intrinsic llvm.sin.v8f64 (if no-math-errno) or the vectorizer will scalarize the call.  The vectorizer would not generate two calls to __svml_sin_4 although this would be cheaper.
>
> While this problem probably doesn't require the loop vectorizer to have knowledge of the target ABI, others may do.  I'm thinking specifically of D48193:
>
> https://reviews.llvm.org/D48193
> In this case we have poor code generation due to the interleave count selected by the loop vectorizer.  I can't see how this can be fixed later, so we will need to expose details of the ABI to the loop vectorizer (see my latest comment D48193#1149705).
> Thanks,
> Rob.
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
