[PATCH] D53927: [AArch64] Enable trigonometry libm vectorized functions via SLEEF

Wed Oct 31 11:34:18 PDT 2018

>> 2. You are using Vector Function ABI mangled names, which means you ware assuming to link agains `libsleefgnuabi.so` instead of `libsleef.so`. For this reason, I think it is worth to replace the `-fveclib=SLEEF` option with `-fveclib=SLEEFGNUABI`. This request applies to all places where you  have used `SLEEF`.
>
> Sorry but I wholeheartedly disagree with this idea.
>
> For starters, saying -fveclib=SLEEFGNUABI is just ugly IMO.
>
> Furthermore, I do not understand why this change is necessary. It provides nothing that the current naming convention doesn't already provide. So, really, it's a matter of personal preference.
>

Fair enough. :)
I just think `SLEEFGNUABI` is more precise, and avoids confusion given that SLEEF is packaged in two version, with incompatible naming conventions and signatures. You are clearly using the conventions in `libsleefgnuabi.so`, I think I just think it is better  to use `-fveclib=SLEEFGNUABI`.
Moreover, someone might want to add `-fveclib` for the non GNUABI version of the library - for example, they might prefer the `sincos` signatures of `libsleef` over `libsleefgnuabi`. I agree that `-ffveclib=SLEEFGNUABI` might be ugly to someone, but not more than `-fveclib=SLEEFNONGNUABi`.

I will not enforce this though, it is just a name.

> Also: Intel did not name their veclib argument "SVMLINTELABI". They named it "SVML”.

I think this could have happened whether they would have come up with a version of the library that had new signatures and name mangling scheme.

>> 3. Given that SLEEF has some degree of target independence, I would recommend using the TargetTriple `if` that selects the list of functions into a switch statement that sets the architectural extension token in the mangled names. On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets. Notice that I am not saying that you should add all targets, I am just saying that you should prepare the ground for extending this, as I think that `-fveclib=SLEEFGNUABI` should work the same for all the extensions supported by SLEEF.
>
> I don't understand any of this. It seems to me that you are describing what I've already done.
>
> The TargetTriple wasn't present in the TargetLibraryInfoImpl class. I had to add it, specifically because it is needed by the SLEEF switch statement case for AArch64.
>
>> On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets.
>
> I don't understand what this means.
>
> Are you suggesting that the VecLib array of function name mappings should be generated by tblgen? Yes, I agree. It should be. But it currently isn't. And I don't know when tblgen will be extended to do this job. Or if it ever will be extended to do this job.
>
> Also, we (Cavium) will not be extending the SLEEF mappings to other ISA's. Simply because we do not have the capability of testing PPC64, or Windows.
>
> So, really, providing additional SLEEF mappings for other ISA's or platforms is best suited for those who are best versed in the subtle details of those other ISA's or platforms.
>
> The SLEEFGNUABI ABI has very little CPU independence when compiling for for X86_64. It is, in fact, very CPU-dependent.
>
> In fact, in the current implementation of TargetLibraryInfo, it is impossible to provide X86_64 mappings for SLEEF using the GNUABI. That is because the X86_64 GNUABI mappings require knowledge of the capabilities of the specific CPU that the object is being compiled for. These CPU capabilities are hardcoded by SLEEF, in the SLEEF GNU ABI mangled name:
>
> 'b' -> SSE
> 'c' -> AVX
> 'd' -> AVX2
> 'e' -> AVX512
>
> So, at a minimum, on X86_64, you would need to know the values passed to -march= and -mtune=. But these CPU capabilities are currently inaccessible from TargetLibraryInfo.
>
> TargetLibraryInfo could be extended to acquire the CPU capability information, but that's for another changeset. Maybe after TargetLibraryInfo is extended to support this capability we can revisit the construction of the GNU ABI mangled name.
>

I think there is no need to use tablegen for this. For the rest, I agree with all you said. Let me explain what I exactly mean by “prepare the ground to extend this”.

You have encoded the list of function for AArch64 as follows (allow me to use pseudocode).

```
If (AArch64) {
  List ={ {“sin”, “_ZGVn2v_sin”, 2},
             {“cos”, “_ZGVn2v_cos”, 2},
             ….
};
}
```

I think you should rewrite your code as

```
  List ={ {“sin”, “_ZGV<simdext>2v_sin”, 2},
             {“cos”, “_ZGV<simdext>2v_cos”, 2},
             ….
};
If (aarch64)
   ext_token = “n”.

Loop through (list) and replace “<simdext>” with ext_token.
```
With this change, if anyone decide to add any of the other architectures (of course, adding was is missing in target triple to discern the right extension token), all they have to do is to add new logic to set the ext_token variable correctly.

Let me stress again the fact that I am not asking you do do the non aarch64 work. I just think it make sense to make sure other people can extend it easily.

>> 4. On the longer term, I want to make you aware of the fact that there is plan to replace the TLI implementation of -fveclib with an OpenMP based one (to decouple backend and frontend). We discussed this at the last LLVM dev meeting, I will circulate an RFC soon. See https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7.
>
> That's an interesting plan.
>
> I'd like to make you aware that brute-forcing OpenMP for vectorization will probably run afoul of SPEC rules.
>
> SPEC allows OpenMP in SPECspeed, but **not** in SPECrate, and that is clearly stated at the SPEC web site. Not to mention that the SPEC benchmarks code cannot be modified to add #pragma omp simd. And having clang silently and secretly modify emitted code to insert OpenMP simd will likely not fly at SPEC.
>
> I am also not a fan of OpenMP being used as a giant hammer that can hit any nail. You would be introducing a dependency on OpenMP for programs that do not otherwise use, or need, OpenMP. So, in reality, there is no "decoupling" here. There is only a dependency substitution: the existing dependency on SLEEF is replaced by the future dependency on OpenMP.
>

The OpenMP based implementation will pull in only the simd capabilities of openmp (`-fopenmp-simd`), not the whole OpenMP specs. I will be more detailed on this in the RFC, and add you in the discussion to make sure your concerns are addressed.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.