[PATCH] D53927: [AArch64] Enable libm vectorized functions via SLEEF

Thu Nov 1 07:44:15 PDT 2018

steleman added a comment.

In https://reviews.llvm.org/D53927#1282601, @llvm-commits wrote:

>

>> The SLEEF GNU ABI has very little CPU independence when compiling for X86_64. It is, in fact, very CPU-dependent.
>> 
>> In fact, in the current implementation of TargetLibraryInfo, it is impossible to provide X86_64 mappings for SLEEF using the GNUABI. That is because the X86_64 GNUABI mappings require knowledge of the capabilities of the specific CPU that the object is being compiled for. These CPU capabilities are hardcoded by SLEEF, in the SLEEF GNU ABI mangled name:
>> 
>> 'b' -> SSE
>> 'c' -> AVX
>> 'd' -> AVX2
>> 'e' -> AVX512
>> 
>> So, at a minimum, on X86_64, you would need to know the values passed to -march= and -mtune=. But these CPU capabilities are currently inaccessible from TargetLibraryInfo.
>> 
>> TargetLibraryInfo could be extended to acquire the CPU capability information, but that's for another changeset. Maybe after TargetLibraryInfo is extended to support this capability we can revisit the construction of the GNU ABI mangled name.
> 
> I think there is no need to use tablegen for this. For the rest, I agree with all you said. Let me explain exactly what I mean by “prepare the ground to extend this”.
> 
> You have encoded the list of functions for AArch64 as follows (allow me to use pseudocode).
> 
>   if (AArch64) {
>     List = { {"sin", "_ZGVn2v_sin", 2},
>              {"cos", "_ZGVn2v_cos", 2},
>              ...
>     };
>   }
> 
> 
> I think you should rewrite your code as
> 
>   List = { {"sin", "_ZGV<simdext>2v_sin", 2},
>            {"cos", "_ZGV<simdext>2v_cos", 2},
>            ...
>   };
>   if (AArch64)
>     ext_token = "n";
> 
>   Loop through List and replace "<simdext>" with ext_token.
> 
> 
> With this change, if anyone decides to add any of the other architectures (of course, adding what is missing in the target triple to discern the right extension token), all they have to do is add new logic to set the ext_token variable correctly.

OK, I like the idea of having a generalized way of expressing the libm <-> SLEEF bindings.
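
To make the proposal concrete, here is a minimal sketch of the placeholder approach. It is written in Python only for brevity and is not the TargetLibraryInfo code: `VecDesc`, `TEMPLATES`, `EXT_TOKENS`, and `expand_bindings` are hypothetical names. The templates keep the mask letter (`N`) because that is the form readelf reports for the AArch64 SLEEF build (`_ZGVnN2v_atan`), and the only token taken from this thread is "n" for AArch64.

```python
from typing import NamedTuple


class VecDesc(NamedTuple):
    """One scalar -> vector binding (hypothetical layout, not LLVM's)."""
    scalar_name: str
    vector_name: str
    vf: int  # vectorization factor


PLACEHOLDER = "<simdext>"

# One shared table of name templates; the ISA token is filled in later.
TEMPLATES = [
    ("atan", "_ZGV<simdext>N2v_atan", 2),
    ("atanf", "_ZGV<simdext>N4v_atanf", 4),
]

# Assumption: only AArch64 is known for now, as in the proposal above.
EXT_TOKENS = {"aarch64": "n"}


def expand_bindings(arch):
    """Return concrete bindings for `arch` by substituting the ISA token."""
    token = EXT_TOKENS[arch]  # raises KeyError for unhandled architectures
    return [
        VecDesc(scalar, template.replace(PLACEHOLDER, token), vf)
        for scalar, template, vf in TEMPLATES
    ]


for desc in expand_bindings("aarch64"):
    print(desc.scalar_name, "->", desc.vector_name, "VF", desc.vf)
```

In this sketch the substitution runs once per table, not once per lookup, so the added cost is a single pass over the bindings.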

But, there's always a 'but':

1. Looping through this static array of binding structs and replacing the <simdext> placeholder with the appropriate mangling token requires an extra function call. Do we want that extra function call?

2. This gets even more complicated. Here's why:

- Here are the available SLEEF functions for //atan// on AArch64. We get them with

readelf -sW libsleefgnuabi.so.3.3 | awk '{ printf "%s\t%s\n", $5, $8 }' | egrep atan | egrep -v 'atan2|atanh'

GLOBAL  _ZGVnN2v_atan
GLOBAL  _ZGVnN4v_atanf_u35
GLOBAL  _ZGVnN4v_atanf
GLOBAL  _ZGVnN2v_atan_u35
GLOBAL  _ZGVnN2v_atan_u35
GLOBAL  _ZGVnN2v_atan
GLOBAL  _ZGVnN4v_atanf
GLOBAL  _ZGVnN4v_atanf_u35

- And here are the SLEEF functions for //atan// on X86_64, from a build compiled with -march=core-avx2:

WEAK    _ZGVeM8v___atan_finite
GLOBAL  _ZGVeM8v_atan
GLOBAL  _ZGVeM8v_atan_u35
GLOBAL  _ZGVbN4v_atanf
GLOBAL  _ZGVbN2v_atan_u35
GLOBAL  _ZGVcN4v_atan
GLOBAL  _ZGVeN8v_atan_u35
GLOBAL  _ZGVdN4v_atan_u35
GLOBAL  _ZGVbN2v_atan
GLOBAL  _ZGVbN4v_atanf_u35
WEAK    _ZGVeM16v___atanf_finite_u35
WEAK    _ZGVeM16v___atanf_finite
GLOBAL  _ZGVdN8v_atanf_u35
WEAK    _ZGVeM8v___atan_finite_u35
GLOBAL  _ZGVcN4v_atan_u35
GLOBAL  _ZGVdN8v_atanf
GLOBAL  _ZGVcN8v_atanf
GLOBAL  _ZGVcN8v_atanf_u35
GLOBAL  _ZGVdN4v_atan
GLOBAL  _ZGVeN16v_atanf_u35
GLOBAL  _ZGVeN16v_atanf
GLOBAL  _ZGVeM16v_atanf
GLOBAL  _ZGVeM16v_atanf_u35
GLOBAL  _ZGVeN8v_atan
GLOBAL  _ZGVeN16v_atanf
WEAK    _ZGVeM8v___atan_finite
WEAK    _ZGVeM16v___atanf_finite_u35
GLOBAL  _ZGVeM8v_atan_u35
GLOBAL  _ZGVdN4v_atan_u35
GLOBAL  _ZGVbN4v_atanf
WEAK    _ZGVeM16v___atanf_finite
WEAK    _ZGVeM8v___atan_finite_u35
GLOBAL  _ZGVeM16v_atanf
GLOBAL  _ZGVeM8v_atan
GLOBAL  _ZGVeN16v_atanf_u35
GLOBAL  _ZGVcN4v_atan_u35
GLOBAL  _ZGVcN4v_atan
GLOBAL  _ZGVeN8v_atan_u35
GLOBAL  _ZGVbN4v_atanf_u35
GLOBAL  _ZGVdN4v_atan
GLOBAL  _ZGVbN2v_atan_u35
GLOBAL  _ZGVdN8v_atanf
GLOBAL  _ZGVbN2v_atan
GLOBAL  _ZGVcN8v_atanf
GLOBAL  _ZGVcN8v_atanf_u35
GLOBAL  _ZGVdN8v_atanf_u35
GLOBAL  _ZGVeN8v_atan
GLOBAL  _ZGVeM16v_atanf_u35

As you can see, for X86_64, the bindings are not as straightforward (read: 1-1) as they currently are on AArch64. On X86_64, there are several variants of //atan// for the same vectorization factor, and the only thing that distinguishes them is the CPU capability encoded in the mangled name.
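
The ambiguity can be checked mechanically. The sketch below decodes a few of the unmasked //atan// symbols from the X86_64 listing and groups them by function name and vectorization factor. The regular expression is a simplification that assumes every parameter is a plain vector ("v"), and `variants_by_key` is a hypothetical helper, not part of SLEEF or LLVM.

```python
import re
from collections import defaultdict

# Simplified pattern: _ZGV <isa> <mask> <vlen> <params> _ <scalar name>.
# Assumption: all parameters are plain vector parameters ("v").
MANGLED = re.compile(
    r"^_ZGV(?P<isa>[a-z])(?P<mask>[MN])(?P<vlen>\d+)(?P<params>v+)_(?P<name>\w+)$"
)

# ISA letters as listed earlier in this thread.
ISA = {"b": "SSE", "c": "AVX", "d": "AVX2", "e": "AVX512"}

# A subset of the X86_64 symbols from the readelf output above.
X86_ATAN = ["_ZGVbN2v_atan", "_ZGVcN4v_atan", "_ZGVdN4v_atan", "_ZGVeN8v_atan"]


def variants_by_key(symbols):
    """Group ISA variants by (scalar name, vectorization factor)."""
    groups = defaultdict(list)
    for sym in symbols:
        match = MANGLED.match(sym)
        if match is None:
            continue  # not a name this simplified pattern understands
        key = (match.group("name"), int(match.group("vlen")))
        isa = match.group("isa")
        groups[key].append(ISA.get(isa, isa))
    return dict(groups)


for key, isas in variants_by_key(X86_ATAN).items():
    print(key, isas)
```

For these four symbols, the key ("atan", 4) maps to both the AVX and the AVX2 variant, so a table keyed only on the scalar name and the vectorization factor cannot choose between them.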

This will happen on AArch64 too when SVE is introduced. We'll end up having several different versions of //atan//: one for SVE, the other for non-SVE.

I have a strong suspicion - that I have not verified in practice - that PPC/PPC64 is exactly the same. There will be several different versions for every function, depending on which vectorization model / CPU capability model was chosen at compile-time (-march= | -mtune=).

Right now, it is not possible to make the distinction between different CPU capabilities inside TargetLibraryInfo. That's because the information from -march= and -mtune= isn't captured in TargetLibraryInfo. The only thing we have - right now - is the TargetTriple. And the TargetTriple tells us nothing about the -march= and/or -mcpu= micro-arch tuning.
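
As an illustration of the missing piece, here is a sketch of a token lookup that needs more than the architecture name. Everything in it is hypothetical: `pick_isa_token`, the feature strings, and the "prefer the widest available extension" policy are assumptions made for the example, and none of it exists in TargetLibraryInfo.

```python
# Hypothetical preference order for X86_64: widest extension first.
# The feature strings are made up for this sketch; the letters are the
# ones listed earlier in this thread.
X86_64_TOKENS = [("avx512", "e"), ("avx2", "d"), ("avx", "c"), ("sse", "b")]


def pick_isa_token(arch, features):
    """Return the mangling token for `arch`, or None if it cannot be decided.

    `features` stands in for the CPU capability information that the
    target triple alone does not carry.
    """
    if arch == "aarch64":
        return "n"  # a single variant today, so the architecture is enough
    if arch == "x86_64":
        for feature, token in X86_64_TOKENS:
            if feature in features:
                return token
    return None


print(pick_isa_token("aarch64", set()))
print(pick_isa_token("x86_64", {"sse", "avx", "avx2"}))
print(pick_isa_token("x86_64", set()))
```

With an empty feature set, which is all the triple provides, the X86_64 case has no basis for choosing a token.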

So, we need to add some new information to TargetLibraryInfo: namely the -march= and -mtune= information that was passed to clang. I think implementing this bit is part of a larger - and different - discussion. And I am very reluctant to mix the TargetLibraryInfo enhancement with SLEEF support in AArch64, in the same changeset.

So, what I would suggest **for now** is that we postpone this generalization of the libm <-> SLEEF bindings until we have a complete picture of the various - and subtle - differences in how each CPU capability is mangled. We don't have that picture yet: I can't build SLEEF on PPC/PPC64 simply because I don't have access to a PPC/PPC64 box.

What do you think?

Repository:
  rL LLVM

https://reviews.llvm.org/D53927