[PATCH] D53927: [AArch64] Enable libm vectorized functions via SLEEF

Thu Nov 1 18:24:24 PDT 2018

> 
> OK, I like the idea of having a generalized way of expressing the libm <-> SLEEF bindings.
> 

Great! But… (yes, I also have a but now!)

> But, there's always a 'but':
> 
> 1. Looping through this static array of struct bindings, and doing the replacement of the ${ext_token} with the appropriate mangling symbol requires an extra function call. Do we want that extra function call?
> 
> 2. This gets even more complicated. Here's why:
> 
> - Here are the available SLEEF functions for //atan// on AArch64. We get them with
> 
> readelf -sW libsleefgnuabi.so.3.3 | awk '{ printf "%s\t%s\n", $5, $8 }' | egrep atan | egrep -v 'atan2|atanh
> 
> GLOBAL  _ZGVnN2v_atan
> GLOBAL  _ZGVnN4v_atanf_u35
> GLOBAL  _ZGVnN4v_atanf
> GLOBAL  _ZGVnN2v_atan_u35
> GLOBAL  _ZGVnN2v_atan_u35
> GLOBAL  _ZGVnN2v_atan
> GLOBAL  _ZGVnN4v_atanf
> GLOBAL  _ZGVnN4v_atanf_u35
> 
> - And here are the SLEEF functions for //atan// for X86_64, which was compiled with -march=core-avx2:
> 
> WEAK    _ZGVeM8v___atan_finite
> GLOBAL  _ZGVeM8v_atan
> GLOBAL  _ZGVeM8v_atan_u35
> GLOBAL  _ZGVbN4v_atanf
> GLOBAL  _ZGVbN2v_atan_u35
> GLOBAL  _ZGVcN4v_atan
> GLOBAL  _ZGVeN8v_atan_u35
> GLOBAL  _ZGVdN4v_atan_u35
> GLOBAL  _ZGVbN2v_atan
> GLOBAL  _ZGVbN4v_atanf_u35
> WEAK    _ZGVeM16v___atanf_finite_u35
> WEAK    _ZGVeM16v___atanf_finite
> GLOBAL  _ZGVdN8v_atanf_u35
> WEAK    _ZGVeM8v___atan_finite_u35
> GLOBAL  _ZGVcN4v_atan_u35
> GLOBAL  _ZGVdN8v_atanf
> GLOBAL  _ZGVcN8v_atanf
> GLOBAL  _ZGVcN8v_atanf_u35
> GLOBAL  _ZGVdN4v_atan
> GLOBAL  _ZGVeN16v_atanf_u35
> GLOBAL  _ZGVeN16v_atanf
> GLOBAL  _ZGVeM16v_atanf
> GLOBAL  _ZGVeM16v_atanf_u35
> GLOBAL  _ZGVeN8v_atan
> GLOBAL  _ZGVeN16v_atanf
> WEAK    _ZGVeM8v___atan_finite
> WEAK    _ZGVeM16v___atanf_finite_u35
> GLOBAL  _ZGVeM8v_atan_u35
> GLOBAL  _ZGVdN4v_atan_u35
> GLOBAL  _ZGVbN4v_atanf
> WEAK    _ZGVeM16v___atanf_finite
> WEAK    _ZGVeM8v___atan_finite_u35
> GLOBAL  _ZGVeM16v_atanf
> GLOBAL  _ZGVeM8v_atan
> GLOBAL  _ZGVeN16v_atanf_u35
> GLOBAL  _ZGVcN4v_atan_u35
> GLOBAL  _ZGVcN4v_atan
> GLOBAL  _ZGVeN8v_atan_u35
> GLOBAL  _ZGVbN4v_atanf_u35
> GLOBAL  _ZGVdN4v_atan
> GLOBAL  _ZGVbN2v_atan_u35
> GLOBAL  _ZGVdN8v_atanf
> GLOBAL  _ZGVbN2v_atan
> GLOBAL  _ZGVcN8v_atanf
> GLOBAL  _ZGVcN8v_atanf_u35
> GLOBAL  _ZGVdN8v_atanf_u35
> GLOBAL  _ZGVeN8v_atan
> GLOBAL  _ZGVeM16v_atanf_u35
> 
> As you can see, for X86_64, the bindings are not as straightforward (read: 1-1) as they currently are on AArch64. On X86_64, there's several variants for //atan//, and for the same vectorization factor. The discriminating factor between the several variants being the CPU capability.
> 

… but you made a good point here in listing the variants with different lanes on x86. I missed this in my reasoning yesterday. Even if we find a way of generalizing the list of function, we still have to make sure the NumberOfLanes integral field of the mapping is appropriate for the function being used, which means the list of AArch64 functions cannot be shared with the x86 one. The hypothetical function call you mentioned before for setting the name would have to set the number of lanes, and actually add new records to the list… too convoluted when compared to a clear static list.

> This will happen on AArch64 too when SVE is introduced. We'll end up having several different versions of //atan//: one for SVE, the other for non-SVE.
> 
> I have a strong suspicion - that I have not verified in practice - that PPC/PPC64 is exactly the same. There will be several different versions for every function, depending on which vectorization model / CPU capability model was chosen at compile-time (-march= | -mtune=).
> 
> Right now, it is not possible to make the distinction between different CPU capabilities inside TargetLibraryInfo. That's because the information from -march= and -mtune= isn't captured in TargetLibraryInfo. The only thing we have - right now - is the TargetTriple. And the TargetTriple tells us nothing about the -march= and/or -mcpu= micro-arch tuning.
> 
> So, we need to add some new information to TargetLibraryInfo: namely the -march= and -mtune= information that was passed to clang. I think implementing this bit is part of a larger - and different - discussion. And I am very reluctant to mix the TargetLibraryInfo enhancement with SLEEF support in AArch64, in the same changeset.
> 

Yes, and the way we will go will probably be to skip the TLI and give this responsibility to the LoopVectorizer. But as you said, this is a discussion that needs to be taken at a different level, I will talk about this in the openMP sims-only based proposal.

> So, what I would suggest **for now**, is that we postpone this generalization of the libm <-> SLEEF bindings until we have a complete picture of what are the various - and subtle - differences between the various CPU capabilities mangling. We don't yet have a complete picture - I can't build SLEEF on PPC/PPC64 simply because I don't have access to a PPC/PPC64 box.
> 
> What do you think?
> 
> 

I agree with you. The design of the patch is OK as it is. Thank you for your patience. I will go through it carefully.

But, if I may, I would like to stress the fact that you should use -fveclib=SLEEFGNUABI. I think we can agree that, although SLEEFGNUABI might be ugly, it is the right name to use for uniformity with the linker flags it implies (-fsleefgnuabi).

Kind regards,

Francesco