[llvm-dev] RFC: SIMD math-function library

Thu Jul 14 23:34:51 PDT 2016

----- Original Message -----
> From: "Martin J. O'Riordan via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Naoki Shibata" <shibatch.sf.net at gmail.com>, "Vedant Kumar" <vsk at apple.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, July 15, 2016 1:09:44 AM
> Subject: Re: [llvm-dev] RFC: SIMD math-function library
> 
> I am looking forward to porting it to our platform; I know that this
> will be of significant benefit.
> 
> We support 'v8f16' and 'v4f32' FP vector types natively, and having
> this library provide the optimised math functions for them will
> definitely be very useful.
> 
> All the best,
> 
> 	MartinO
> 
> -----Original Message-----
> From: Naoki Shibata [mailto:shibatch.sf.net at gmail.com]
> Sent: 15 July 2016 05:46
> To: Martin.ORiordan at Movidius.com; 'Vedant Kumar' <vsk at apple.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: SIMD math-function library
> 
> 
> Hi Martin,
> 
> Thank you for your comment.
> 
> It is of course possible to rewrite SLEEF in a more generic way, and
> actually I once tried to do that using the vector data type in GCC.
> But the code generated from such source code was far less efficient
> than the version with explicit SIMD intrinsics.
> 
> Adding typedefs to specify the exact types is possible.
> 
> Regards,
> 
> Naoki Shibata
> 
> 
> On 2016/07/14 18:25, Martin J. O'Riordan wrote:
> > Having support for vector equivalents to the ISO C math functions
> > is very valuable, and this kind of work is of great benefit.
> >
> > There are a couple things though that concern me about this
> > proposal:
> >
> > 1.  OpenCL C already provides a vector math binding that for the
> >     most part provides this equivalence.  It also supports vectors
> >     of multiple types through overloading.  Perhaps it might be
> >     possible to align SLEEF with OpenCL C?

[+Tom]

This is an interesting point. It might certainly make sense to integrate these routines with our OpenCL library implementation as well for targets that would benefit. Currently, we have scalar implementations of many math functions (e.g. http://llvm.org/svn/llvm-project/libclc/trunk/generic/lib/math/tanh.cl), and "vectorized" versions which just call the scalar functions (http://llvm.org/svn/llvm-project/libclc/trunk/generic/lib/clcmacro.h). If nothing else, it might make sense to borrow their naming convention?

Thanks again,
Hal

> >
> > 2.  There are hard assumptions about how 'float', 'double' and
> >     'long double' are implemented.  Libraries with these kinds of
> >     hard-wired assumptions (including 'compiler-rt') cause me a lot
> >     of trouble to port to our platform, which is at variance with
> >     these common assumptions.
> >
> >     So I would suggest that the implementation use typedefs to
> >     specifically bind to the type that provides the specific FP
> >     precision required.
> >
> >     Clang supports the IEEE FP16, FP32, FP64 and FP128 types, which
> >     can be bound to each of the higher level C types.  Our
> >     architecture binds these as FP16 for '__fp16' aka 'half', FP32
> >     for 'float' AND for 'double', and FP64 for 'long double'.  There
> >     is no hardware support for FP64, so having 'float' and 'double'
> >     be FP32 is important to avoid the costly consequences of the
> >     usual arithmetic conversions in C.
> >
> >     Using specific synonyms would greatly enhance the portability
> >     of the library implementation.  For example 'float32_t' instead
> >     of 'float' - pity C/C++ don't have these as Standard yet.
> >
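
A minimal sketch of the typedef idea, assuming a C11 compiler. The names sketch_f32, sketch_f64 and sketch_square_f32 are hypothetical and are not part of SLEEF or of any standard; the static assertions only check the mantissa width, as a simple stand-in for a full format check.

```c
#include <float.h>

/* Hypothetical precision-specific names; a port changes only these
 * bindings, and the math routines are written against them. */
typedef float  sketch_f32;
typedef double sketch_f64;

/* Fail at compile time if a binding does not give the precision the
 * code was written for (24-bit and 53-bit significands). */
_Static_assert(FLT_MANT_DIG == 24, "sketch_f32 must have binary32 precision");
_Static_assert(DBL_MANT_DIG == 53, "sketch_f64 must have binary64 precision");

/* Example routine written against the typedef, not against 'float'. */
static sketch_f32 sketch_square_f32(sketch_f32 x) {
    return x * x;
}
```

On a platform where 'double' is FP32, the second assertion would fail, and the port would rebind sketch_f64 to 'long double' in that one place.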
> > Thanks,
> >
> > 	MartinO
> >
> > -----Original Message-----
> > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf
> > Of Naoki Shibata via llvm-dev
> > Sent: 14 July 2016 09:18
> > To: Vedant Kumar <vsk at apple.com>
> > Cc: llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] RFC: SIMD math-function library
> >
> >
> > Hi Vedant,
> >
> > Thank you for your comment.
> >
> > For checking accuracy of finite outputs and correctness of handling
> > non-finite inputs and outputs, I believe validating against
> > libmpfr is enough. Please tell me the kind of regressions we need
> > to detect. Do you have concerns about the correctness of libmpfr?
> >
> > What kind of execution time or code size regressions are we going
> > to check? Since SLEEF is completely branch-free, there should be
> > no serious execution time and code size regression unless branches
> > are introduced.
> >
> > It is of course okay for me to add additional regression checking,
> > but I just want to understand the necessity.
> >
> > Regards,
> >
> > Naoki Shibata
> >
> >
> > On 2016/07/14 1:45, Vedant Kumar wrote:
> >> Hi Naoki,
> >>
> >> SLEEF looks very promising!
> >>
> >> Are SLEEF routines validated against libm, in addition to libmpfr?
> >> Are performance tracking tests in place to detect execution time
> >> or code size regressions? If these are missing, IMO it would be
> >> good to add them to the roadmap.
> >>
> >> best,
> >> vedant
> >>
> >>> On Jul 13, 2016, at 4:45 AM, Naoki Shibata via llvm-dev
> >>> <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>>
> >>> Dear LLVM contributors,
> >>>
> >>> I am Naoki Shibata, an associate professor at Nara Institute of
> >>> Science and Technology.
> >>>
> >>> Hal Finkel and I would like to jointly propose adding my
> >>> vectorized math library to LLVM.
> >>>
> >>> The library has been available as public domain software for
> >>> years, and I am going to dual-license the library if necessary.
> >>>
> >>> ********
> >>>
> >>> Below is a proposal to add my vectorized math library, SLEEF [1],
> >>> for evaluating elementary functions (trigonometry, log, exp,
> >>> etc.) to LLVM. The library can be used directly, or can be
> >>> targeted by an autovectorization infrastructure. Patches to tie
> >>> SLEEF into LLVM's autovectorizer have been developed by Hal
> >>> Finkel as part of the bgclang project (which provides LLVM/Clang
> >>> ported to the IBM BG/Q supercomputer architecture). Hal has also
> >>> developed a user-facing header for the library, in the style of
> >>> Clang's intrinsics headers, which we can use as part of this
> >>> project. SLEEF has been used as part of bgclang in this way for
> >>> several years.
> >>>
> >>> The library currently supports several architectures:
> >>> * x86 - SSE2, FMA4, AVX, AVX2+FMA3
> >>> * ARM - NEON (single-precision only)
> >>> * A pure C (scalar) version
> >>> * Hal's version supports PowerPC/QPX.
> >>>
> >>> It is fairly easy to port to other architectures. The library
> >>> provides similar functionality to Intel's Short Vector Math
> >>> Library (available with Intel's Compiler).
> >>>
> >>> Roadmap:
> >>> --------
> >>> 1) Get agreement on incorporating the library.
> >>> 2) Rename the public interface to use only the
> >>>   implementation-reserved namespace (i.e. names starting with
> >>>   underscores), as is appropriate for a compiler runtime library.
> >>> 3) Convert the functions to use LLVM's naming conventions
> >>>   (including, if desired, converting the source files to C++,
> >>>   allowing the use of function overloading).
> >>> 4) Create and document a public interface to the library.
> >>> 5) Add support for targeting the library to LLVM's
> >>>   autovectorizer.
> >>> 6) Work with the community to port the library to other
> >>>   architectures.
> >>>
> >>> Motivation:
> >>>
> >>> Recent CPUs and GPUs have vectorized FP multipliers and adders
> >>> for improving throughput of FP computation. In order to extract
> >>> the maximum computation power from processors with vectorized
> >>> ALUs, the software has to be vectorized to use SIMD data
> >>> structures. It is also preferred that conditional branches and
> >>> scatter/gather memory access are eliminated as much as possible.
> >>> However, rewriting existing software in this fashion is a very
> >>> hard and time consuming task that involves converting data
> >>> structures. Thus, realization of efficient libraries and
> >>> automatic vectorization is desired.
> >>>
> >>> In this proposal, we are going to incorporate a vectorized math
> >>> library, currently named SLEEF, into LLVM's runtime library. By
> >>> doing this, elementary functions can be directly evaluated using
> >>> SIMD data types. We can also expect extra performance
> >>> improvements by allowing LLVM to automatically target the
> >>> functions (and inline them with LTO).
> >>>
> >>> Functionality of the library:
> >>>
> >>> For each elementary function, the library contains subroutines
> >>> for evaluation in single precision and double precision.
> >>> Different accuracy of the results can be chosen for a subset of
> >>> the elementary functions; for this subset there are versions
> >>> with up to 1 ulp error and versions with a few ulp error.
> >>> Obviously, less accurate versions are faster. Please note that
> >>> we have 0.5 ulp maximum error when we convert a real number into
> >>> a floating point number. In Hal's bgclang port, the less
> >>> accurate versions are used with -ffast-math, and the
> >>> more-accurate ones otherwise.
> >>>
> >>> For non-finite inputs and outputs, the library should return the
> >>> same results as libm. The library is tested to confirm that the
> >>> evaluation error is within the designed limit, by comparison
> >>> against high-precision evaluation using the libmpfr library.
> >>> In particular, we rigorously checked the error of the
> >>> trigonometric functions when the arguments are close to an
> >>> integral multiple of PI/2.
> >>>
> >>> The size of the functions is very small.
> >>>
> >>> Implementation of the library:
> >>>
> >>> Basically, each function consists of a reduction and a kernel.
> >>> For the kernel, a polynomial approximation is used. The
> >>> coefficients are carefully set to minimize the number of
> >>> multiplications and additions while reducing the error. The
> >>> reduction is devised so that the same kernel can be used across
> >>> the whole range of input arguments. In order to improve the
> >>> accuracy of the functions with 1-ulp error, double-double
> >>> calculations are used. Use of fused multiply-add operations,
> >>> which have become quite common, can further improve the
> >>> performance of these functions. Some of the implementation
> >>> techniques used in the library are explained in [3].
> >>>
> >>> [1] https://github.com/shibatch/sleef
> >>> [2] https://github.com/hfinkel/sleef-bgq/blob/master/simd/qpxmath.h
> >>> [3] http://ito-lab.naist.jp/~n-sibata/pdfs/isc10simd.pdf
> >>>
> >>>
> >>> ********
> >>>
> >>> Regards,
> >>>
> >>> Naoki Shibata
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >
> >
> 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory