[LLVMdev] Math Library Intrinsics as native intrinsics

Nate Begeman natebegeman at me.com
Tue Apr 14 14:25:00 PDT 2009


On Apr 14, 2009, at 12:45 PM, Villmow, Micah wrote:

> Fair enough,
> The current issue that I am having with my backend and the language I
> have to support via LLVM is that:
> 1) I need to support a large number of math functions as language
> built-ins and not as a separate library. These functions are part of
> the core language and thus must be supported on a wide variety of
> data types, with very specific rules and definitions for each
> function, which in some cases differ from the definition that LLVM
> gives to the same function name. There are 165
> math/integer/relational/geometric functions in section 6.11 of the
> OpenCL spec (http://www.khronos.org/registry/cl/) when counting the
> signed/unsigned/floating-point variants of some functions.

Having worked on an OpenCL implementation myself, I can say there is no
requirement for these functions to be part of LLVM in order for you to
call them.  The same argument could be logically extrapolated to any
library in any language; someone could argue it needed to be part of
LLVM.  That argument doesn't hold any water.

> 2) AMD needs to support these on both GPU and CPU backends so pushing
> them to a uniform section is highly desired so we don't have to
> duplicate work. Some of these functions are native instructions on the
> GPU in either scalar or vector formats but not on the CPU, or vice
> versa.

I don't understand how you plan on avoiding work here.  If they're in
the compiler, the compiler is going to have to know how to generate
code for the various libm routines that the GPU and CPU don't natively
implement (which is most of them), so you're just pushing some
particular implementation of libm into the compiler.  I don't see how
this is beneficial, or how it saves you work, since the
implementations will not be the same.

> 3) The OpenCL language requires scalar and vector versions up to 16
> elements for 8/16/32-bit data types and 8 elements for 64-bit data
> types. Implementing all of these combinations is an immense amount
> of work, and this is greatly simplified by utilizing the
> Legalize/Combine infrastructure already in place to reduce all the
> vector types to the scalar versions.

This is only relevant if you believe that every OpenCL function has to
be represented by a first-class intrinsic node in the LLVM IR.  I see
no evidence that this is the case.  Indeed, since different platforms
have different requirements for libm functions with respect to
rounding and errno, I don't see why the OpenCL set should get special
treatment and be enshrined in LLVM IR proper; doing so seems at odds
with the IR's current design goal of having a relatively small number
of simple instructions.

> 4) GPUs do not have real support for loading libraries, so expanding
> to a library function approach would not be feasible, and this
> approach loses the flexibility of the Legalize/Combine
> infrastructure, which, as mentioned earlier, is highly desired.

Whether you can actually dynamically load a code segment is not
relevant to the use of LLVM bitcode files as libraries.
SimplifyLibCalls can already hack on "known" functions, so that's
covered.  As for Legalize, it seems questionable to me that Legalize
should contain all the code necessary to produce a fully legal libm
implementation for every target over a variety of vector widths,
considering that a "vector libm" isn't even something that exists
outside of OpenCL.

> Some of the benefits of doing this would be that LLVM would then
> have the beginnings of a large built-in reference math library based
> on, but not limited to, the OpenCL 1.0 spec. This would allow AMD
> and possibly other vendors to utilize this work on various backends
> without having to duplicate work. This is work that I am doing
> internally at AMD anyway, so for LLVM it will hopefully require
> minimal work.

It would seem to me that if someone were interested in delivering a
portable libm, they could do so through an LLVM IR bitcode file,
rather than building the implementation into the compiler itself.
This would also be platform-agnostic, and a heck of a lot easier to
maintain than many thousands of lines of C++ which generates that
bitcode file.

> Hope this helps clear up the problem I am approaching. This solution
> does not remove the ability to use a math library, as the functions
> can always be expanded to function calls, but it allows easier use
> of the LLVM infrastructure with the math library.

I don't see what value you're adding here aside from essentially
hard-coding a particular libm implementation into the code generator,
not even LLVM proper.  For targets with optimized libms, expanding to
a function call is almost always the right idea; when it isn't,
SimplifyLibCalls can pick up the slack.  And if your platform doesn't
have an optimized libm, a portable one in IR or C that you optimize as
you have time seems like a far more sane approach.

Nate
