[Libclc-dev] [PATCH] math: Add fmod implementation

Wed Sep 10 13:05:36 PDT 2014

On Sep 10, 2014, at 3:37 PM, Aaron Watry <awatry at gmail.com> wrote:

> On Wed, Sep 10, 2014 at 1:35 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> On Wed, 2014-09-10 at 14:01 -0400, Matt Arsenault wrote:
>>> On Sep 10, 2014, at 1:46 PM, Aaron Watry <awatry at gmail.com> wrote:
>>> 
>>>> On Wed, Sep 10, 2014 at 12:17 PM, Matt Arsenault
>>>> <Matthew.Arsenault at amd.com> wrote:
>>>>> On 09/10/2014 11:59 AM, Aaron Watry wrote:
>>>>>> 
>>>>>> Passes piglit tests on evergreen (sent to piglit list).
>>>>>> 
>>>>>> Signed-off-by: Aaron Watry <awatry at gmail.com>
>>>>>> ---
>>>>>> generic/include/clc/clc.h       |  1 +
>>>>>> generic/include/clc/math/fmod.h |  7 +++++++
>>>>>> generic/lib/SOURCES             |  1 +
>>>>>> generic/lib/math/fmod.cl        | 15 +++++++++++++++
>>>>>> 4 files changed, 24 insertions(+)
>>>>>> create mode 100644 generic/include/clc/math/fmod.h
>>>>>> create mode 100644 generic/lib/math/fmod.cl
>>>>>> 
>>>>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>>>>>> index b8c1cb9..94557a1 100644
>>>>>> --- a/generic/include/clc/clc.h
>>>>>> +++ b/generic/include/clc/clc.h
>>>>>> @@ -47,6 +47,7 @@
>>>>>> #include <clc/math/fma.h>
>>>>>> #include <clc/math/fmax.h>
>>>>>> #include <clc/math/fmin.h>
>>>>>> +#include <clc/math/fmod.h>
>>>>>> #include <clc/math/hypot.h>
>>>>>> #include <clc/math/log.h>
>>>>>> #include <clc/math/log2.h>
>>>>>> diff --git a/generic/include/clc/math/fmod.h
>>>>>> b/generic/include/clc/math/fmod.h
>>>>>> new file mode 100644
>>>>>> index 0000000..737679f
>>>>>> --- /dev/null
>>>>>> +++ b/generic/include/clc/math/fmod.h
>>>>>> @@ -0,0 +1,7 @@
>>>>>> +#define __CLC_BODY <clc/math/binary_decl.inc>
>>>>>> +#define __CLC_FUNCTION fmod
>>>>>> +
>>>>>> +#include <clc/math/gentype.inc>
>>>>>> +
>>>>>> +#undef __CLC_BODY
>>>>>> +#undef __CLC_FUNCTION
>>>>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>>>>>> index e4ba1d1..45e12aa 100644
>>>>>> --- a/generic/lib/SOURCES
>>>>>> +++ b/generic/lib/SOURCES
>>>>>> @@ -39,6 +39,7 @@ math/exp.cl
>>>>>> math/exp10.cl
>>>>>> math/fmax.cl
>>>>>> math/fmin.cl
>>>>>> +math/fmod.cl
>>>>>> math/hypot.cl
>>>>>> math/mad.cl
>>>>>> math/mix.cl
>>>>>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
>>>>>> new file mode 100644
>>>>>> index 0000000..091035b
>>>>>> --- /dev/null
>>>>>> +++ b/generic/lib/math/fmod.cl
>>>>>> @@ -0,0 +1,15 @@
>>>>>> +#include <clc/clc.h>
>>>>>> +
>>>>>> +#ifdef cl_khr_fp64
>>>>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
>>>>>> +#endif
>>>>>> +
>>>>>> +#define FUNCTION fmod
>>>>>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
>>>>>> +
>>>>>> +#define __CLC_BODY <binary_impl.inc>
>>>>>> +#include <clc/math/gentype.inc>
>>>>>> +
>>>>>> +#undef __CLC_BODY
>>>>>> +#undef FUNCTION
>>>>>> +#undef FUNCTION_IMPL
>>>>>> \ No newline at end of file
>>>>> 
>>>>> 
>>>>> I think this can use the LLVM frem instruction instead, and would be better
>>>>> expanded in the backend. I have most of a patch that expands ISD::FREM for
>>>>> SI that I forgot about somewhere
>>>>> 
>>>> 
>>>> Hi Matt,
>>>> 
>>>> There's both fmod and remainder functions in the CL built-in library,
>>>> and as near as I can tell, they just differ in how to treat the result
>>>> of x/y:
>>>> 
>>>> From the CL 1.2 spec (6.12.12):
>>>> gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).
>>>> 
>>>> gentype remainder (gentype x, gentype y) => Compute the value r such
>>>> that r = x - n*y, where n
>>>> is the integer nearest the exact value of x/y. If there
>>>> are two integers closest to x/y, n shall be the even
>>>> one. If r is zero, it is given the same sign as x.
>>>> 
>>>> Do you happen to know which behavior the frem instruction gives us?
>>>> Truncate or Round half to nearest even?  I'm guessing that one of
>>>> these will be able to use the frem instruction, and the other won't,
>>>> but I haven't checked which is which yet.
>> 
>> There is both __builtin_fmod(f), and __builtin_remainder(f), but I
>> haven't found any documentation on them, or code outside of
>> Basic/Builtins.def
> 
> That's because these are libm functions(same thing with
> __builtin_[sin|cos|tan|etc] and many of the other trig functions), and
> there is no libm implementation that exists for R600... so  calling
> __builtin_modf just leads to invalid function calls and a segfault...
> 
> It's all well and good to have a built-in function in clang for most
> of the math functions, but if the function isn't really built-in and
> is dependent upon an external architecture-specific library, then we
> either need to:
> 
> 1) find a way to port the libm functions to CL C (which is potentially
> difficult... most of the float-precision functions assume that the
> device can at least support doubles at lower performance)
> 
> 2) create an R600 implementation of libm
> 
> 3) re-write the functions ourselves.
> 
> I've been trying to stick to CLC implementations of the functions
> where possible to keep the implementation as architecture neutral as I
> can (and to keep the libclc library as self-contained as possible).
> 
> I can refactor this to attempt to use __builtin_fmod on architectures
> where this function is expected to be available with either a CLC or
> bitcode override for R600, or if the frem instruction matches the
> required behavior, I'll just use that for all architectures (if I
> can't use it here, then maybe we can use it for remainder)... It's
> just a bit of extra work.

I’ll try to post my patch implementing frem for R600 later today, although I don’t have access to hardware right now to test it on

> 
> Sorry for the rant, but this section of the built-in library has been
> causing much more trouble than I really cared to take on (my trig is
> pretty weak and I'm not the hugest fan of floating point
> operations)... but at the same time, it's the primary roadblock
> standing in the way of getting cppamp-driver-ng working on the OSS
> radeon drivers (which is my current non-work focus).
> 
> If anyone cares, here's the list of functions that are still needed to
> get cppamp-driver-ng working on radeonsi/r600 for which I haven't sent
> patches to the list:
> acosh, asinh, atanh, cbrt, cosh, cospi, erf, erfc, expm1, fdim, frexp,
> ilogb, ldexp, log10, log1p (tom sent this yesterday), logb, modf,
> remainder, sinh, sinpi, tanh, and tgamma
> 
> After that, we just need to enable the clang storage class specifiers
> extension to get support for the static keyword in clover and there's
> a change that things will work...  but that's a lot of math functions
> left if I can't just use trig identities to get an implementation in
> place before we optimize it.
> 
> --Aaron
> 
> 
>> 
>> If they are based on math.h then both fmod and remainder seem to match
>> OCL definitions.
>> 
>> we have round to nearest even instructions, not sure if using __builtin
>> or adding amdgpu.rndne intrinsic is the better way to go.
>> 
>> jan
>> 
>>>> 
>>>> —Aaron
>>> 
>>> I’m not sure. I was operating under the assumption that frem matches
>>> food’s behavior, but I haven’t tested it particularly carefully. x86
>>> lowers frem into calls to fmod, and I assume the OpenCL version behaves
>>> the same as libm's
>>> 
>>> _______________________________________________
>>> Libclc-dev mailing list
>>> Libclc-dev at pcc.me.uk
>>> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
>> 
>> --
>> Jan Vesely <jan.vesely at rutgers.edu>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20140910/22b2c631/attachment.html>