[Libclc-dev] [PATCH] math: Add fmod implementation
Matt Arsenault
arsenm2 at gmail.com
Wed Sep 10 15:11:45 PDT 2014
On Sep 10, 2014, at 4:05 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> On Sep 10, 2014, at 3:37 PM, Aaron Watry <awatry at gmail.com> wrote:
>> On Wed, Sep 10, 2014 at 1:35 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>>> On Wed, 2014-09-10 at 14:01 -0400, Matt Arsenault wrote:
>>>> On Sep 10, 2014, at 1:46 PM, Aaron Watry <awatry at gmail.com> wrote:
>>>>> On Wed, Sep 10, 2014 at 12:17 PM, Matt Arsenault
>>>>> <Matthew.Arsenault at amd.com> wrote:
>>>>>> On 09/10/2014 11:59 AM, Aaron Watry wrote:
>>>>>>> Passes piglit tests on evergreen (sent to piglit list).
>>>>>>> Signed-off-by: Aaron Watry <awatry at gmail.com>
>>>>>>> generic/include/clc/clc.h | 1 +
>>>>>>> generic/include/clc/math/fmod.h | 7 +++++++
>>>>>>> generic/lib/SOURCES | 1 +
>>>>>>> generic/lib/math/fmod.cl | 15 +++++++++++++++
>>>>>>> 4 files changed, 24 insertions(+)
>>>>>>> create mode 100644 generic/include/clc/math/fmod.h
>>>>>>> create mode 100644 generic/lib/math/fmod.cl
>>>>>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>>>>>>> index b8c1cb9..94557a1 100644
>>>>>>> --- a/generic/include/clc/clc.h
>>>>>>> +++ b/generic/include/clc/clc.h
>>>>>>> @@ -47,6 +47,7 @@
>>>>>>> #include <clc/math/fma.h>
>>>>>>> #include <clc/math/fmax.h>
>>>>>>> #include <clc/math/fmin.h>
>>>>>>> +#include <clc/math/fmod.h>
>>>>>>> #include <clc/math/hypot.h>
>>>>>>> #include <clc/math/log.h>
>>>>>>> #include <clc/math/log2.h>
>>>>>>> diff --git a/generic/include/clc/math/fmod.h
>>>>>>> b/generic/include/clc/math/fmod.h
>>>>>>> new file mode 100644
>>>>>>> index 0000000..737679f
>>>>>>> --- /dev/null
>>>>>>> +++ b/generic/include/clc/math/fmod.h
>>>>>>> @@ -0,0 +1,7 @@
>>>>>>> +#define __CLC_BODY <clc/math/binary_decl.inc>
>>>>>>> +#define __CLC_FUNCTION fmod
>>>>>>> +
>>>>>>> +#include <clc/math/gentype.inc>
>>>>>>> +
>>>>>>> +#undef __CLC_BODY
>>>>>>> +#undef __CLC_FUNCTION
>>>>>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>>>>>>> index e4ba1d1..45e12aa 100644
>>>>>>> --- a/generic/lib/SOURCES
>>>>>>> +++ b/generic/lib/SOURCES
>>>>>>> @@ -39,6 +39,7 @@ math/exp.cl
>>>>>>> math/exp10.cl
>>>>>>> math/fmax.cl
>>>>>>> math/fmin.cl
>>>>>>> +math/fmod.cl
>>>>>>> math/hypot.cl
>>>>>>> math/mad.cl
>>>>>>> math/mix.cl
>>>>>>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
>>>>>>> new file mode 100644
>>>>>>> index 0000000..091035b
>>>>>>> --- /dev/null
>>>>>>> +++ b/generic/lib/math/fmod.cl
>>>>>>> @@ -0,0 +1,15 @@
>>>>>>> +#include <clc/clc.h>
>>>>>>> +
>>>>>>> +#ifdef cl_khr_fp64
>>>>>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +#define FUNCTION fmod
>>>>>>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
>>>>>>> +
>>>>>>> +#define __CLC_BODY <binary_impl.inc>
>>>>>>> +#include <clc/math/gentype.inc>
>>>>>>> +
>>>>>>> +#undef __CLC_BODY
>>>>>>> +#undef FUNCTION
>>>>>>> +#undef FUNCTION_IMPL
>>>>>>> \ No newline at end of file
>>>>>> I think this can use the LLVM frem instruction instead, and would be better
>>>>>> expanded in the backend. I have most of a patch that expands ISD::FREM for
>>>>>> SI that I forgot about somewhere.
>>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> There are both fmod and remainder functions in the CL built-in library,
>>>>> and as near as I can tell, they just differ in how to treat the result
>>>>> of x/y:
>>>>>
>>>>> From the CL 1.2 spec (6.12.12):
>>>>> gentype fmod (gentype x, gentype y) => Modulus. Returns x - y * trunc (x/y).
>>>>>
>>>>> gentype remainder (gentype x, gentype y) => Compute the value r such
>>>>> that r = x - n*y, where n is the integer nearest the exact value of
>>>>> x/y. If there are two integers closest to x/y, n shall be the even
>>>>> one. If r is zero, it is given the same sign as x.
>>>>>
>>>>> Do you happen to know which behavior the frem instruction gives us?
>>>>> Truncate or Round half to nearest even? I'm guessing that one of
>>>>> these will be able to use the frem instruction, and the other won't,
>>>>> but I haven't checked which is which yet.
>>> There are both __builtin_fmod(f) and __builtin_remainder(f), but I
>>> haven't found any documentation on them, or code outside of
>>> Basic/Builtins.def
>>
>> That's because these are libm functions (same thing with
>> __builtin_[sin|cos|tan|etc] and many of the other trig functions), and
>> there is no libm implementation that exists for R600... so calling
>> __builtin_fmod just leads to invalid function calls and a segfault...
>>
>> It's all well and good to have a built-in function in clang for most
>> of the math functions, but if the function isn't really built-in and
>> is dependent upon an external architecture-specific library, then we
>> either need to:
>>
>> 1) find a way to port the libm functions to CL C (which is potentially
>> difficult... most of the float-precision functions assume that the
>> device can at least support doubles at lower performance)
>>
>> 2) create an R600 implementation of libm
>>
>> 3) re-write the functions ourselves.
>>
>> I've been trying to stick to CLC implementations of the functions
>> where possible to keep the implementation as architecture neutral as I
>> can (and to keep the libclc library as self-contained as possible).
>>
>> I can refactor this to attempt to use __builtin_fmod on architectures
>> where this function is expected to be available with either a CLC or
>> bitcode override for R600, or if the frem instruction matches the
>> required behavior, I'll just use that for all architectures (if I
>> can't use it here, then maybe we can use it for remainder)... It's
>> just a bit of extra work.
> I’ll try to post my patch implementing frem for R600 later today, although I don’t have access to hardware right now to test it on.
>
I’ve pushed the R600 implementation of frem, so you should be able to see whether it works now. The double version will probably fail its tests, since double fdiv is implemented incorrectly until some operand legalization bugs are fixed.
