[Libclc-dev] [PATCH] math: Add fmod implementation

Wed Sep 10 12:37:39 PDT 2014

On Wed, Sep 10, 2014 at 1:35 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Wed, 2014-09-10 at 14:01 -0400, Matt Arsenault wrote:
>> On Sep 10, 2014, at 1:46 PM, Aaron Watry <awatry at gmail.com> wrote:
>>
>> > On Wed, Sep 10, 2014 at 12:17 PM, Matt Arsenault
>> > <Matthew.Arsenault at amd.com> wrote:
>> >> On 09/10/2014 11:59 AM, Aaron Watry wrote:
>> >>>
>> >>> Passes piglit tests on evergreen (sent to piglit list).
>> >>>
>> >>> Signed-off-by: Aaron Watry <awatry at gmail.com>
>> >>> ---
>> >>>  generic/include/clc/clc.h       |  1 +
>> >>>  generic/include/clc/math/fmod.h |  7 +++++++
>> >>>  generic/lib/SOURCES             |  1 +
>> >>>  generic/lib/math/fmod.cl        | 15 +++++++++++++++
>> >>>  4 files changed, 24 insertions(+)
>> >>>  create mode 100644 generic/include/clc/math/fmod.h
>> >>>  create mode 100644 generic/lib/math/fmod.cl
>> >>>
>> >>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>> >>> index b8c1cb9..94557a1 100644
>> >>> --- a/generic/include/clc/clc.h
>> >>> +++ b/generic/include/clc/clc.h
>> >>> @@ -47,6 +47,7 @@
>> >>>  #include <clc/math/fma.h>
>> >>>  #include <clc/math/fmax.h>
>> >>>  #include <clc/math/fmin.h>
>> >>> +#include <clc/math/fmod.h>
>> >>>  #include <clc/math/hypot.h>
>> >>>  #include <clc/math/log.h>
>> >>>  #include <clc/math/log2.h>
>> >>> diff --git a/generic/include/clc/math/fmod.h
>> >>> b/generic/include/clc/math/fmod.h
>> >>> new file mode 100644
>> >>> index 0000000..737679f
>> >>> --- /dev/null
>> >>> +++ b/generic/include/clc/math/fmod.h
>> >>> @@ -0,0 +1,7 @@
>> >>> +#define __CLC_BODY <clc/math/binary_decl.inc>
>> >>> +#define __CLC_FUNCTION fmod
>> >>> +
>> >>> +#include <clc/math/gentype.inc>
>> >>> +
>> >>> +#undef __CLC_BODY
>> >>> +#undef __CLC_FUNCTION
>> >>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>> >>> index e4ba1d1..45e12aa 100644
>> >>> --- a/generic/lib/SOURCES
>> >>> +++ b/generic/lib/SOURCES
>> >>> @@ -39,6 +39,7 @@ math/exp.cl
>> >>>  math/exp10.cl
>> >>>  math/fmax.cl
>> >>>  math/fmin.cl
>> >>> +math/fmod.cl
>> >>>  math/hypot.cl
>> >>>  math/mad.cl
>> >>>  math/mix.cl
>> >>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
>> >>> new file mode 100644
>> >>> index 0000000..091035b
>> >>> --- /dev/null
>> >>> +++ b/generic/lib/math/fmod.cl
>> >>> @@ -0,0 +1,15 @@
>> >>> +#include <clc/clc.h>
>> >>> +
>> >>> +#ifdef cl_khr_fp64
>> >>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
>> >>> +#endif
>> >>> +
>> >>> +#define FUNCTION fmod
>> >>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
>> >>> +
>> >>> +#define __CLC_BODY <binary_impl.inc>
>> >>> +#include <clc/math/gentype.inc>
>> >>> +
>> >>> +#undef __CLC_BODY
>> >>> +#undef FUNCTION
>> >>> +#undef FUNCTION_IMPL
>> >>> \ No newline at end of file
>> >>
>> >>
>> >> I think this can use the LLVM frem instruction instead, and would be better
>> >> expanded in the backend. I have most of a patch that expands ISD::FREM for
>> >> SI that I forgot about somewhere
>> >>
>> >
>> > Hi Matt,
>> >
>> > There's both fmod and remainder functions in the CL built-in library,
>> > and as near as I can tell, they just differ in how to treat the result
>> > of x/y:
>> >
>> > From the CL 1.2 spec (6.12.12):
>> > gentype fmod (gentype x, gentype y) => Modulus. Returns x - y * trunc (x/y).
>> >
>> > gentype remainder (gentype x, gentype y) => Compute the value r such
>> > that r = x - n*y, where n
>> > is the integer nearest the exact value of x/y. If there
>> > are two integers closest to x/y, n shall be the even
>> > one. If r is zero, it is given the same sign as x.
>> >
>> > Do you happen to know which behavior the frem instruction gives us?
>> > Truncate or Round half to nearest even?  I'm guessing that one of
>> > these will be able to use the frem instruction, and the other won't,
>> > but I haven't checked which is which yet.
>
> There is both __builtin_fmod(f), and __builtin_remainder(f), but I
> haven't found any documentation on them, or code outside of
> Basic/Builtins.def

That's because these are libm functions (same thing with
__builtin_[sin|cos|tan|etc] and many of the other trig functions), and
there is no libm implementation for R600, so calling
__builtin_modf just leads to invalid function calls and a segfault...

It's all well and good to have a built-in function in clang for most
of the math functions, but if the function isn't really built-in and
is dependent upon an external architecture-specific library, then we
either need to:

1) find a way to port the libm functions to CL C (which is potentially
difficult... most of the float-precision functions assume that the
device can at least support doubles at lower performance)

2) create an R600 implementation of libm

3) re-write the functions ourselves.

I've been trying to stick to CLC implementations of the functions
where possible to keep the implementation as architecture neutral as I
can (and to keep the libclc library as self-contained as possible).

I can refactor this to attempt to use __builtin_fmod on architectures
where this function is expected to be available with either a CLC or
bitcode override for R600, or if the frem instruction matches the
required behavior, I'll just use that for all architectures (if I
can't use it here, then maybe we can use it for remainder)... It's
just a bit of extra work.

Sorry for the rant, but this section of the built-in library has been
causing much more trouble than I really cared to take on (my trig is
pretty weak and I'm not the hugest fan of floating point
operations)... but at the same time, it's the primary roadblock
standing in the way of getting cppamp-driver-ng working on the OSS
radeon drivers (which is my current non-work focus).

If anyone cares, here's the list of functions that are still needed to
get cppamp-driver-ng working on radeonsi/r600 for which I haven't sent
patches to the list:
acosh, asinh, atanh, cbrt, cosh, cospi, erf, erfc, expm1, fdim, frexp,
ilogb, ldexp, log10, log1p (tom sent this yesterday), logb, modf,
remainder, sinh, sinpi, tanh, and tgamma

After that, we just need to enable the clang storage class specifiers
extension to get support for the static keyword in clover and there's
a chance that things will work...  but that's a lot of math functions
left if I can't just use trig identities to get an implementation in
place before we optimize it.

--Aaron

>
> If they are based on math.h then both fmod and remainder seem to match
> OCL definitions.
>
> we have round to nearest even instructions, not sure if using __builtin
> or adding amdgpu.rndne intrinsic is the better way to go.
>
> jan
>
>> >
>> > —Aaron
>>
>> I’m not sure. I was operating under the assumption that frem matches
>> fmod’s behavior, but I haven’t tested it particularly carefully. x86
>> lowers frem into calls to fmod, and I assume the OpenCL version behaves
>> the same as libm's.
>>
>> _______________________________________________
>> Libclc-dev mailing list
>> Libclc-dev at pcc.me.uk
>> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
>
> --
> Jan Vesely <jan.vesely at rutgers.edu>