<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Sep 10, 2014, at 3:37 PM, Aaron Watry <<a href="mailto:awatry@gmail.com">awatry@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Wed, Sep 10, 2014 at 1:35 PM, Jan Vesely <<a href="mailto:jan.vesely@rutgers.edu">jan.vesely@rutgers.edu</a>> wrote:<br><blockquote type="cite">On Wed, 2014-09-10 at 14:01 -0400, Matt Arsenault wrote:<br><blockquote type="cite">On Sep 10, 2014, at 1:46 PM, Aaron Watry <<a href="mailto:awatry@gmail.com">awatry@gmail.com</a>> wrote:<br><br><blockquote type="cite">On Wed, Sep 10, 2014 at 12:17 PM, Matt Arsenault<br><<a href="mailto:Matthew.Arsenault@amd.com">Matthew.Arsenault@amd.com</a>> wrote:<br><blockquote type="cite">On 09/10/2014 11:59 AM, Aaron Watry wrote:<br><blockquote type="cite"><br>Passes piglit tests on evergreen (sent to piglit list).<br><br>Signed-off-by: Aaron Watry <<a href="mailto:awatry@gmail.com">awatry@gmail.com</a>><br>---<br>generic/include/clc/clc.h | 1 +<br>generic/include/clc/math/fmod.h | 7 +++++++<br>generic/lib/SOURCES | 1 +<br>generic/lib/math/fmod.cl | 15 +++++++++++++++<br>4 files changed, 24 insertions(+)<br>create mode 100644 generic/include/clc/math/fmod.h<br>create mode 100644 generic/lib/math/fmod.cl<br><br>diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h<br>index b8c1cb9..94557a1 100644<br>--- a/generic/include/clc/clc.h<br>+++ b/generic/include/clc/clc.h<br>@@ -47,6 +47,7 @@<br>#include <clc/math/fma.h><br>#include 
<clc/math/fmax.h><br>#include <clc/math/fmin.h><br>+#include <clc/math/fmod.h><br>#include <clc/math/hypot.h><br>#include <clc/math/log.h><br>#include <clc/math/log2.h><br>diff --git a/generic/include/clc/math/fmod.h<br>b/generic/include/clc/math/fmod.h<br>new file mode 100644<br>index 0000000..737679f<br>--- /dev/null<br>+++ b/generic/include/clc/math/fmod.h<br>@@ -0,0 +1,7 @@<br>+#define __CLC_BODY <clc/math/binary_decl.inc><br>+#define __CLC_FUNCTION fmod<br>+<br>+#include <clc/math/gentype.inc><br>+<br>+#undef __CLC_BODY<br>+#undef __CLC_FUNCTION<br>diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES<br>index e4ba1d1..45e12aa 100644<br>--- a/generic/lib/SOURCES<br>+++ b/generic/lib/SOURCES<br>@@ -39,6 +39,7 @@ math/exp.cl<br>math/exp10.cl<br>math/fmax.cl<br>math/fmin.cl<br>+math/fmod.cl<br>math/hypot.cl<br>math/mad.cl<br>math/mix.cl<br>diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl<br>new file mode 100644<br>index 0000000..091035b<br>--- /dev/null<br>+++ b/generic/lib/math/fmod.cl<br>@@ -0,0 +1,15 @@<br>+#include <clc/clc.h><br>+<br>+#ifdef cl_khr_fp64<br>+#pragma OPENCL EXTENSION cl_khr_fp64 : enable<br>+#endif<br>+<br>+#define FUNCTION fmod<br>+#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))<br>+<br>+#define __CLC_BODY <binary_impl.inc><br>+#include <clc/math/gentype.inc><br>+<br>+#undef __CLC_BODY<br>+#undef FUNCTION<br>+#undef FUNCTION_IMPL<br>\ No newline at end of file<br></blockquote><br><br>I think this can use the LLVM frem instruction instead, and would be better<br>expanded in the backend. I have most of a patch that expands ISD::FREM for<br>SI that I forgot about somewhere<br><br></blockquote><br>Hi Matt,<br><br>There's both fmod and remainder functions in the CL built-in library,<br>and as near as I can tell, they just differ in how to treat the result<br>of x/y:<br><br>From the CL 1.2 spec (6.12.12):<br>gentype fmod (gentype x, gentype y) => Modulus. 
Returns x – y * trunc (x/y).<br><br>gentype remainder (gentype x, gentype y) => Compute the value r such<br>that r = x - n*y, where n<br>is the integer nearest the exact value of x/y. If there<br>are two integers closest to x/y, n shall be the even<br>one. If r is zero, it is given the same sign as x.<br><br>Do you happen to know which behavior the frem instruction gives us?<br>Truncate or Round half to nearest even? I'm guessing that one of<br>these will be able to use the frem instruction, and the other won't,<br>but I haven't checked which is which yet.<br></blockquote></blockquote><br>There is both __builtin_fmod(f), and __builtin_remainder(f), but I<br>haven't found any documentation on them, or code outside of<br>Basic/Builtins.def<br></blockquote><br>That's because these are libm functions(same thing with<br>__builtin_[sin|cos|tan|etc] and many of the other trig functions), and<br>there is no libm implementation that exists for R600... so calling<br>__builtin_modf just leads to invalid function calls and a segfault...<br><br>It's all well and good to have a built-in function in clang for most<br>of the math functions, but if the function isn't really built-in and<br>is dependent upon an external architecture-specific library, then we<br>either need to:<br><br>1) find a way to port the libm functions to CL C (which is potentially<br>difficult... 
most of the float-precision functions assume that the<br>device can at least support doubles at lower performance)<br><br>2) create an R600 implementation of libm<br><br>3) re-write the functions ourselves.<br><br>I've been trying to stick to CLC implementations of the functions<br>where possible to keep the implementation as architecture neutral as I<br>can (and to keep the libclc library as self-contained as possible).<br><br>I can refactor this to attempt to use __builtin_fmod on architectures<br>where this function is expected to be available with either a CLC or<br>bitcode override for R600, or if the frem instruction matches the<br>required behavior, I'll just use that for all architectures (if I<br>can't use it here, then maybe we can use it for remainder)... It's<br>just a bit of extra work.<br></div></blockquote><div><br></div><div>I’ll try to post my patch implementing frem for R600 later today, although I don’t have access to hardware right now to test it on</div><div><br></div><br><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br>Sorry for the rant, but this section of the built-in library has been<br>causing much more trouble than I really cared to take on (my trig is<br>pretty weak and I'm not the hugest fan of floating point<br>operations)... 
but at the same time, it's the primary roadblock<br>standing in the way of getting cppamp-driver-ng working on the OSS<br>radeon drivers (which is my current non-work focus).<br><br>If anyone cares, here's the list of functions that are still needed to<br>get cppamp-driver-ng working on radeonsi/r600 for which I haven't sent<br>patches to the list:<br>acosh, asinh, atanh, cbrt, cosh, cospi, erf, erfc, expm1, fdim, frexp,<br>ilogb, ldexp, log10, log1p (Tom sent this yesterday), logb, modf,<br>remainder, sinh, sinpi, tanh, and tgamma<br><br>After that, we just need to enable the clang storage class specifiers<br>extension to get support for the static keyword in clover and there's<br>a chance that things will work... but that's a lot of math functions<br>left if I can't just use trig identities to get an implementation in<br>place before we optimize it.<br><br>--Aaron<br><br><br><blockquote type="cite"><br>If they are based on math.h then both fmod and remainder seem to match<br>OCL definitions.<br><br>We have round-to-nearest-even instructions; not sure if using __builtin<br>or adding an amdgpu.rndne intrinsic is the better way to go.<br><br>jan<br><br><blockquote type="cite"><blockquote type="cite"><br>--Aaron<br></blockquote><br>I’m not sure. I was operating under the assumption that frem matches<br>fmod’s behavior, but I haven’t tested it particularly carefully. x86<br>lowers frem into calls to fmod, and I assume the OpenCL version behaves<br>the same as libm's.<br><br>_______________________________________________<br>Libclc-dev mailing list<br><a href="mailto:Libclc-dev@pcc.me.uk">Libclc-dev@pcc.me.uk</a><br>http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev<br></blockquote><br>--<br>Jan Vesely <<a href="mailto:jan.vesely@rutgers.edu">jan.vesely@rutgers.edu</a>></blockquote></div></blockquote></div><br></body></html>