[Libclc-dev] [PATCH 0/2] modf math builtin

Mon Jan 18 15:41:03 PST 2016

On 01/18/2016 01:36 AM, Pavel Ondračka via Libclc-dev wrote:
> Attached is the implementation of modf math builtin copied from AMD
> builtin library.
>
> This is my first patch ever so please be patient with review. My main
> motivation was to get einstein at home binary pulsar search app working.
> With this patch series the kernels build succesfully however the result
> are wrong, there are probably some other problems (or I messed up).
>
> I've done some casual testing and at least the float part seems to be
> working properly, the fp64 part is totally untested.
>
> Will piglit tests be needed to get this accepted? I had a look at the
> gen_cl_math_builtins.py piglit script, however it seems I would have to
> modify the framework to be able to test functions that have two outputs.
> Sadly my python skills are rudimentary.
For these kinds of patches I think it's best if somebody just runs the 
opencl conformance tests for you to verify them.

>
>
> Pavel Ondračka (2):
>    Add _CLC_V_V_VP_VECTORIZE macro
>    Implement modf builtin
>
>   generic/include/clc/clc.h         |  1 +
>   generic/include/clc/math/modf.h   | 24 ++++++++++
>   generic/include/clc/math/modf.inc | 25 +++++++++++
>   generic/lib/SOURCES               |  1 +
>   generic/lib/clcmacro.h            | 22 ++++++++++
>   generic/lib/math/modf.cl          | 92 +++++++++++++++++++++++++++++++++++++++
>   6 files changed, 165 insertions(+)
>   create mode 100644 generic/include/clc/math/modf.h
>   create mode 100644 generic/include/clc/math/modf.inc
>   create mode 100644 generic/lib/math/modf.cl
>

Does the pseudocode in the OpenCL documentation implementation for this 
function work?

        gentype modf ( gentype value, gentype *iptr )
           {
                     *iptr = trunc( value );
                     return copysign( isinf( value ) ? 0.0 : value - 
*iptr, value );
           }

I would expect this to be faster on CI+ for fp64 due to the native trunc 
instruction, although this implementation is probably better for 
hardware without it. The start of this looks a bit like an inlined ftrunc