[Libclc-dev] [PATCH 0/2] modf math builtin

Tue Jan 19 02:31:07 PST 2016

Matt Arsenault wrote on Mon, 18 Jan 2016 at 15:41 -0800:
> On 01/18/2016 01:36 AM, Pavel Ondračka via Libclc-dev wrote:
> > Attached is the implementation of modf math builtin copied from AMD
> > builtin library.
> > 
> > This is my first patch ever so please be patient with review. My
> > main
> > motivation was to get the Einstein@Home binary pulsar search app
> > working.
> > With this patch series the kernels build successfully, however the
> > results
> > are wrong, so there are probably some other problems (or I messed up).
> > 
> > I've done some casual testing and at least the float part seems to
> > be
> > working properly, the fp64 part is totally untested.
> > 
> > Will piglit tests be needed to get this accepted? I had a look at
> > the
> > gen_cl_math_builtins.py piglit script, however it seems I would
> > have to
> > modify the framework to be able to test functions that have two
> > outputs.
> > Sadly my python skills are rudimentary.
> For these kinds of patches I think it's best if somebody just runs
> the 
> opencl conformance tests for you to verify them.
> 
> 
> > 
> > 
> > Pavel Ondračka (2):
> >    Add _CLC_V_V_VP_VECTORIZE macro
> >    Implement modf builtin
> > 
> >   generic/include/clc/clc.h         |  1 +
> >   generic/include/clc/math/modf.h   | 24 ++++++++++
> >   generic/include/clc/math/modf.inc | 25 +++++++++++
> >   generic/lib/SOURCES               |  1 +
> >   generic/lib/clcmacro.h            | 22 ++++++++++
> >   generic/lib/math/modf.cl          | 92
> > +++++++++++++++++++++++++++++++++++++++
> >   6 files changed, 165 insertions(+)
> >   create mode 100644 generic/include/clc/math/modf.h
> >   create mode 100644 generic/include/clc/math/modf.inc
> >   create mode 100644 generic/lib/math/modf.cl
> > 
> 
> 
> Does the pseudocode implementation of this function from the OpenCL
> documentation work?
> 
>         gentype modf ( gentype value, gentype *iptr )
>         {
>             *iptr = trunc( value );
>             return copysign( isinf( value ) ? 0.0 : value - *iptr, value );
>         }
> 
> I would expect this to be faster on CI+ for fp64 due to the native
> trunc 
> instruction, although this implementation is probably better for 
> hardware without it. The start of this looks a bit like an inlined
> ftrunc.
> 
I think the example implementation should work as well. I chose the
implementation from the AMD builtins library because I thought it might be
faster; however, if that is not the case (and I have no idea how
the hardware works), then the example implementation is probably the
right way to go. At least it would simplify the code quite a lot.
So what would be the preferred way?