[Libclc-dev] [PATCH 0/2] modf math builtin
Pavel Ondračka via Libclc-dev
libclc-dev at lists.llvm.org
Tue Jan 19 02:31:07 PST 2016
Matt Arsenault wrote on Mon, 18 Jan 2016 at 15:41 -0800:
> On 01/18/2016 01:36 AM, Pavel Ondračka via Libclc-dev wrote:
> > Attached is an implementation of the modf math builtin, copied from
> > the AMD builtins library.
> >
> > This is my first patch ever, so please be patient with the review.
> > My main motivation was to get the Einstein@Home binary pulsar search
> > app working. With this patch series the kernels build successfully,
> > but the results are wrong; there are probably some other problems
> > (or I messed up).
> >
> > I've done some casual testing, and at least the float part seems to
> > be working properly; the fp64 part is totally untested.
> >
> > Will piglit tests be needed to get this accepted? I had a look at
> > the gen_cl_math_builtins.py piglit script, but it seems I would have
> > to modify the framework to be able to test functions that have two
> > outputs. Sadly, my Python skills are rudimentary.
> For these kinds of patches I think it's best if somebody just runs
> the OpenCL conformance tests for you to verify them.
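For illustration only, here is a host-side C sketch of what checking a two-output function involves. It is not piglit's gen_cl_math_builtins.py framework. Both the returned fractional part and the value stored through the pointer are compared against the host libm modf bit for bit, so a +0.0/-0.0 mismatch is caught. The names modf_fn, same_double and check_two_outputs are hypothetical, and treating the host libm as the reference is an assumption.

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature of the function under test: returns the
   fractional part and stores the integral part through iptr. */
typedef double (*modf_fn)(double value, double *iptr);

/* Bit-for-bit comparison (any two NaNs count as equal), so that a
   +0.0 vs -0.0 difference is reported as a mismatch. */
static int same_double(double a, double b)
{
    if (isnan(a) && isnan(b))
        return 1;
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Checks BOTH outputs of fn against the host libm modf for every
   input and returns the number of inputs that mismatched. */
static int check_two_outputs(modf_fn fn, const double *inputs, size_t n)
{
    int failures = 0;
    for (size_t i = 0; i < n; ++i) {
        double want_int, got_int;
        double want_frac = modf(inputs[i], &want_int);
        double got_frac = fn(inputs[i], &got_int);
        if (!same_double(want_frac, got_frac) ||
            !same_double(want_int, got_int)) {
            printf("mismatch for %g: frac %g vs %g, int %g vs %g\n",
                   inputs[i], got_frac, want_frac, got_int, want_int);
            ++failures;
        }
    }
    return failures;
}
```

Passing the host modf itself should report no mismatches. Any candidate with the same signature can be checked in its place.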
>
>
> >
> >
> > Pavel Ondračka (2):
> > Add _CLC_V_V_VP_VECTORIZE macro
> > Implement modf builtin
> >
> > generic/include/clc/clc.h | 1 +
> > generic/include/clc/math/modf.h | 24 ++++++++++
> > generic/include/clc/math/modf.inc | 25 +++++++++++
> > generic/lib/SOURCES | 1 +
> > generic/lib/clcmacro.h | 22 ++++++++++
> > generic/lib/math/modf.cl | 92 +++++++++++++++++++++++++++++++++++++++
> > 6 files changed, 165 insertions(+)
> > create mode 100644 generic/include/clc/math/modf.h
> > create mode 100644 generic/include/clc/math/modf.inc
> > create mode 100644 generic/lib/math/modf.cl
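The first patch's macro name suggests wrapping a scalar function that takes a value argument and a pointer output into vector versions. The patch body isn't reproduced here, so the following C sketch shows only one plausible shape for such a macro. It uses hypothetical struct stand-ins for OpenCL's float2/float4 and hypothetical macro names. It is not the contents of libclc's clcmacro.h.

```c
#include <math.h>

/* Hypothetical stand-ins for OpenCL float2/float4 (not libclc types). */
typedef struct { float s0, s1; } vfloat2;
typedef struct { vfloat2 lo, hi; } vfloat4;

/* Hypothetical macro: wraps a scalar float FUNC(float, float *) into a
   2-wide function that calls FUNC once per lane and writes each lane's
   second output through the vector pointer. */
#define V_V_VP_VECTORIZE2(NAME, FUNC)                 \
    static vfloat2 NAME(vfloat2 v, vfloat2 *out) {     \
        vfloat2 r;                                     \
        r.s0 = FUNC(v.s0, &out->s0);                   \
        r.s1 = FUNC(v.s1, &out->s1);                   \
        return r;                                      \
    }

/* 4-wide version built by applying the 2-wide one to each half. */
#define V_V_VP_VECTORIZE4(NAME, NAME2)                \
    static vfloat4 NAME(vfloat4 v, vfloat4 *out) {     \
        vfloat4 r;                                     \
        r.lo = NAME2(v.lo, &out->lo);                  \
        r.hi = NAME2(v.hi, &out->hi);                  \
        return r;                                      \
    }

/* Instantiate lane-wise modf for 2 and 4 lanes on top of C99 modff. */
V_V_VP_VECTORIZE2(modf_v2, modff)
V_V_VP_VECTORIZE4(modf_v4, modf_v2)
```

Composing wider versions from narrower halves keeps each macro small. The scalar function is the only place the actual math lives.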
> >
>
>
> Does the pseudocode implementation of this function given in the
> OpenCL documentation work?
>
> gentype modf ( gentype value, gentype *iptr )
> {
>     *iptr = trunc( value );
>     return copysign( isinf( value ) ? 0.0 : value - *iptr, value );
> }
>
> I would expect this to be faster on CI+ (Sea Islands and later) for
> fp64 due to the native trunc instruction, although your
> implementation is probably better for hardware without it. The start
> of your implementation looks a bit like an inlined ftrunc.
>
I think the example implementation should work as well. I chose the
implementation from the AMD builtins library because I thought it might
be faster. However, if that is not the case (and I have no idea how the
hardware works), then the example implementation is probably the right
way to go. At the very least, it would simplify the code considerably.
So what would be the preferred way?