[Libclc-dev] [PATCH 2/2] Add native_recip(x) as ((1)/(x))

Jan Vesely via Libclc-dev libclc-dev at lists.llvm.org
Tue Sep 12 14:40:03 PDT 2017


On Sat, 2017-09-09 at 13:46 -0500, Aaron Watry via Libclc-dev wrote:
> On Fri, Sep 8, 2017 at 4:45 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > On Wed, 2017-09-06 at 22:22 -0500, Aaron Watry via Libclc-dev wrote:
> > > Signed-off-by: Aaron Watry <awatry at gmail.com>
> > > ---
> > >  generic/include/clc/clc.h               | 1 +
> > >  generic/include/clc/math/native_recip.h | 1 +
> > >  2 files changed, 2 insertions(+)
> > >  create mode 100644 generic/include/clc/math/native_recip.h
> > > 
> > > diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
> > > index a93c8ef..9c0e00c 100644
> > > --- a/generic/include/clc/clc.h
> > > +++ b/generic/include/clc/clc.h
> > > @@ -106,6 +106,7 @@
> > >  #include <clc/math/native_log.h>
> > >  #include <clc/math/native_log2.h>
> > >  #include <clc/math/native_powr.h>
> > > +#include <clc/math/native_recip.h>
> > >  #include <clc/math/native_sin.h>
> > >  #include <clc/math/native_sqrt.h>
> > >  #include <clc/math/native_rsqrt.h>
> > > diff --git a/generic/include/clc/math/native_recip.h b/generic/include/clc/math/native_recip.h
> > > new file mode 100644
> > > index 0000000..5187661
> > > --- /dev/null
> > > +++ b/generic/include/clc/math/native_recip.h
> > > @@ -0,0 +1 @@
> > > +#define native_recip(x) ((1) / (x))
> > 
> > I think this would need fast/unsafe math flag to actually generate
> > reciprocal, but since all our native_* functions are macros for now.
> > Acked-by: Jan Vesely <jan.vesely at rutgers.edu>
> > 
> > I think it'd be a good idea to come up with a plan to properly support
> > native_* functions. to me it looks like the options are:
> > a) Treat llvm intrinsics as native precision and use those
> > b) Treat llvm intrinsics as properly rounded and export device specific
> > __builtins via clang.
> > 
> > a) is less work (both in LLVM and libclc), but I think b) is what is
> > expected of us
> 
> Given that the llvm cos/sin intrinsics are already of insufficient
> precision for CL purposes, I'm ok with the idea that the intrinsic
> functions are sufficient for native_* where precision is up to us.
> The only real requirement for native_* is that a result is returned,
> with no accuracy requirements given to us by the spec.  Given that
> llvm.sin.* is already not accurate enough for us, I don't think we can
> do your option b without improving the accuracy of some of the AMDGPU
> intrinsic functions anyway.

I think this goes beyond amdgpu target. It has been decided to not have
amdgpu target in compiler-rt, I'm not sure if it's correct to implement
llvm.cos using GPU instructions rather than introducing
amdgcn.cos/r600.cos.
We might want to look at other architectures (nvptx, x86 for llvmpipe
OCL) even the chance anybody will use those is low.
I want to avoid a situation where llvm.cos results in single inaccurate
instruction on one target, and 1000 instructions of accurate algorithm
on another.

> 
> And yeah, for now, the native_recip generates a floating divide of
> 1.0f/x.  I didn't find an llvm intrinsic for reciprocal, and I don't
> think I found one for __builtin_recip/rcp/etc when I looked either.  I
> guess I could hand-code something in llvm assembly, but that was more
> effort than I wanted to sink into something that just happened to be
> needed to get the blender cycles kernels to compile.

feel free to go ahead with ack on this patch. proper solution in the
form of clang __builtin (either generic, or target specific) can come
later.

> 
> Note: I think that with this function, we're to the point that we've
> got everything needed in libclc for blender, but llvm asserts/crashes
> out with either SelectionDAG or instruction selection issues later on.
> I did had to hack up blender a bit to get clover to be a recognized
> platform based on a patch from Edward O'Callaghan [0].

Sounds like removing __CL_USE_NATIVE__ from the blender patch would
also be an option.

https://bugs.freedesktop.org/show_bug.cgi?id=100212
mentions that vstore_half_rtn is also needed. Is that not the case?

Anyway, I get "*** Couldn't join subrange!" in RegisterCoalescer when
trying to run blender on Carrizo/LLVM-5.0

Jan

> 
> --Aaron
> 
> [0] - https://developer.blender.org/D2171
> 
> > 
> > Jan
> 
> _______________________________________________
> Libclc-dev mailing list
> Libclc-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20170912/ca1fb935/attachment.sig>


More information about the Libclc-dev mailing list