[Libclc-dev] [PATCH 2/2] Add native_recip(x) as ((1)/(x))
Jan Vesely via Libclc-dev
libclc-dev at lists.llvm.org
Tue Sep 12 14:40:03 PDT 2017
On Sat, 2017-09-09 at 13:46 -0500, Aaron Watry via Libclc-dev wrote:
> On Fri, Sep 8, 2017 at 4:45 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > On Wed, 2017-09-06 at 22:22 -0500, Aaron Watry via Libclc-dev wrote:
> > > Signed-off-by: Aaron Watry <awatry at gmail.com>
> > > ---
> > > generic/include/clc/clc.h | 1 +
> > > generic/include/clc/math/native_recip.h | 1 +
> > > 2 files changed, 2 insertions(+)
> > > create mode 100644 generic/include/clc/math/native_recip.h
> > >
> > > diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
> > > index a93c8ef..9c0e00c 100644
> > > --- a/generic/include/clc/clc.h
> > > +++ b/generic/include/clc/clc.h
> > > @@ -106,6 +106,7 @@
> > > #include <clc/math/native_log.h>
> > > #include <clc/math/native_log2.h>
> > > #include <clc/math/native_powr.h>
> > > +#include <clc/math/native_recip.h>
> > > #include <clc/math/native_sin.h>
> > > #include <clc/math/native_sqrt.h>
> > > #include <clc/math/native_rsqrt.h>
> > > diff --git a/generic/include/clc/math/native_recip.h b/generic/include/clc/math/native_recip.h
> > > new file mode 100644
> > > index 0000000..5187661
> > > --- /dev/null
> > > +++ b/generic/include/clc/math/native_recip.h
> > > @@ -0,0 +1 @@
> > > +#define native_recip(x) ((1) / (x))
> > I think this would need fast/unsafe math flag to actually generate
> > reciprocal, but since all our native_* functions are macros for now.
> > Acked-by: Jan Vesely <jan.vesely at rutgers.edu>
> > I think it'd be a good idea to come up with a plan to properly support
> > native_* functions. to me it looks like the options are:
> > a) Treat llvm intrinsics as native precision and use those
> > b) Treat llvm intrinsics as properly rounded and export device specific
> > __builtins via clang.
> > a) is less work (both in LLVM and libclc), but I think b) is what is
> > expected of us
> Given that the llvm cos/sin intrinsics are already of insufficient
> precision for CL purposes, I'm ok with the idea that the intrinsic
> functions are sufficient for native_* where precision is up to us.
> The only real requirement for native_* is that a result is returned,
> with no accuracy requirements given to us by the spec. Given that
> llvm.sin.* is already not accurate enough for us, I don't think we can
> do your option b without improving the accuracy of some of the AMDGPU
> intrinsic functions anyway.
I think this goes beyond amdgpu target. It has been decided to not have
amdgpu target in compiler-rt, I'm not sure if it's correct to implement
llvm.cos using GPU instructions rather than introducing
We might want to look at other architectures (nvptx, x86 for llvmpipe
OCL) even the chance anybody will use those is low.
I want to avoid a situation where llvm.cos results in single inaccurate
instruction on one target, and 1000 instructions of accurate algorithm
> And yeah, for now, the native_recip generates a floating divide of
> 1.0f/x. I didn't find an llvm intrinsic for reciprocal, and I don't
> think I found one for __builtin_recip/rcp/etc when I looked either. I
> guess I could hand-code something in llvm assembly, but that was more
> effort than I wanted to sink into something that just happened to be
> needed to get the blender cycles kernels to compile.
feel free to go ahead with ack on this patch. proper solution in the
form of clang __builtin (either generic, or target specific) can come
> Note: I think that with this function, we're to the point that we've
> got everything needed in libclc for blender, but llvm asserts/crashes
> out with either SelectionDAG or instruction selection issues later on.
> I did had to hack up blender a bit to get clover to be a recognized
> platform based on a patch from Edward O'Callaghan .
Sounds like removing __CL_USE_NATIVE__ from the blender patch would
also be an option.
mentions that vstore_half_rtn is also needed. Is that not the case?
Anyway, I get "*** Couldn't join subrange!" in RegisterCoalescer when
trying to run blender on Carrizo/LLVM-5.0
>  - https://developer.blender.org/D2171
> > Jan
> Libclc-dev mailing list
> Libclc-dev at lists.llvm.org
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 833 bytes
Desc: This is a digitally signed message part
More information about the Libclc-dev