[Libclc-dev] [PATCH 2/2] Add native_recip(x) as ((1)/(x))

Aaron Watry via Libclc-dev libclc-dev at lists.llvm.org
Tue Sep 12 18:40:34 PDT 2017


On Tue, Sep 12, 2017 at 4:40 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Sat, 2017-09-09 at 13:46 -0500, Aaron Watry via Libclc-dev wrote:
>> On Fri, Sep 8, 2017 at 4:45 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>> > On Wed, 2017-09-06 at 22:22 -0500, Aaron Watry via Libclc-dev wrote:
>> > > Signed-off-by: Aaron Watry <awatry at gmail.com>
>> > > ---
>> > >  generic/include/clc/clc.h               | 1 +
>> > >  generic/include/clc/math/native_recip.h | 1 +
>> > >  2 files changed, 2 insertions(+)
>> > >  create mode 100644 generic/include/clc/math/native_recip.h
>> > >
>> > > diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>> > > index a93c8ef..9c0e00c 100644
>> > > --- a/generic/include/clc/clc.h
>> > > +++ b/generic/include/clc/clc.h
>> > > @@ -106,6 +106,7 @@
>> > >  #include <clc/math/native_log.h>
>> > >  #include <clc/math/native_log2.h>
>> > >  #include <clc/math/native_powr.h>
>> > > +#include <clc/math/native_recip.h>
>> > >  #include <clc/math/native_sin.h>
>> > >  #include <clc/math/native_sqrt.h>
>> > >  #include <clc/math/native_rsqrt.h>
>> > > diff --git a/generic/include/clc/math/native_recip.h b/generic/include/clc/math/native_recip.h
>> > > new file mode 100644
>> > > index 0000000..5187661
>> > > --- /dev/null
>> > > +++ b/generic/include/clc/math/native_recip.h
>> > > @@ -0,0 +1 @@
>> > > +#define native_recip(x) ((1) / (x))
>> >
>> > I think this would need fast/unsafe math flag to actually generate
>> > reciprocal, but since all our native_* functions are macros for now.
>> > Acked-by: Jan Vesely <jan.vesely at rutgers.edu>
>> >
>> > I think it'd be a good idea to come up with a plan to properly support
>> > native_* functions. to me it looks like the options are:
>> > a) Treat llvm intrinsics as native precision and use those
>> > b) Treat llvm intrinsics as properly rounded and export device specific
>> > __builtins via clang.
>> >
>> > a) is less work (both in LLVM and libclc), but I think b) is what is
>> > expected of us
>>
>> Given that the llvm cos/sin intrinsics are already of insufficient
>> precision for CL purposes, I'm ok with the idea that the intrinsic
>> functions are sufficient for native_* where precision is up to us.
>> The only real requirement for native_* is that a result is returned,
>> with no accuracy requirements given to us by the spec.  Given that
>> llvm.sin.* is already not accurate enough for us, I don't think we can
>> do your option b without improving the accuracy of some of the AMDGPU
>> intrinsic functions anyway.
>
> I think this goes beyond amdgpu target. It has been decided to not have
> amdgpu target in compiler-rt, I'm not sure if it's correct to implement
> llvm.cos using GPU instructions rather than introducing
> amdgcn.cos/r600.cos.
> We might want to look at other architectures (nvptx, x86 for llvmpipe
> OCL) even the chance anybody will use those is low.
> I want to avoid a situation where llvm.cos results in single inaccurate
> instruction on one target, and 1000 instructions of accurate algorithm
> on another.
>
>>
>> And yeah, for now, the native_recip generates a floating divide of
>> 1.0f/x.  I didn't find an llvm intrinsic for reciprocal, and I don't
>> think I found one for __builtin_recip/rcp/etc when I looked either.  I
>> guess I could hand-code something in llvm assembly, but that was more
>> effort than I wanted to sink into something that just happened to be
>> needed to get the blender cycles kernels to compile.
>
> feel free to go ahead with ack on this patch. proper solution in the
> form of clang __builtin (either generic, or target specific) can come
> later.

Note for reference... while this generates .ll of:
  %1 = load float, float addrspace(1)* %arrayidx1, align 4, !tbaa !7
  %div2 = fdiv float 1.000000e+00, %1, !fpmath !11

The amdgpu back-end generates:
    s_load_dword s2, s[4:5], 0x0
    s_waitcnt lgkmcnt(0)
    v_rcp_f32_e32 v0, s2

So while we don't have an intrinsic or built-in for this, amdgpu is
smart enough to recognize the pattern.

--Aaron


>
>>
>> Note: I think that with this function, we're to the point that we've
>> got everything needed in libclc for blender, but llvm asserts/crashes
>> out with either SelectionDAG or instruction selection issues later on.
>> I did had to hack up blender a bit to get clover to be a recognized
>> platform based on a patch from Edward O'Callaghan [0].
>
> Sounds like removing __CL_USE_NATIVE__ from the blender patch would
> also be an option.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=100212
> mentions that vstore_half_rtn is also needed. Is that not the case?
>
> Anyway, I get "*** Couldn't join subrange!" in RegisterCoalescer when
> trying to run blender on Carrizo/LLVM-5.0
>
> Jan
>
>>
>> --Aaron
>>
>> [0] - https://developer.blender.org/D2171
>>
>> >
>> > Jan
>>
>> _______________________________________________
>> Libclc-dev mailing list
>> Libclc-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev


More information about the Libclc-dev mailing list