[Libclc-dev] [PATCH 2/2] Add native_recip(x) as ((1)/(x))

Aaron Watry via Libclc-dev libclc-dev at lists.llvm.org
Sat Sep 9 11:46:14 PDT 2017

On Fri, Sep 8, 2017 at 4:45 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> On Wed, 2017-09-06 at 22:22 -0500, Aaron Watry via Libclc-dev wrote:
>> Signed-off-by: Aaron Watry <awatry at gmail.com>
>> ---
>>  generic/include/clc/clc.h               | 1 +
>>  generic/include/clc/math/native_recip.h | 1 +
>>  2 files changed, 2 insertions(+)
>>  create mode 100644 generic/include/clc/math/native_recip.h
>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>> index a93c8ef..9c0e00c 100644
>> --- a/generic/include/clc/clc.h
>> +++ b/generic/include/clc/clc.h
>> @@ -106,6 +106,7 @@
>>  #include <clc/math/native_log.h>
>>  #include <clc/math/native_log2.h>
>>  #include <clc/math/native_powr.h>
>> +#include <clc/math/native_recip.h>
>>  #include <clc/math/native_sin.h>
>>  #include <clc/math/native_sqrt.h>
>>  #include <clc/math/native_rsqrt.h>
>> diff --git a/generic/include/clc/math/native_recip.h b/generic/include/clc/math/native_recip.h
>> new file mode 100644
>> index 0000000..5187661
>> --- /dev/null
>> +++ b/generic/include/clc/math/native_recip.h
>> @@ -0,0 +1 @@
>> +#define native_recip(x) ((1) / (x))
> I think this would need fast/unsafe math flag to actually generate
> reciprocal, but since all our native_* functions are macros for now.
> Acked-by: Jan Vesely <jan.vesely at rutgers.edu>
> I think it'd be a good idea to come up with a plan to properly support
> native_* functions. to me it looks like the options are:
> a) Treat llvm intrinsics as native precision and use those
> b) Treat llvm intrinsics as properly rounded and export device specific
> __builtins via clang.
> a) is less work (both in LLVM and libclc), but I think b) is what is
> expected of us

Given that the llvm cos/sin intrinsics are already of insufficient
precision for CL purposes, I'm ok with the idea that the intrinsic
functions are sufficient for native_* where precision is up to us.
The only real requirement for native_* is that a result is returned,
with no accuracy requirements given to us by the spec.  Given that
llvm.sin.* is already not accurate enough for us, I don't think we can
do your option b without improving the accuracy of some of the AMDGPU
intrinsic functions anyway.

And yeah, for now, the native_recip generates a floating divide of
1.0f/x.  I didn't find an llvm intrinsic for reciprocal, and I don't
think I found one for __builtin_recip/rcp/etc when I looked either.  I
guess I could hand-code something in llvm assembly, but that was more
effort than I wanted to sink into something that just happened to be
needed to get the blender cycles kernels to compile.

Note: I think that with this function, we're to the point that we've
got everything needed in libclc for blender, but llvm asserts/crashes
out with either SelectionDAG or instruction selection issues later on.
I did had to hack up blender a bit to get clover to be a recognized
platform based on a patch from Edward O'Callaghan [0].


[0] - https://developer.blender.org/D2171

> Jan

More information about the Libclc-dev mailing list