[PATCH] D60907: [OpenMP] Add math functions support in OpenMP offloading

Mon Apr 29 20:55:16 PDT 2019

hfinkel added a comment.

In D60907#1479370 <https://reviews.llvm.org/D60907#1479370>, @gtbercea wrote:

> In D60907#1479142 <https://reviews.llvm.org/D60907#1479142>, @hfinkel wrote:
>
> > In D60907#1479118 <https://reviews.llvm.org/D60907#1479118>, @gtbercea wrote:
> >
> > > Ping @hfinkel @tra
> >
> >
> > The last two comments in D47849 <https://reviews.llvm.org/D47849> indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?
>
>
> ...
>
> Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

The solution I suggested had the advantages of:

1. Being able to directly reuse the code in `__clang_cuda_device_functions.h`. On the other hand, using this solution we need to implement a wrapper function for every math function. When `__clang_cuda_device_functions.h` is updated, we need to update the OpenMP wrapper as well.
2. Providing access to wrappers for other CUDA intrinsics in a natural way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d than __nv_rnorm3d in user code].
3. Being similar to the "declare variant" functionality from OpenMP 5, and thus, I suspect, closer to the solution we'll eventually be able to apply in a standard way to all targets.

What are all of the long-double functions going to do on NVPTX?

> This solution is following Alexey's suggestions. This solution allows the optimization of math calls if they apply (example: pow(x,2) => x*x ) which was one of the issues in the previous solution I implemented.

So we're also missing that optimization for CUDA code when compiling with Clang? Isn't this also something that, regardless, should be fixed?

Also, how fragile is this? We inline bottom up but this optimization needs to apply before inlining?

Finally, regardless of all of this, do we really need to preinclude this header? Can't we do this with a math.h wrapper?

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60907/new/

https://reviews.llvm.org/D60907