[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 20 10:10:40 PDT 2019


tra added a comment.



> This is, or is very similar to, the problem that the host/device overloading addresses in CUDA.

IIRC the difference was that OpenMP didn't have an explicit notion of host/device functions, which made it hard to apply host/device overloading in practice.

> It is also the problem, or very similar to the problem, that the new OpenMP 5 `declare variant` directive is intended to address. Johannes and I discussed this earlier today, and I suggest that we:

Interesting. `declare variant` sounds (according to the OpenMP TR7 document) like `__device__` on steroids. That may indeed make things work. Actually, I would like `__device__` to eventually work like `declare variant`, so we can have multiple device overloads specialized for particular GPU architectures without relying on the preprocessor's `__CUDA_ARCH__`.
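For reference, a sketch of what the TR7 `declare variant` syntax might look like for redirecting a math function to a device-native implementation (the `__nv_sin` name follows NVIDIA's libdevice convention; the exact selector spelling follows the TR7 draft and may change):

```c
// Device-specialized implementation provided by libdevice.
double __nv_sin(double x);

// OpenMP 5.0 declare variant: when compiling for an NVPTX device
// context, calls to sin() are replaced with calls to __nv_sin().
#pragma omp declare variant(__nv_sin) match(device = {arch(nvptx, nvptx64)})
double sin(double x);
```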

>
> 1. Add a math.h wrapper to clang/lib/Headers, which generally just does an include_next of math.h, but provides us with the ability to customize this behavior. Writing a header for OpenMP on NVIDIA GPUs which is essentially identical to the math.h functions in __clang_cuda_device_functions.h would be unfortunate, and as CUDA does provide the underlying execution environment for OpenMP target offload on NVIDIA GPUs, duplicative even in principle. We don't need to alter the default global namespace, however, but can include this file from the wrapper math.h.

Using `__clang_cuda_device_functions.h` in addition to the `math.h` wrapper should be fine. It gives us a path to providing a device-side standard math library implementation, and the `math.h` wrapper provides a convenient point to hook in implementations for platforms other than CUDA.
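A minimal sketch of such a wrapper header in clang/lib/Headers (the guard macros and exact header layout here are assumptions, not a final design):

```c
// math.h wrapper (sketch). When targeting a device, pull in the
// device-side implementations first, then fall through to the
// system math.h via include_next.
#if defined(__CUDA__) || defined(_OPENMP)
#include <__clang_cuda_device_functions.h>
#endif
#include_next <math.h>
```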

> 2. We should allow host/device overloading in OpenMP mode. As an extension, we could directly reuse the CUDA host/device overloading capability - this also has the advantage of allowing us to directly reuse __clang_cuda_device_functions.h (and perhaps do a similar thing to pick up the device-side printf, etc. from __clang_cuda_runtime_wrapper.h). In the future, we can extend these to provide overloading using OpenMP declare variant, if desired, when in OpenMP mode.

Is OpenMP still essentially C-based? Host/device overloading relies on C++ machinery. I think it should work with `__attribute__((overloadable))`, but that hasn't been tested.

We may need to restructure bits and pieces of the CUDA-related headers to make them reusable by OpenMP. I guess that with `declare variant` we may be able to reuse most of the headers as-is, by treating a `__device__` function as if it were a variant for the NVPTX back-end.

> Thoughts?

SGTM. Let me know if something in the CUDA-related headers gets in the way.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D47849/new/

https://reviews.llvm.org/D47849




