[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

Hal Finkel via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 19 18:11:30 PDT 2019


hfinkel added a comment.

We need to make progress on this, and I'd like to suggest a path forward...

First, we have a fundamental problem here: using host headers to declare functions for the device execution environment isn't sound. Host headers can do anything, and while some platforms provide a way to make them more device-friendly (e.g., by defining __NO_MATH_INLINES), these mechanisms are neither robust nor portable. Thus, we should not rely on host headers to declare functions that might be available on the device. However, even when compiling for the device, code meant only for host execution must still be semantically analyzable, and that, in general, requires the host headers. So we have a situation in which we must use the host headers during device compilation (to keep the semantic analysis of the surrounding host code working), yet cannot use them to provide definitions for device code: those headers might provide definitions relying on host inline asm or intrinsics, use types not lowerable in device code, carry linkage-affecting attributes not lowerable for the device, and so on.

This is, or is very similar to, the problem that host/device overloading addresses in CUDA. It is also the problem, or very similar to it, that the new OpenMP 5 `declare variant` directive is intended to address. Johannes and I discussed this earlier today, and I suggest that we:

1. Add a math.h wrapper to clang/lib/Headers which, in general, just does an include_next of math.h, but gives us the ability to customize that behavior. Writing a header for OpenMP on NVIDIA GPUs that is essentially identical to the math.h functions in __clang_cuda_device_functions.h would be unfortunate - and, since CUDA provides the underlying execution environment for OpenMP target offload on NVIDIA GPUs, duplicative even in principle. We don't need to alter the default global namespace, however; we can include that file from the wrapper math.h.
2. We should allow host/device overloading in OpenMP mode. As an extension, we could directly reuse the CUDA host/device overloading capability; this has the advantage of letting us directly reuse __clang_cuda_device_functions.h (and perhaps do the same to pick up the device-side printf, etc., from __clang_cuda_runtime_wrapper.h). In the future, when in OpenMP mode, we can extend this to provide overloading using OpenMP `declare variant`, if desired.
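As a rough illustration of step 1, the wrapper header might be structured like the following. This is only a sketch under stated assumptions: the guard macro name and the condition used to detect OpenMP NVPTX device compilation are illustrative, not the actual implementation.

```c
/* clang/lib/Headers/math.h -- hypothetical sketch of the proposed wrapper.
 * The macro names below are illustrative assumptions. */
#ifndef __CLANG_MATH_H_WRAPPER
#define __CLANG_MATH_H_WRAPPER

#if defined(_OPENMP) && defined(__NVPTX__)
/* Device compilation for OpenMP NVPTX offload: pull in the device-native
 * declarations (e.g., from __clang_cuda_device_functions.h) in addition
 * to, not instead of, the host header, so that host-only code in the
 * same translation unit still semantically analyzes. */
#include <__clang_cuda_device_functions.h>
#endif

/* In the common case, simply defer to the system math.h. */
#include_next <math.h>

#endif /* __CLANG_MATH_H_WRAPPER */
```

The key property is that the host header is always included via include_next, so host semantic analysis is unaffected; the device-side declarations are layered on top only for the offload target, which is where the host/device overloading from step 2 comes in.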

Thoughts?


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D47849/new/

https://reviews.llvm.org/D47849
