[PATCH] D102507: [HIP] Support <functional> in device code

Thu Jun 24 10:29:58 PDT 2021

tra added a comment.

The key difference between C++ and CUDA/HIP, as implemented in clang, is that the `__host__` and `__device__` attributes take part in function overloading in CUDA and HIP. As a result, `__host__ void foo()`, `__device__ void foo()` and `__host__ __device__ void foo()` are three different functions, not redeclarations of the same function. Details of the original proposal are here: https://goo.gl/EXnymm.
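
A minimal sketch of what this looks like in source, assuming clang's predefined `__CUDA__`/`__HIP__` macros are used to hide the device-only declaration so the snippet also builds as plain C++. The function name `side` is made up for illustration.

```cpp
#include <cassert>

// Hypothetical example; `side` is not a real libc++ or runtime function.
#if defined(__CUDA__) || defined(__HIP__)
// Only visible to clang in CUDA/HIP mode. This is a separate overload,
// not a redeclaration of the host function below.
__device__ inline int side() { return 1; }
#endif

// No target attribute: implicitly __host__.
inline int side() { return 0; }

// Host code resolves the call to the __host__ overload.
inline void self_check() { assert(side() == 0); }
```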

In D102507#2838776 <https://reviews.llvm.org/D102507#2838776>, @yaxunl wrote:

> 2. Some libc++ functions are mostly common for device or host with minor differences. For such functions, we can make them `__device__ __host__` and use `#if __HIP_DEVICE_COMPILE__` (indicating device compilation) for the minor difference in the function body.

I think we should rely on target overloading when possible, instead of the preprocessor. Minimizing the differences between the code the compiler sees during host-side and device-side compilation will minimize potential issues.
Which approach we'll end up using is an implementation detail.
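
To make the two options concrete, here is a rough sketch of both for a function with a small host/device difference. The names `abs_a`, `abs_b` and `EX_HD` are hypothetical, and `__CUDA_ARCH__` is assumed as the CUDA counterpart of `__HIP_DEVICE_COMPILE__`. The guards let it build as plain C++ too.

```cpp
#include <cassert>
#include <cstdlib>

#if defined(__CUDA__) || defined(__HIP__)
#define EX_HD __host__ __device__
#else
#define EX_HD
#endif

// Option A (preprocessor): one host/device function whose body changes
// depending on which side is being compiled.
EX_HD inline int abs_a(int x) {
#if defined(__HIP_DEVICE_COMPILE__) || defined(__CUDA_ARCH__)
  return x < 0 ? -x : x;  // device-side compilation
#else
  return std::abs(x);     // host-side compilation
#endif
}

// Option B (target overloading): both compilations see the same tokens;
// overload resolution picks the right function for each side.
#if defined(__CUDA__) || defined(__HIP__)
__device__ inline int abs_b(int x) { return x < 0 ? -x : x; }
#endif
inline int abs_b(int x) { return std::abs(x); }

inline void self_check() {
  assert(abs_a(-3) == 3);
  assert(abs_b(-3) == 3);
}
```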

> 3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded `__device__` functions.
>
> There are two ways to mark libc++ functions as `__device__ __host__`:
>
> 1. Define a macro which expands to empty for non-HIP programs and expands to `__device__ __host__` for HIP and add it to each libc++ function which is to be marked as `__device__ __host__`.

One caveat of overloading based on target attributes is that we can't re-declare a function with `__device__ __host__`: the compiler will treat the attempted redeclaration as an overload of the function without attributes (which is implicitly `__host__`).
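
A sketch of the macro option with that caveat in mind, assuming a hypothetical macro `EX_HOST_DEVICE` and made-up functions `twice` and `thrice`. The point is that the macro has to appear on every declaration, including the first one.

```cpp
#include <cassert>

// Hypothetical macro: empty for non-HIP programs, host/device for HIP.
#if defined(__HIP__)
#define EX_HOST_DEVICE __host__ __device__
#else
#define EX_HOST_DEVICE
#endif

// Fine: the declaration and the definition carry the same target attributes.
EX_HOST_DEVICE int twice(int x);
EX_HOST_DEVICE inline int twice(int x) { return 2 * x; }

// Problematic under HIP (shown as a comment only):
//   int thrice(int x);                      // implicitly __host__
//   __host__ __device__ int thrice(int x);  // seen as an overload attempt,
//                                           // not as a redeclaration

inline void self_check() { assert(twice(21) == 42); }
```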

> 2. Define macros which expand to empty for non-HIP programs and expand to `#pragma clang force_cuda_host_device begin/end` for HIP and put them at the beginning and end of a file where all the functions are to be marked as `__device__ __host__`.
>
> We plan to implement libc++ support in HIP device compilation in a progressive approach, header by header, and document the supported libc++ headers. We will prioritize libc++ headers to support based on 1) user requests, 2) whether it has already been supported through clang wrapper headers (patching), 3) usefulness for device execution, 4) availability of lower level support with HIP runtime.

All of the above applies to CUDA, modulo the macro names and some differences in the builtins and the functions provided (or not) by the runtime on the GPU side.
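
For the pragma-based option quoted above, note that a macro can't expand to a `#pragma` directive directly; the `_Pragma` operator is one way to spell it. A sketch, assuming hypothetical macro names `EX_BEGIN_HOST_DEVICE`/`EX_END_HOST_DEVICE` and made-up functions:

```cpp
#include <cassert>

// Hypothetical macros: empty for plain C++, pragma region for clang CUDA/HIP.
#if defined(__HIP__) || defined(__CUDA__)
#define EX_BEGIN_HOST_DEVICE _Pragma("clang force_cuda_host_device begin")
#define EX_END_HOST_DEVICE _Pragma("clang force_cuda_host_device end")
#else
#define EX_BEGIN_HOST_DEVICE
#define EX_END_HOST_DEVICE
#endif

EX_BEGIN_HOST_DEVICE
// Functions in this region carry no explicit target attributes; in CUDA/HIP
// mode the pragma is meant to make them host/device.
inline int square(int x) { return x * x; }
inline int cube(int x) { return x * square(x); }
EX_END_HOST_DEVICE

inline void self_check() {
  assert(square(3) == 9);
  assert(cube(2) == 8);
}
```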

https://reviews.llvm.org/D102507