[PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

Mon Mar 21 14:54:26 PDT 2016

jlebar added a comment.

Thanks for the suggestions, Richard.  I'm not sure any of them will work, but I don't defend this patch as anything other than a hack, so if we can come up with something cleaner that still accomplishes what we need, that's great.

In http://reviews.llvm.org/D18328#379824, @rsmith wrote:

> I would much prefer for us to, say, provide a <complex> header that wraps the system one and does something like
>
>   // <complex>
>   #pragma clang cuda_implicit_host_device {
>   #include_next <complex>
>   #pragma clang cuda_implicit_host_device }

We considered this and ruled it out for two reasons:

1. We'd have to exclude operator>> and operator<<, presumably with additional pragmas, and
2. We'd have to exclude everything included by <complex>.
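To make reason 1 concrete, here is a minimal sketch using a hypothetical toy `complex` type (not libstdc++'s `std::complex`). The arithmetic operators can safely be `__host__ __device__`, but the stream operator depends on `std::ostream`, which device code cannot use, so a blanket host+device region would need a carve-out for it. The fallback macros that let the sketch build as plain C++ are an assumption of this example, not part of the patch.

```cpp
#include <cassert>  // only for host-side sanity checks
#include <ostream>
#include <sstream>

#if !defined(__CUDA__) && !defined(__CUDACC__)
// Plain C++ build (assumption of this sketch): make the CUDA
// attributes expand to nothing so the file compiles anywhere.
#define __host__
#define __device__
#endif

namespace toy {

// Hypothetical stand-in for std::complex; not libstdc++'s implementation.
template <typename T>
struct complex {
  T re;
  T im;
};

// Pure arithmetic: fine to compile for both host and device.
template <typename T>
__host__ __device__ complex<T> operator+(const complex<T>& a,
                                         const complex<T>& b) {
  return complex<T>{a.re + b.re, a.im + b.im};
}

template <typename T>
__host__ __device__ complex<T> operator*(const complex<T>& a,
                                         const complex<T>& b) {
  return complex<T>{a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// Stream output needs std::ostream, which device code cannot use, so this
// operator stays host-only. A blanket "everything in this header is
// host+device" region would need an explicit exception for it (and for
// operator>>).
template <typename T>
std::ostream& operator<<(std::ostream& os, const complex<T>& z) {
  return os << '(' << z.re << ',' << z.im << ')';
}

}  // namespace toy
```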

Of course, with enough pragmas anything is possible, but at that point the pragma approach looked substantially more complicated than this (admittedly awful) hack.

> or to provide an explicit list of the functions that we're promoting to `__host__` `__device__`

The problem with that is that libstdc++ uses many helper functions, which we'd also have to enumerate.  Baking those kinds of implementation details into clang seemed worse than this hack.
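As a minimal sketch of why an explicit promotion list ends up reaching into implementation details: the toy type and the `detail::sum_of_squares` helper below are hypothetical stand-ins, not libstdc++'s actual code. Promoting only the public `norm` would not be enough, because a device-side call also needs every helper that `norm` forwards to. As before, the fallback macros are an assumption that lets the sketch build as plain C++.

```cpp
#include <cassert>  // only for host-side sanity checks

#if !defined(__CUDA__) && !defined(__CUDACC__)
// Plain C++ build (assumption of this sketch): make the CUDA
// attributes expand to nothing so the file compiles anywhere.
#define __host__
#define __device__
#endif

namespace toy {

// Hypothetical stand-in for std::complex; not libstdc++'s implementation.
template <typename T>
struct complex {
  T re;
  T im;
};

namespace detail {
// Hypothetical internal helper, playing the role of the private helpers a
// real standard library forwards public functions to. If it stayed
// host-only, a call to toy::norm from device code would be rejected, so an
// explicit promotion list would have to name this helper as well.
template <typename T>
__host__ __device__ T sum_of_squares(T a, T b) {
  return a * a + b * b;
}
}  // namespace detail

// The public function that would actually appear on a promotion list.
template <typename T>
__host__ __device__ T norm(const complex<T>& z) {
  return detail::sum_of_squares(z.re, z.im);
}

}  // namespace toy
```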

> or to require people to use a CUDA-compatible standard library if they want CUDA-compatible standard library behaviour.

I think asking people to use a custom standard library is a nonstarter for, e.g., open-source TensorFlow, and I suspect it would take a considerable amount of work to accomplish in google3.  (Not to suggest that two wrongs make a right, but we already have many similar hacks in place to match nvcc's behavior with standard library functions; the main difference here is that we're spelling this hack in clang's C++ rather than in __clang_cuda_runtime_wrapper.h.)

http://reviews.llvm.org/D18328