[PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

Mon Mar 21 15:13:17 PDT 2016

jlebar added a comment.

Here are two other approaches we considered and rejected, for the record:

1. Copy-paste a <complex> implementation from e.g. libc++ into __clang_cuda_runtime_wrapper.h, and edit it appropriately.  Then #define the real <complex>'s include guards, so that the real header becomes a no-op when it is later included.

  The main problem with this is the obvious one: We're copying a big chunk of the standard library into the compiler, where it doesn't belong, and now we have two divergent copies of this code to maintain.  In addition, we can't necessarily use libc++, since we need to support pre-C++11 and AIUI libc++ does not.



2. Provide `__device__` overloads for all the functions defined in <complex>.  This almost works, except that we do not (currently) have a way to let you inject new overloads for member functions into classes we don't own.  E.g. we can add a `__device__` overload of `std::real(const complex<T>&)`, just as we could overload `std::real` in any other way, but we can't add a new `__device__` overload to `std::complex<T>::real()`.

  This approach also has a similar problem to (1), which is that we'd end up copy/pasting almost all of <complex> into the compiler.
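To make the limitation in (2) concrete, here is a rough sketch of what a device-side free-function overload might look like.  It is illustrative only, not code from this patch: the namespace `sketch` and its functions are hypothetical names, placed outside `std` so that the snippet also builds as plain C++ (where `__device__` is defined away).  It relies on the standard's guarantee that a `std::complex<T>` can be accessed as a `T[2]`, which lets it avoid calling the host-only member functions.

```cpp
#include <complex>

// When this is not a CUDA compilation, make the attribute vanish so the
// sketch still builds with an ordinary host compiler.
#if !defined(__CUDA__) && !defined(__CUDACC__)
#define __device__
#endif

namespace sketch {  // hypothetical namespace, not part of the patch

// Free functions can be added from outside the class.  The array-style
// access below is guaranteed by the standard for std::complex<T>, so no
// host-only member function is called.
template <typename T>
__device__ T real(const std::complex<T>& z) {
  return reinterpret_cast<const T(&)[2]>(z)[0];
}

template <typename T>
__device__ T imag(const std::complex<T>& z) {
  return reinterpret_cast<const T(&)[2]>(z)[1];
}

// There is no analogous trick for members: C++ offers no way to add a
// second `real()` to std::complex<T> from outside its definition, so
// `z.real()` in device code would still resolve to the host-only member.

}  // namespace sketch
```

Even under these assumptions, every non-member function in <complex> would need a hand-written counterpart like the two above, which is the copy/paste problem noted for this approach.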


================
Comment at: include/clang/Driver/Options.td:383-384
@@ -382,2 +382,4 @@
   HelpText<"Enable device-side debug info generation. Disables ptxas optimizations.">;
+def cuda_allow_std_complex : Flag<["--"], "cuda-allow-std-complex">,
+  HelpText<"Allow CUDA device code to use definitions from <complex>, other than operator>> and operator<<.">;
 def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,
----------------
tra wrote:
> rsmith wrote:
> > I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.
> What would be a better way to enable this 'feature'? I guess we could live with -Xclang -fcuda-allow-std-complex for now, but that does not seem to be a particularly good way to give users control, either.
> 
> Perhaps we should have some sort of --cuda-enable-extension=foo option to control CUDA hacks.
> I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.

This is an important feature for a lot of projects, including TensorFlow and Eigen.  No matter how we define the flag, I suspect people are going to use it en masse.  (Most projects I've seen pass the equivalent flag to nvcc.)  Once many or even most projects are relying on it, I suspect we'll have difficulty changing this flag, regardless of whether or not it is officially part of our stable API.

There's also the issue of discoverability.  nvcc actually gives a nice error message when you try to use std::complex -- it seems pretty unfriendly not to even list the relevant flag in clang --help.

I don't feel particularly strongly about this, though -- I'm more concerned about getting something that works.


http://reviews.llvm.org/D18328