[PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

Mon Mar 21 15:48:52 PDT 2016

jlebar added inline comments.

================
Comment at: include/clang/Driver/Options.td:383-384
@@ -382,2 +382,4 @@
   HelpText<"Enable device-side debug info generation. Disables ptxas optimizations.">;
+def cuda_allow_std_complex : Flag<["--"], "cuda-allow-std-complex">,
+  HelpText<"Allow CUDA device code to use definitions from <complex>, other than operator>> and operator<<.">;
 def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,
----------------
rsmith wrote:
> jlebar wrote:
> > jlebar wrote:
> > > rnk wrote:
> > > > jlebar wrote:
> > > > > tra wrote:
> > > > > > rsmith wrote:
> > > > > > > I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.
> > > > > > What would be a better way to enable this 'feature'? I guess we could live with -Xclang -fcuda-allow-std-complex for now, but that does not seem to be particularly good way to give user control, either.
> > > > > > 
> > > > > > Perhaps we should have some sort of --cuda-enable-extension=foo option to control CUDA hacks.
> > > > > > I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.
> > > > > 
> > > > > This is an important feature for a lot of projects, including tensorflow and eigen.  No matter how we define the flag, I suspect people are going to use it en masse.  (Most projects I've seen pass the equivalent flag to nvcc.)  At the point that many or even most projects are relying on it, I'd suspect we'll have difficulty changing this flag, regardless of whether or not it is officially part of our stable API.
> > > > > 
> > > > > There's also the issue of discoverability.  nvcc actually gives a nice error message when you try to use std::complex -- it seems pretty unfriendly not to even list the relevant flag in clang --help.
> > > > > 
> > > > > I don't feel particularly strongly about this, though -- I'm more concerned about getting something that works.
> > > > What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility? We probably don't want a super fine grained LangOpt like this.
> > > An alternative wrt the flag is to enable it by default.  This would be somewhat consistent with existing behavior, wherein we make most std math functions available without a special flag, even though they're not technically host-device.  The main difference here is that there we're matching nvcc's default behavior, whereas here we're actually going further than nvcc -- nvcc by default doesn't let you touch std::complex from device code at all, and with a flag, you can touch its *constexpr* functions.  Which is not actually very much.
> > > 
> > > Nonetheless, since the user-visible effect is consistent with our approach of making std math stuff available, and since this shouldn't make us reject code nvcc accepts, I'd be more OK hiding the flag to turn it off.
> > > What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility?
> > 
> > I think we midair'ed on this.  See above comment about turning this flag on by default -- calling this "nvcc compat" wouldn't quite be right.  We could certainly have a broader flag, but I'm not sure at the moment what else would reasonably go in with this one.
> I'd find either of these suggestions (-fnvcc-compatibility or a cc1-only flag to turn this behaviour off) more palatable than the current approach.
> 
> I'd also be a lot happier about this if we can view it as a short-term workaround, with the longer-term fix being to get the host/device attributes added to standard library implementations (even if it turns out we can never actually remove this workaround in practice). If we can legitimately claim that this is the way that CUDA is intended to work, and the missing attributes in <complex> are a bug in that header (in CUDA mode), then that provides a solid justification for having this complexity in Clang.
> If we can legitimately claim that this is the way that CUDA is intended to work, and the missing attributes in <complex> are a bug in that header (in CUDA mode), then that provides a solid justification for having this complexity in Clang.

I think that

  1) the number of people passing --relaxed-constexpr to nvcc just so they can use a limited subset of std::complex, and
  
  2) the fact that we're already doing this for (basically all) other std math functions

may be decent arguments for this.  But I don't know if I'm a great judge of what we can legitimately claim here.

================
Comment at: lib/Sema/SemaCUDA.cpp:464-465
@@ +463,4 @@
+// <complex> without passing -fcuda-allow-std-complex.
+// TODO: Output a nvcc-compat warning if you try to use a non-constexpr function
+// from <complex> -- nvcc only lets you use constexpr functions.
+bool Sema::declShouldBeCUDAHostDevice(const FunctionDecl &FD) {
----------------
rsmith wrote:
> Does nvcc do this "`constexpr` implies `__host__ __device__`" thing only for functions declared within <complex>, or for all functions?
> 
> Another alternative strategy: a wrapper `<complex>` header that does this:
> 
>     #include // ... union of includes from libc++ and libstdc++ <complex>
>     #define constexpr __host__ __device__ constexpr
>     #include_next <complex>
>     #undef constexpr
> Does nvcc do this "constexpr implies __host__ __device__" thing only for functions declared within <complex>, or for all functions?

All functions.  Although std::complex is the main use I've observed.

> Another alternative strategy: a wrapper <complex> header that does this:

That one is quite clever, although I'm not sure about enumerating all of the includes from the headers.  I guess that should be reasonably stable...

I think I would like to get full complex support, though, if we can agree on a path towards that.  The current limitation is silly, it seems clear that people want this, and the constexpr thing gives you but a shadow of the actual library.

================
Comment at: lib/Sema/SemaCUDA.cpp:479-481
@@ +478,5 @@
+    return false;
+  StringRef Filename = FE->getName();
+  if (Filename != "complex" && !Filename.endswith("/complex"))
+    return false;
+
----------------
rsmith wrote:
> I don't think this works: the standard library might factor parts of <complex> out into separate header files. For instance, libstdc++ 4.4 includes the TR1 pieces of <complex> in that way.
Hm, that is unfortunate.  One option would be to say that we just don't support this.  Otherwise we have to go down the road of identifying all the relevant functions...


http://reviews.llvm.org/D18328