[PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

Mon Mar 21 15:35:39 PDT 2016

rsmith added inline comments.

================
Comment at: include/clang/Driver/Options.td:383-384
@@ -382,2 +382,4 @@
   HelpText<"Enable device-side debug info generation. Disables ptxas optimizations.">;
+def cuda_allow_std_complex : Flag<["--"], "cuda-allow-std-complex">,
+  HelpText<"Allow CUDA device code to use definitions from <complex>, other than operator>> and operator<<.">;
 def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,
----------------
jlebar wrote:
> jlebar wrote:
> > rnk wrote:
> > > jlebar wrote:
> > > > tra wrote:
> > > > > rsmith wrote:
> > > > > > I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.
> > > > > What would be a better way to enable this 'feature'? I guess we could live with -Xclang -fcuda-allow-std-complex for now, but that does not seem to be particularly good way to give user control, either.
> > > > > 
> > > > > Perhaps we should have some sort of --cuda-enable-extension=foo option to control CUDA hacks.
> > > > > I don't think it's reasonable to have something this hacky / arbitrary in the stable Clang driver interface.
> > > > 
> > > > This is an important feature for a lot of projects, including tensorflow and eigen.  No matter how we define the flag, I suspect people are going to use it en masse.  (Most projects I've seen pass the equivalent flag to nvcc.)  At the point that many or even most projects are relying on it, I'd suspect we'll have difficulty changing this flag, regardless of whether or not it is officially part of our stable API.
> > > > 
> > > > There's also the issue of discoverability.  nvcc actually gives a nice error message when you try to use std::complex -- it seems pretty unfriendly not to even list the relevant flag in clang --help.
> > > > 
> > > > I don't feel particularly strongly about this, though -- I'm more concerned about getting something that works.
> > > What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility? We probably don't want a super fine grained LangOpt like this.
> > An alternative wrt the flag is to enable it by default.  This would be somewhat consistent with existing behavior, wherein we make most std math functions available without a special flag, even though they're not technically host-device.  The main difference here is that there we're matching nvcc's default behavior, whereas here we're actually going further than nvcc -- nvcc by default doesn't let you touch std::complex from device code at all, and with a flag, you can touch its *constexpr* functions.  Which is not actually very much.
> > 
> > Nonetheless, since the user-visible effect is consistent with our approach of making std math stuff available, and since this shouldn't make us reject code nvcc accepts, I'd be more OK hiding the flag to turn it off.
> > What if we had a catchall nvcc quirks mode flag similar to -fms-compatibility?
> 
> I think we midair'ed on this.  See above comment about turning this flag on by default -- calling this "nvcc compat" wouldn't quite be right.  We could certainly have a broader flag, but I'm not sure at the moment what else would reasonably go in with this one.
I'd find either of these suggestions (-fnvcc-compatibility or a cc1-only flag to turn this behaviour off) more palatable than the current approach.

I'd also be a lot happier about this if we can view it as a short-term workaround, with the longer-term fix being to get the host/device attributes added to standard library implementations (even if it turns out we can never actually remove this workaround in practice). If we can legitimately claim that this is the way that CUDA is intended to work, and the missing attributes in <complex> are a bug in that header (in CUDA mode), then that provides a solid justification for having this complexity in Clang.

================
Comment at: lib/Sema/SemaCUDA.cpp:464-465
@@ +463,4 @@
+// <complex> without passing -fcuda-allow-std-complex.
+// TODO: Output a nvcc-compat warning if you try to use a non-constexpr function
+// from <complex> -- nvcc only lets you use constexpr functions.
+bool Sema::declShouldBeCUDAHostDevice(const FunctionDecl &FD) {
----------------
Does nvcc do this "`constexpr` implies `__host__ __device__`" thing only for functions declared within <complex>, or for all functions?

Another alternative strategy: a wrapper `<complex>` header that does this:

    #include // ... union of includes from libc++ and libstdc++ <complex>
    #define constexpr __host__ __device__ constexpr
    #include_next <complex>
    #undef constexpr

================
Comment at: lib/Sema/SemaCUDA.cpp:485
@@ +484,3 @@
+  if (const auto *Method = dyn_cast<CXXMethodDecl>(&FD))
+    if (const auto *Parent = Method->getParent())
+      IsInStd |= Parent->isInStdNamespace();
----------------
rnk wrote:
> There's no cast on the RHS, so I'd spell out `CXXRecordDecl` here to make things more obvious.
`Parent` can't be null for a `CXXMethodDecl`, so just `Method->getParent()->isInStdNamespace()` would work.


http://reviews.llvm.org/D18328