[PATCH] D16870: [CUDA] Tweak attribute-based overload resolution to match nvcc behavior.

Thu Feb 4 11:06:21 PST 2016

jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Looks sane to me.  Just some suggestions on the comments.


================
Comment at: lib/Sema/SemaCUDA.cpp:71
@@ -70,3 +70,3 @@
 // H  - handled in (x)
-// Preferences: b-best, f-fallback, l-last resort, n-never.
+// Preferences: +:native, *:host-device, o:same side, .:wrong side, -:never.
 //
----------------
If we're going to use symbols rather than letters, could we use 4, 3, 2, 1, 0?  I think that would be easier to follow.
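
To make the suggestion concrete, here is a minimal standalone sketch of what numeric ranks could look like. The enum and helper names are hypothetical (they are not the identifiers used in SemaCUDA.cpp), and the rank order simply follows the order of the legend in the new comment, highest first.

```cpp
#include <cassert>

// Hypothetical names, for illustration only. Ranks follow the legend order
// "+:native, *:host-device, o:same side, .:wrong side, -:never".
enum class CallPreference : int {
  Never = 0,      // "-"
  WrongSide = 1,  // "."
  SameSide = 2,   // "o"
  HostDevice = 3, // "*"
  Native = 4,     // "+"
};

// Digit that would stand for a preference in the comment table.
inline char preferenceDigit(CallPreference P) {
  return static_cast<char>('0' + static_cast<int>(P));
}

// With numeric ranks, "is preferred over" is an ordinary integer comparison.
inline bool isPreferredOver(CallPreference A, CallPreference B) {
  return static_cast<int>(A) > static_cast<int>(B);
}

// Small self-check showing how the table entries would read.
inline void checkLegend() {
  assert(preferenceDigit(CallPreference::Native) == '4');
  assert(preferenceDigit(CallPreference::Never) == '0');
  assert(isPreferredOver(CallPreference::SameSide, CallPreference::WrongSide));
}
```

The point is only that a reader can compare two table cells by eye, without having to remember the relative order of five punctuation marks.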

================
Comment at: lib/Sema/SemaCUDA.cpp:127
@@ -132,9 +126,3 @@
   if (CallerTarget == CFT_HostDevice) {
-    // Calling a function that matches compilation mode is OK.
-    // Calling a function from the other side is frowned upon.
-    if (getLangOpts().CUDAIsDevice)
-      return CalleeTarget == CFT_Device ? CFP_Fallback : QuestionableResult;
-    else
-      return (CalleeTarget == CFT_Host || CalleeTarget == CFT_Global)
-                 ? CFP_Fallback
-                 : QuestionableResult;
+    // It's OK to call mode-matching function from HD one.
+    if ((getLangOpts().CUDAIsDevice && CalleeTarget == CFT_Device) ||
----------------
Nit: "It's OK to call a mode-matching function from an HD function."

================
Comment at: lib/Sema/SemaOverload.cpp:8536
@@ +8535,3 @@
+  // compatible with existing code that relies on this. If we see such
+  // a case, return better variant right away.
+  if (S.getLangOpts().CUDA && S.getLangOpts().CUDATargetOverloads &&
----------------
Since we have language lawyers on the team, I suggest adding articles to the comment:

If an HD function calls a function which has host-only and device-only overloads, nvcc sees only the host-side function during host compilation and only the device-side function during device compilation.  (This appears to be a side effect of its splitting of host and device code into separate TUs.)  Alas, we need to be compatible with existing code that relies on this, so if we see such a case, return the better variant right away.

I might actually suggest rephrasing this further, to something like:

When performing host-side compilation, nvcc doesn't see device functions, and similarly when performing device-side compilation, nvcc doesn't see host functions.  (This is a consequence of the fact that it splits host and device code into separate TUs.)  We see all functions in both compilation modes, so to match nvcc's behavior, we need to exclude some overload candidates from consideration based only on their host/device attributes.  Specifically, if one candidate call is WrongSide and the other is Native or SameSide, we ignore the WrongSide candidate.  If we don't return early here, we'll consider the CUDA target attributes again later in this function, as a tiebreaker between calls with otherwise identical priority according to the regular C++ overloading rules.
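
For what it's worth, the early-return rule described above can be sketched in isolation as follows. This is a simplified model, not the code in SemaOverload.cpp: the `Pref` enum, `EarlyResult`, and `earlyCudaFilter` are hypothetical names, and the real check operates on overload candidates rather than on bare preference values.

```cpp
#include <cassert>

// Hypothetical stand-in for the call preference computed from the
// host/device attributes of caller and callee.
enum class Pref { Never, WrongSide, SameSide, HostDevice, Native };

// Outcome of the early, attribute-only check.
enum class EarlyResult { FirstIsBetter, SecondIsBetter, Undecided };

inline bool isNativeOrSameSide(Pref P) {
  return P == Pref::Native || P == Pref::SameSide;
}

// If exactly one candidate is WrongSide and the other is Native or SameSide,
// the WrongSide candidate is ignored (nvcc would never have seen it).
// Otherwise the decision is left to the regular C++ overloading rules, with
// the CUDA attributes consulted again later only as a tiebreaker.
inline EarlyResult earlyCudaFilter(Pref First, Pref Second) {
  if (First == Pref::WrongSide && isNativeOrSameSide(Second))
    return EarlyResult::SecondIsBetter;
  if (Second == Pref::WrongSide && isNativeOrSameSide(First))
    return EarlyResult::FirstIsBetter;
  return EarlyResult::Undecided;
}
```

Note that in this model two WrongSide candidates, or a WrongSide candidate paired with a HostDevice one, come back as Undecided, since the wording above only covers the Native and SameSide cases.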

================
Comment at: test/CodeGenCUDA/function-overload.cu:96
@@ +95,3 @@
+
+// In this case during host compilation we expect to cal function
+// template even if __device__ function may be available and allowed
----------------
Typo: "cal" should be "call".


http://reviews.llvm.org/D16870