[PATCH] D16870: [CUDA] Tweak attribute-based overload resolution to match nvcc behavior.

Thu Feb 11 14:25:41 PST 2016

jlebar added a comment.

Mostly comments on comments.


================
Comment at: lib/Sema/SemaCUDA.cpp:71
@@ -70,3 +70,3 @@
 // H  - handled in (x)
-// Preferences: b-best, f-fallback, l-last resort, n-never.
+// Preferences: +:native, *:host-device, o:same side, .:wrong side, -:never.
 //
----------------
What if we used the following mapping?

N = native
HD = host+device
SS = same-side
WS = wrong-side
`-` = never

This mimics how we were writing on the whiteboard.

================
Comment at: lib/Sema/SemaCUDA.cpp:115
@@ -114,2 +114,3 @@
 
-  // (b) Best case scenarios
+  // (b) Calling HostDevice is OK as a fallback that works for everyone.
+  if (CalleeTarget == CFT_HostDevice)
----------------
I'm not sure "fallback" is the right word to use here anymore, as HD --> HD gets priority HD.

================
Comment at: lib/Sema/SemaCUDA.cpp:127
@@ -132,9 +126,3 @@
   if (CallerTarget == CFT_HostDevice) {
-    // Calling a function that matches compilation mode is OK.
-    // Calling a function from the other side is frowned upon.
-    if (getLangOpts().CUDAIsDevice)
-      return CalleeTarget == CFT_Device ? CFP_Fallback : QuestionableResult;
-    else
-      return (CalleeTarget == CFT_Host || CalleeTarget == CFT_Global)
-                 ? CFP_Fallback
-                 : QuestionableResult;
+    // It's OK to call a mode-matching function from an HD one.
+    if ((getLangOpts().CUDAIsDevice && CalleeTarget == CFT_Device) ||
----------------
Suggest "compilation-mode matching"

================
Comment at: lib/Sema/SemaOverload.cpp:8731
@@ +8730,3 @@
+  // but accepted by both clang and NVCC. However during particular
+  // compilation pass only one call variant is viable. We need to
+  // exclude non-viable overload candidates from consideration based
----------------
We should be consistent wrt whether we call it a compilation pass, or (as above) compilation mode, or whatever.  (I think "mode" may be right.)

Also please add an article: "During a particular compilation mode".

================
Comment at: lib/Sema/SemaOverload.cpp:8738
@@ +8737,3 @@
+    const FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
+    bool IgnoreWrongSideFunctions =
+        llvm::any_of(Candidates, [&](OverloadCandidate *Cand) {
----------------
Maybe name this something more closely tied to what it is -- e.g. ContainsSameSideCall -- and then add a comment in the if statement saying "Remove wrong-side calls from consideration."

================
Comment at: lib/Sema/SemaOverload.cpp:8752
@@ +8751,3 @@
+                                      }),
+                       Candidates.end());
+  }
----------------
This is an indentation mouthful -- can we pull the lambda out, maybe?

================
Comment at: lib/Sema/SemaOverload.cpp:8757
@@ -8726,3 +8756,3 @@
   Best = end();
-  for (iterator Cand = begin(); Cand != end(); ++Cand) {
+  for (auto Cand : Candidates)
     if (Cand->Viable)
----------------
Suggest auto*, so it's clear we're not copying things.

================
Comment at: lib/Sema/SemaOverload.cpp:8771
@@ -8741,3 +8770,3 @@
   // function. If not, we have an ambiguity.
-  for (iterator Cand = begin(); Cand != end(); ++Cand) {
+  for (auto Cand : Candidates) {
     if (Cand->Viable &&
----------------
auto*

================
Comment at: test/CodeGenCUDA/function-overload.cu:81
@@ -79,1 +80,3 @@
 
+// NOTE: this is an artefact of split-mode CUDA compilation that we
+// need to mimic. HD functions are sometimes allowed to call H or D
----------------
Nit: Suggest American spelling, "artifact", which is much more common in llvm codebase.

Maybe also remove this sentence, or move it down somewhere later -- this feels like a bad "topic sentence" for the paragraph.  e.g.

HD functions are sometimes allowed to call H or D functions -- this is an artifact of the source-to-source splitting performed by nvcc that we need to mimic.

================
Comment at: test/CodeGenCUDA/function-overload.cu:86
@@ +85,3 @@
+// considered at all. For clang both H and D variants will become
+// function overloads. Normally target attribute is considered only if
+// C++ rules can not determine which function is better. However in
----------------
This is setting up a contrast between nvcc and clang, so suggest connecting the phrases with "but" or "in contrast".  Also suggest being specific that we're talking about nvcc -- since split-mode compilation isn't a thing in clang, if we just talk about it generally, it's not clear what we're referring to.  e.g.

During device mode compilation in nvcc, host functions aren't present at all, so don't participate in overloading.  But in clang, H and D functions are present in both compilation modes.  Clang normally uses the target attribute as a tiebreaker between overloads with otherwise identical priority, but in order to match nvcc's behavior, we sometimes need to wholly discard overloads that would not be present during compilation under nvcc.

================
Comment at: test/CodeGenCUDA/function-overload.cu:98
@@ +97,3 @@
+// by -fcuda-disable-target-call-checks and, according to C++ overload
+// resolution rules, would be prefered over function template.
+// CHECK-BOTH-LABEL: define void @_Z5hd_tfv()
----------------
This was a bit hard to follow, suggest rewording slightly:

Here we expect to call the templated function during host compilation, even if -fcuda-disable-target-call-checks is passed, and even though C++ overload rules prefer the non-templated function.

================
Comment at: test/CodeGenCUDA/function-overload.cu:110
@@ +109,3 @@
+// Calls from __host__ and __device__ functions should always call
+// overloaded function that matches their mode.
+// CHECK-HOST-LABEL: define void @_Z4h_tfv()
----------------
call the overloaded function

================
Comment at: test/CodeGenCUDA/function-overload.cu:128
@@ +127,3 @@
+// In case we have mix of HD and H-only or D-only candidates in the
+// overload set normal C++ overload resolution rules apply first.
+template <typename T> T template_vs_hd_function(T arg) { return 15; }
----------------
If we have a mix

in the overload set, normal

================
Comment at: test/CodeGenCUDA/function-overload.cu:162
@@ +161,3 @@
+// Check that overloads still work the same way on both host and
+// device side when overload set contains only functions from one side
+// of compilation.
----------------
when the overload set


http://reviews.llvm.org/D16870