[PATCH] End-to-end CUDA compilation.

Mon May 4 17:25:56 PDT 2015

Hi Art,

Starting to look pretty good here. I've got a few inline nits and a couple of small requests, but I think we're almost ready to go here. Sorry for the delays.

-eric


================
Comment at: lib/CodeGen/CGCUDANV.cpp:164
@@ +163,3 @@
+/// Creates internal function to register all kernel stubs generated in this
+/// module with CUDA runtime.
+/// \code
----------------
"with the CUDA runtime".

================
Comment at: lib/CodeGen/CGCUDANV.cpp:166
@@ +165,3 @@
+/// \code
+/// void .cuda_register_kernels(void** GpuBinaryHandle) {
+///    __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...);
----------------
The function name begins with a .? Ugh.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:197-207
@@ +196,13 @@
+    llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
+    llvm::Value *args[] = {
+        &GpuBinaryHandlePtr,
+        Builder.CreateBitCast(Kernel, VoidPtrTy),
+        KernelName,
+        KernelName,
+        llvm::ConstantInt::get(IntTy, -1),
+        NullPtr,
+        NullPtr,
+        NullPtr,
+        NullPtr,
+        llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
+    Builder.CreateCall(RegisterFunc, args);
----------------
clang-format?

================
Comment at: lib/Driver/Driver.cpp:183
@@ -182,3 +182,3 @@
 
-    // -c only runs up to the assembler.
-  } else if ((PhaseArg = DAL.getLastArg(options::OPT_c))) {
+    // -c and partial CUDA compilations only runs up to the assembler.
+  } else if ((PhaseArg = DAL.getLastArg(options::OPT_c)) ||
----------------
"and partial CUDA compilations only run up"

================
Comment at: lib/Driver/Driver.cpp:1194
@@ +1193,3 @@
+  if (GpuArchList.empty())
+    GpuArchList.push_back("sm_20");
+
----------------
Some comment on the default here.

================
Comment at: lib/Driver/Driver.cpp:1672-1674
@@ -1551,1 +1671,5 @@
 
+static llvm::Triple computeTargetTriple(StringRef DefaultTargetTriple,
+                                        const ArgList &Args,
+                                        StringRef DarwinArchName);
+
----------------
Do you need the declaration up here? Why not just pull the static function up if so?

================
Comment at: lib/Driver/Driver.cpp:1732
@@ +1731,3 @@
+        computeTargetTriple(DefaultTargetTriple, C.getArgs(), "");
+    llvm::Triple TargetTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
+                                                       : "nvptx-nvidia-cuda");
----------------
Probably would prefer "DeviceTriple" here.

================
Comment at: lib/Driver/Tools.cpp:2583-2588
@@ -2577,1 +2582,8 @@
+  assert(Inputs.size() >= 1 && "Must have at least one input.");
+  InputInfoList BaseInputs; // Inputs[0]
+  const InputInfo &Input = Inputs[0];
+  BaseInputs.push_back(Input);
+  bool IsCuda = types::isCuda(Input.getType());
+
+  assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple inputs.");
 
----------------
Comment about what's going on here.

================
Comment at: lib/Driver/Tools.cpp:2696
@@ -2683,3 +2695,3 @@
   CmdArgs.push_back("-main-file-name");
-  CmdArgs.push_back(getBaseInputName(Args, Inputs));
+  CmdArgs.push_back(getBaseInputName(Args, Input));
 
----------------
Might be nice to pull this sort of change out so it isn't affecting the rest of the diff.

================
Comment at: lib/Driver/Tools.cpp:5741-5744
@@ -5718,2 +5740,6 @@
 
 const char *Clang::getBaseInputName(const ArgList &Args,
+                                    const InputInfo &Input) {
+  return Args.MakeArgString(llvm::sys::path::filename(Input.getBaseInput()));
+}
+
----------------
Please pull this out into a separate patch.

================
Comment at: test/CodeGenCUDA/device-stub.cu:40-46
@@ +39,9 @@
+// Test that we've built contructor..
+// CHECK: define internal void @.cuda_module_ctor
+//   .. that calls __cudaRegisterFatBinary(&.cuda_fatbin_wrapper)
+// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}.cuda_fatbin_wrapper
+//   .. stores return value in .cuda_gpubin_handle
+// CHECK: store{{.*}}.cuda_gpubin_handle
+//   .. and then calls .cuda_register_kernels
+// CHECK: call void @.cuda_register_kernels
+
----------------
Should some of these be CHECK-NEXT?

http://reviews.llvm.org/D8463

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/