[PATCH] End-to-end CUDA compilation.

Tue May 5 13:24:13 PDT 2015

================
Comment at: lib/CodeGen/CGCUDANV.cpp:164
@@ +163,3 @@
+/// Creates internal function to register all kernel stubs generated in this
+/// module with CUDA runtime.
+/// \code
----------------
echristo wrote:
> "with the CUDA runtime".
Done.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:166
@@ +165,3 @@
+/// \code
+/// void .cuda_register_kernels(void** GpuBinaryHandle) {
+///    __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...);
----------------
echristo wrote:
> The function name begins with a .? Ugh.
Replaced with __

================
Comment at: lib/CodeGen/CGCUDANV.cpp:197-207
@@ +196,13 @@
+    llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
+    llvm::Value *args[] = {
+        &GpuBinaryHandlePtr,
+        Builder.CreateBitCast(Kernel, VoidPtrTy),
+        KernelName,
+        KernelName,
+        llvm::ConstantInt::get(IntTy, -1),
+        NullPtr,
+        NullPtr,
+        NullPtr,
+        NullPtr,
+        llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
+    Builder.CreateCall(RegisterFunc, args);
----------------
echristo wrote:
> clang-format?
Done. I've also replaced last argument with a plain NullPtr.

================
Comment at: lib/Driver/Driver.cpp:183
@@ -182,3 +182,3 @@
 
-    // -c only runs up to the assembler.
-  } else if ((PhaseArg = DAL.getLastArg(options::OPT_c))) {
+    // -c and partial CUDA compilations only runs up to the assembler.
+  } else if ((PhaseArg = DAL.getLastArg(options::OPT_c)) ||
----------------
echristo wrote:
> "and partial CUDA compilations only run up"
Fixed.

================
Comment at: lib/Driver/Driver.cpp:1194
@@ +1193,3 @@
+  if (GpuArchList.empty())
+    GpuArchList.push_back("sm_20");
+
----------------
echristo wrote:
> Some comment on the default here.
Done.

================
Comment at: lib/Driver/Driver.cpp:1672-1674
@@ -1551,1 +1671,5 @@
 
+static llvm::Triple computeTargetTriple(StringRef DefaultTargetTriple,
+                                        const ArgList &Args,
+                                        StringRef DarwinArchName);
+
----------------
echristo wrote:
> Do you need the declaration up here? Why not just pull the static function up if so?
That would clutter the changes for no good reason. Whenever bunch of code moved from one place to another, it's always a pain figuring out whether things were just copied or copied *and* changed. Forward declaration is a lesser crime, IMO.

================
Comment at: lib/Driver/Driver.cpp:1732
@@ +1731,3 @@
+        computeTargetTriple(DefaultTargetTriple, C.getArgs(), "");
+    llvm::Triple TargetTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
+                                                       : "nvptx-nvidia-cuda");
----------------
echristo wrote:
> Probably would prefer "DeviceTriple" here.
Done.

================
Comment at: lib/Driver/Tools.cpp:2583-2588
@@ -2577,1 +2582,8 @@
+  assert(Inputs.size() >= 1 && "Must have at least one input.");
+  InputInfoList BaseInputs; // Inputs[0]
+  const InputInfo &Input = Inputs[0];
+  BaseInputs.push_back(Input);
+  bool IsCuda = types::isCuda(Input.getType());
+
+  assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple inputs.");
 
----------------
echristo wrote:
> Comment about what's going on here.
Done.

================
Comment at: lib/Driver/Tools.cpp:2696
@@ -2683,3 +2695,3 @@
   CmdArgs.push_back("-main-file-name");
-  CmdArgs.push_back(getBaseInputName(Args, Inputs));
+  CmdArgs.push_back(getBaseInputName(Args, Input));
 
----------------
echristo wrote:
> Might be nice to pull this sort of change out so it isn't affecting the rest of the diff.
Sure. I was also thinking of splitting code generation into a separate commit as well as it's largely independent of the driver changes.

================
Comment at: test/CodeGenCUDA/device-stub.cu:40-46
@@ +39,9 @@
+// Test that we've built contructor..
+// CHECK: define internal void @.cuda_module_ctor
+//   .. that calls __cudaRegisterFatBinary(&.cuda_fatbin_wrapper)
+// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}.cuda_fatbin_wrapper
+//   .. stores return value in .cuda_gpubin_handle
+// CHECK: store{{.*}}.cuda_gpubin_handle
+//   .. and then calls .cuda_register_kernels
+// CHECK: call void @.cuda_register_kernels
+
----------------
echristo wrote:
> Should some of these be CHECK-NEXT?
Some. Changed to CHECK-NEXT where it was possible.

http://reviews.llvm.org/D8463

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/