[PATCH] End-to-end CUDA compilation.

Fri Apr 3 14:30:04 PDT 2015

Thanks for the updated description.

Here's an initial round of comments. I'm leaving the driver parts mostly to the driver experts.


================
Comment at: include/clang/Driver/Action.h:140
@@ +139,3 @@
+  virtual void anchor();
+  /// GPU architecture to bind
+  const char *GpuArchName;
----------------
Can you give an example in this comment? like sm_30, etc.

================
Comment at: include/clang/Driver/Action.h:144
@@ +143,3 @@
+public:
+  CudaDeviceAction(std::unique_ptr<Action> Input, const char *_ArchName);
+
----------------
IIRC _[A-Z] names are discouraged, and against the style anyway

================
Comment at: include/clang/Driver/CC1Options.td:611
@@ -610,1 +610,3 @@
 
+def cuda_include_gpucode : Separate<["-"], "cuda-include-gpucode">,
+  HelpText<"Incorporate CUDA device-side code.">;
----------------
I'm wondering about the "gpucode" mnemonic :-) It's unusual and kinda ambiguous. What does gpucode mean here? PTX? Maybe PTX can be more explicit then?

PTX is probably not too specific since this flag begins with "cuda_" so it's already about the CUDA/PTX flow.

[this applies to other uses of "gpucode" too]

================
Comment at: include/clang/Driver/Options.td:456
@@ -455,1 +455,3 @@
 def fcreate_profile : Flag<["-"], "fcreate-profile">, Group<f_Group>;
+def fcuda_no_device : Flag<["-"], "fcuda-no-device">,
+  HelpText<"Disable device-side CUDA compilation">;
----------------
Is it possible to make these flags positive, with false-by-default values?



================
Comment at: include/clang/Driver/Options.td:1074
@@ +1073,3 @@
+  HelpText<"CUDA GPU architecture">;
+def gpu_architecture_EQ : Joined<["--"], "gpu-architecture=">,
+  Flags<[DriverOption]>, Alias<gpu_architecture>;
----------------
What is this for?

================
Comment at: include/clang/Frontend/CodeGenOptions.h:164
@@ +163,3 @@
+  /// List of CUDA GPU code blobs to incorporate
+  std::vector<std::string> CudaGpuCodeFiles;
+
----------------
s/Files/Blobs/ or "strings"? And as above, maybe PTX would be better than "GpuCode"

================
Comment at: lib/CodeGen/CGCUDANV.cpp:47
@@ -37,1 +46,3 @@
   llvm::Constant *getLaunchFn() const;
+  llvm::Constant *getRegisterFunctionFn() const;
+
----------------
Put doc comments for the new functions/methods you're adding, and preferably for the data fields as well, unless they're completely obvious

================
Comment at: lib/CodeGen/CGCUDANV.cpp:87
@@ +86,3 @@
+
+  Zeros[0] = llvm::ConstantInt::get(SizeTy, 0);
+  Zeros[1] = Zeros[0];
----------------
Do you really need Zeros as a member? You only use it once. Also, if you just declare it you can use the nice C++11 {...} initializer in the place of use, making the code even shorter.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:116
@@ -71,2 +115,3 @@
   std::vector<llvm::Type*> Params;
+  Params.push_back(VoidPtrPtrTy);
   Params.push_back(CharPtrTy);
----------------
Use C++11 {...} initialization?

================
Comment at: lib/CodeGen/CGCUDANV.cpp:177
@@ +176,3 @@
+// void .cuda_register_kernels(void** GpuBlobHandle) {
+//  // for (Kernel : EmittedKernels) {
+//    __cudaRegisterFunction(GpuBlobHandle,Kernel)
----------------
If this is pseudocode example, second level of // comments is superfluous

================
Comment at: lib/CodeGen/CGCUDANV.cpp:193
@@ +192,3 @@
+  llvm::Argument &BlobHandlePtr = *RegisterKernelsFunc->arg_begin();
+  for (llvm::Function *Kernel : EmittedKernels) {
+    llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
----------------
const?

================
Comment at: lib/CodeGen/CGCUDANV.cpp:195
@@ +194,3 @@
+    llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
+    llvm::Value *args[] = {
+        &BlobHandlePtr, // pointer to fatbin handler pointer
----------------
Can you document the C signature of the called function somewhere for clarity?

================
Comment at: lib/CodeGen/CGCUDANV.cpp:197
@@ +196,3 @@
+        &BlobHandlePtr, // pointer to fatbin handler pointer
+        // Builder.CreatePointerCast(nullptr, CharPtrTy), // kernel stub addr
+        Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr
----------------
leftovers?

================
Comment at: lib/CodeGen/CGCUDANV.cpp:219
@@ +218,3 @@
+// void .cuda_module_ctor(void*) {
+//   // for (GpuCodeBlob : GpuCodeBlobs) {
+//          Handle = __cudaRegisterFatBinary(GpuCodeBlob);
----------------
same here re second-level //

================
Comment at: lib/CodeGen/CGCUDANV.cpp:251
@@ +250,3 @@
+    // Create initialized wrapper structure that points to the loaded GPU blob.
+    llvm::Constant *Values[4] = {
+        llvm::ConstantInt::get(IntTy, FatbinWrapperMagic),
----------------
Is the 4 in [4] needed?

================
Comment at: lib/CodeGen/CGCUDARuntime.h:42
@@ -34,1 +41,3 @@
 
+  llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
+  llvm::SmallVector<llvm::GlobalVariable *, 16> FatbinHandles;
----------------
It would really be great not to have data inside this abstract interface; is this necessary?

Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though this interface is allegedly generic :)

================
Comment at: lib/CodeGen/CGCUDARuntime.h:53
@@ +52,3 @@
+
+  virtual void EmitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args);
+
----------------
Please document these APIs

http://reviews.llvm.org/D8463

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/