[PATCH] End-to-end CUDA compilation.

Mon Apr 6 11:04:59 PDT 2015

================
Comment at: include/clang/Driver/Action.h:140
@@ +139,3 @@
+  virtual void anchor();
+  /// GPU architecture to bind
+  const char *GpuArchName;
----------------
eliben wrote:
> Can you give an example in this comment? like sm_30, etc.
Done

================
Comment at: include/clang/Driver/Action.h:144
@@ +143,3 @@
+public:
+  CudaDeviceAction(std::unique_ptr<Action> Input, const char *_ArchName);
+
----------------
eliben wrote:
> IIRC _[A-Z] names are discouraged, and against the style anyway
Done.

================
Comment at: include/clang/Driver/CC1Options.td:611
@@ -610,1 +610,3 @@
 
+def cuda_include_gpucode : Separate<["-"], "cuda-include-gpucode">,
+  HelpText<"Incorporate CUDA device-side code.">;
----------------
eliben wrote:
> I'm wondering about the "gpucode" mnemonic :-) It's unusual and kinda ambiguous. What does gpucode mean here? PTX? Maybe PTX can be more explicit then?
> 
> PTX is probably not too specific since this flag begins with "cuda_" so it's already about the CUDA/PTX flow.
> 
> [this applies to other uses of "gpucode" too]
It's actually an opaque blob. clang does not care what's in the file as it just passes the bits to cudart which passes it to the driver. The driver can digest PTX (which we pass in this case), but it will as happily accept GPU code packed in fatbin or cubin formats. If/when we grow ability to compile device-side to SASS, we would just  do "-cuda-include-gpucode gpu-code-packed-in.cubin" and it should work with no other changes on the host side.

So, 'gpucode' was the best approximation I could come up with that would keep "GPU code in any shape or form as long as it's PTX/fatbin or cubin".

I'd be happy to change it. Suggestions?

================
Comment at: include/clang/Driver/Options.td:456
@@ -455,1 +455,3 @@
 def fcreate_profile : Flag<["-"], "fcreate-profile">, Group<f_Group>;
+def fcuda_no_device : Flag<["-"], "fcuda-no-device">,
+  HelpText<"Disable device-side CUDA compilation">;
----------------
eliben wrote:
> Is it possible to make these flags positive, with false-by-default values?
> 
> 
Sure. Changed the options to -fcuda-host-only/-fcuda-device-only




================
Comment at: include/clang/Driver/Options.td:1074
@@ +1073,3 @@
+  HelpText<"CUDA GPU architecture">;
+def gpu_architecture_EQ : Joined<["--"], "gpu-architecture=">,
+  Flags<[DriverOption]>, Alias<gpu_architecture>;
----------------
eliben wrote:
> What is this for?
I've added for (partial) compatibility with nvcc. I've removed it for now as drop-in nvcc compatibility is not the purpose of this patch.

================
Comment at: include/clang/Frontend/CodeGenOptions.h:164
@@ +163,3 @@
+  /// List of CUDA GPU code blobs to incorporate
+  std::vector<std::string> CudaGpuCodeFiles;
+
----------------
eliben wrote:
> s/Files/Blobs/ or "strings"? And as above, maybe PTX would be better than "GpuCode"
It's a vector of strings containing names of files that contain GPU code blobs, whatever their format may be. I'll rename the variable to CudaGpuCodeFileNames and will update the comment to reflect that.

How about this?
  /// A list of file names passed with -cuda-include-gpucode options to forward
  /// to CUDA runtime back-end for incorporating them into host-side object
  /// file.
  std::vector<std::string> CudaGpuCodeFileNames;



================
Comment at: lib/CodeGen/CGCUDANV.cpp:47
@@ -37,1 +46,3 @@
   llvm::Constant *getLaunchFn() const;
+  llvm::Constant *getRegisterFunctionFn() const;
+
----------------
eliben wrote:
> Put doc comments for the new functions/methods you're adding, and preferably for the data fields as well, unless they're completely obvious
Moved some fields out of the class and into local variables where they are used.
Documented the rest.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:87
@@ +86,3 @@
+
+  Zeros[0] = llvm::ConstantInt::get(SizeTy, 0);
+  Zeros[1] = Zeros[0];
----------------
eliben wrote:
> Do you really need Zeros as a member? You only use it once. Also, if you just declare it you can use the nice C++11 {...} initializer in the place of use, making the code even shorter.
Done. Also moved number of other things with single use down to where they are used.


================
Comment at: lib/CodeGen/CGCUDANV.cpp:116
@@ -71,2 +115,3 @@
   std::vector<llvm::Type*> Params;
+  Params.push_back(VoidPtrPtrTy);
   Params.push_back(CharPtrTy);
----------------
eliben wrote:
> Use C++11 {...} initialization?
OK.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:177
@@ +176,3 @@
+// void .cuda_register_kernels(void** GpuBlobHandle) {
+//  // for (Kernel : EmittedKernels) {
+//    __cudaRegisterFunction(GpuBlobHandle,Kernel)
----------------
eliben wrote:
> If this is pseudocode example, second level of // comments is superfluous
The idea I wanted to convey is that I'm not really generating the loop, but rather rather generate a call for each kernel, in effect unrolling the loop. 

I've changed pseudocode to linear sequence of calls which is what those functions really generate.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:193
@@ +192,3 @@
+  llvm::Argument &BlobHandlePtr = *RegisterKernelsFunc->arg_begin();
+  for (llvm::Function *Kernel : EmittedKernels) {
+    llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
----------------
eliben wrote:
> const?
Nope. CreateBitCast wants non-const Function*:

../../../tools/clang/lib/CodeGen/CGCUDANV.cpp:198:31: error: cannot initialize a parameter of type 'llvm::Value *' with an lvalue of type 'const llvm::Function *'
        Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr


================
Comment at: lib/CodeGen/CGCUDANV.cpp:195
@@ +194,3 @@
+    llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
+    llvm::Value *args[] = {
+        &BlobHandlePtr, // pointer to fatbin handler pointer
----------------
eliben wrote:
> Can you document the C signature of the called function somewhere for clarity?
I've moved CreateRuntimeFunction(...,"__cudaRegisterFunction") along with its signature in the comments into makeRegisterKernelsFn, so it should be visible close to where it's used.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:197
@@ +196,3 @@
+        &BlobHandlePtr, // pointer to fatbin handler pointer
+        // Builder.CreatePointerCast(nullptr, CharPtrTy), // kernel stub addr
+        Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr
----------------
eliben wrote:
> leftovers?
Yes. Removed.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:219
@@ +218,3 @@
+// void .cuda_module_ctor(void*) {
+//   // for (GpuCodeBlob : GpuCodeBlobs) {
+//          Handle = __cudaRegisterFatBinary(GpuCodeBlob);
----------------
eliben wrote:
> same here re second-level //
Fixed.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:251
@@ +250,3 @@
+    // Create initialized wrapper structure that points to the loaded GPU blob.
+    llvm::Constant *Values[4] = {
+        llvm::ConstantInt::get(IntTy, FatbinWrapperMagic),
----------------
eliben wrote:
> Is the 4 in [4] needed?
Not really. Removed.

================
Comment at: lib/CodeGen/CGCUDARuntime.h:42
@@ -34,1 +41,3 @@
 
+  llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
+  llvm::SmallVector<llvm::GlobalVariable *, 16> FatbinHandles;
----------------
eliben wrote:
> It would really be great not to have data inside this abstract interface; is this necessary?
> 
> Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though this interface is allegedly generic :)
List of generated kernels is something that I expect to be useful for all subclasses of CUDARuntime. 
That's why I've put EmittedKernels there and a non-virtual methodEmitDeviceStub() to populate it.

FatbinHandles, on the other hand, is indeed cudart-specific. I've moved it into CGCUDANV.

================
Comment at: lib/CodeGen/CGCUDARuntime.h:53
@@ +52,3 @@
+
+  virtual void EmitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args);
+
----------------
eliben wrote:
> Please document these APIs
Done.

http://reviews.llvm.org/D8463

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/