[PATCH] End-to-end CUDA compilation.
Eli Bendersky
eliben at google.com
Fri Apr 3 14:30:04 PDT 2015
Thanks for the updated description.
Here's an initial round of comments. I'm leaving the driver parts mostly to the driver experts.
================
Comment at: include/clang/Driver/Action.h:140
@@ +139,3 @@
+ virtual void anchor();
+ /// GPU architecture to bind
+ const char *GpuArchName;
----------------
Can you give an example in this comment? like sm_30, etc.
================
Comment at: include/clang/Driver/Action.h:144
@@ +143,3 @@
+public:
+ CudaDeviceAction(std::unique_ptr<Action> Input, const char *_ArchName);
+
----------------
IIRC _[A-Z] names are discouraged, and against the style anyway
================
Comment at: include/clang/Driver/CC1Options.td:611
@@ -610,1 +610,3 @@
+def cuda_include_gpucode : Separate<["-"], "cuda-include-gpucode">,
+ HelpText<"Incorporate CUDA device-side code.">;
----------------
I'm wondering about the "gpucode" mnemonic :-) It's unusual and kinda ambiguous. What does gpucode mean here? PTX? Maybe PTX can be more explicit then?
PTX is probably not too specific since this flag begins with "cuda_" so it's already about the CUDA/PTX flow.
[this applies to other uses of "gpucode" too]
================
Comment at: include/clang/Driver/Options.td:456
@@ -455,1 +455,3 @@
def fcreate_profile : Flag<["-"], "fcreate-profile">, Group<f_Group>;
+def fcuda_no_device : Flag<["-"], "fcuda-no-device">,
+ HelpText<"Disable device-side CUDA compilation">;
----------------
Is it possible to make these flags positive, with false-by-default values?
================
Comment at: include/clang/Driver/Options.td:1074
@@ +1073,3 @@
+ HelpText<"CUDA GPU architecture">;
+def gpu_architecture_EQ : Joined<["--"], "gpu-architecture=">,
+ Flags<[DriverOption]>, Alias<gpu_architecture>;
----------------
What is this for?
================
Comment at: include/clang/Frontend/CodeGenOptions.h:164
@@ +163,3 @@
+ /// List of CUDA GPU code blobs to incorporate
+ std::vector<std::string> CudaGpuCodeFiles;
+
----------------
s/Files/Blobs/ or "strings"? And as above, maybe PTX would be better than "GpuCode"
================
Comment at: lib/CodeGen/CGCUDANV.cpp:47
@@ -37,1 +46,3 @@
llvm::Constant *getLaunchFn() const;
+ llvm::Constant *getRegisterFunctionFn() const;
+
----------------
Put doc comments for the new functions/methods you're adding, and preferably for the data fields as well, unless they're completely obvious
================
Comment at: lib/CodeGen/CGCUDANV.cpp:87
@@ +86,3 @@
+
+ Zeros[0] = llvm::ConstantInt::get(SizeTy, 0);
+ Zeros[1] = Zeros[0];
----------------
Do you really need Zeros as a member? You only use it once. Also, if you just declare it you can use the nice C++11 {...} initializer in the place of use, making the code even shorter.
================
Comment at: lib/CodeGen/CGCUDANV.cpp:116
@@ -71,2 +115,3 @@
std::vector<llvm::Type*> Params;
+ Params.push_back(VoidPtrPtrTy);
Params.push_back(CharPtrTy);
----------------
Use C++11 {...} initialization?
================
Comment at: lib/CodeGen/CGCUDANV.cpp:177
@@ +176,3 @@
+// void .cuda_register_kernels(void** GpuBlobHandle) {
+// // for (Kernel : EmittedKernels) {
+// __cudaRegisterFunction(GpuBlobHandle,Kernel)
----------------
If this is pseudocode example, second level of // comments is superfluous
================
Comment at: lib/CodeGen/CGCUDANV.cpp:193
@@ +192,3 @@
+ llvm::Argument &BlobHandlePtr = *RegisterKernelsFunc->arg_begin();
+ for (llvm::Function *Kernel : EmittedKernels) {
+ llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
----------------
const?
================
Comment at: lib/CodeGen/CGCUDANV.cpp:195
@@ +194,3 @@
+ llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
+ llvm::Value *args[] = {
+ &BlobHandlePtr, // pointer to fatbin handler pointer
----------------
Can you document the C signature of the called function somewhere for clarity?
================
Comment at: lib/CodeGen/CGCUDANV.cpp:197
@@ +196,3 @@
+ &BlobHandlePtr, // pointer to fatbin handler pointer
+ // Builder.CreatePointerCast(nullptr, CharPtrTy), // kernel stub addr
+ Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr
----------------
leftovers?
================
Comment at: lib/CodeGen/CGCUDANV.cpp:219
@@ +218,3 @@
+// void .cuda_module_ctor(void*) {
+// // for (GpuCodeBlob : GpuCodeBlobs) {
+// Handle = __cudaRegisterFatBinary(GpuCodeBlob);
----------------
same here re second-level //
================
Comment at: lib/CodeGen/CGCUDANV.cpp:251
@@ +250,3 @@
+ // Create initialized wrapper structure that points to the loaded GPU blob.
+ llvm::Constant *Values[4] = {
+ llvm::ConstantInt::get(IntTy, FatbinWrapperMagic),
----------------
Is the 4 in [4] needed?
================
Comment at: lib/CodeGen/CGCUDARuntime.h:42
@@ -34,1 +41,3 @@
+ llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
+ llvm::SmallVector<llvm::GlobalVariable *, 16> FatbinHandles;
----------------
It would really be great not to have data inside this abstract interface; is this necessary?
Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though this interface is allegedly generic :)
================
Comment at: lib/CodeGen/CGCUDARuntime.h:53
@@ +52,3 @@
+
+ virtual void EmitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args);
+
----------------
Please document these APIs
http://reviews.llvm.org/D8463
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the cfe-commits
mailing list