[PATCH] End-to-end CUDA compilation.

Artem Belevich tra at google.com
Mon Apr 6 15:25:39 PDT 2015


================
Comment at: include/clang/Driver/CC1Options.td:611
@@ -610,1 +610,3 @@
 
+def cuda_include_gpucode : Separate<["-"], "cuda-include-gpucode">,
+  HelpText<"Incorporate CUDA device-side code.">;
----------------
eliben wrote:
> tra wrote:
> > eliben wrote:
> > > I'm wondering about the "gpucode" mnemonic :-) It's unusual and kinda ambiguous. What does gpucode mean here? PTX? Maybe PTX would be more explicit, then?
> > > 
> > > PTX is probably not too specific since this flag begins with "cuda_" so it's already about the CUDA/PTX flow.
> > > 
> > > [this applies to other uses of "gpucode" too]
> > It's actually an opaque blob: clang does not care what's in the file; it just passes the bits to cudart, which passes them to the driver. The driver can digest PTX (which we pass in this case), but it will just as happily accept GPU code packed in fatbin or cubin formats. If/when we grow the ability to compile device-side code to SASS, we would just do "-cuda-include-gpucode gpu-code-packed-in.cubin" and it should work with no other changes on the host side.
> > 
> > So, 'gpucode' was the best approximation I could come up with that would keep "GPU code in any shape or form as long as it's PTX/fatbin or cubin".
> > 
> > I'd be happy to change it. Suggestions?
> I see - some generic mnemonic is needed, I agree (so PTX is not a good idea). But "--gpu-code" is an nvcc flag that means something completely different :-/ So "gpu code" here may still be confusing. Maybe "gpublob" or "gpuobject" or "gpubinary" or something like that. I can't think of a perfect solution right now.
> 
>  I'll leave it to your discretion.
gpubinary wins.

================
Comment at: lib/CodeGen/CGCUDARuntime.h:42
@@ -34,1 +41,3 @@
 
+  llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
+  llvm::SmallVector<llvm::GlobalVariable *, 16> FatbinHandles;
----------------
eliben wrote:
> tra wrote:
> > eliben wrote:
> > > It would really be great not to have data inside this abstract interface; is this necessary?
> > > 
> > > Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though this interface is allegedly generic :)
> > A list of generated kernels is something I expect to be useful for all subclasses of CUDARuntime.
> > That's why I've put EmittedKernels there, along with a non-virtual method EmitDeviceStub() to populate it.
> > 
> > FatbinHandles, on the other hand, is indeed cudart-specific. I've moved it into CGCUDANV.
> I would still remove EmittedKernels for now; we only have a single CUDA runtime in upstream at this time, so this feels redundant, and it makes the runtime interface / implementation barrier less clean than it should be. In the future, if/when new runtime implementations are added, we'll figure out the best way to factor out common code.
> 
> YAGNI, essentially :)
OK.

http://reviews.llvm.org/D8463

More information about the cfe-commits mailing list