[PATCH] D16082: [CUDA] Invoke ptxas and fatbinary during compilation.

Mon Jan 11 16:55:40 PST 2016

jlebar added a comment.

In http://reviews.llvm.org/D16082#324138, @tra wrote:

> Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. They tend to throw wrenches into pipeline construction.


Thanks.  All of them worked except -fintegrated-as, which was causing us not to invoke ptxas.  Fixed, and added a test.


================
Comment at: lib/Driver/Driver.cpp:1380
@@ +1379,3 @@
+      C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN),
+      /* GpuArchName = */ nullptr,
+      /* AtTopLevel = */ false);
----------------
tra wrote:
> So, you're treating GpuArchName==nullptr as a special case of DeviceAction for fatbin?
> Perhaps it warrants its own CudaDeviceLinkAction?
It's a special case either way, but there are enough places that look for a CudaDeviceAction that I *think* this special case is cleaner than having a bunch of if (CudaDeviceAction || CudaDeviceLinkAction)s floating around.

================
Comment at: lib/Driver/Driver.cpp:1620-1625
@@ -1602,3 +1619,8 @@
   case phases::Assemble:
-    return C.MakeAction<AssembleJobAction>(Input, types::TY_Object);
+    // Assembling generates an object file, except when we're assembling CUDA,
+    // in which case we get a cubin file.
+    return C.MakeAction<AssembleJobAction>(
+        std::move(Input), TC.getTriple().getOS() == llvm::Triple::CUDA
+                              ? types::TY_CUDA_CUBIN
+                              : types::TY_Object);
   }
----------------
tra wrote:
> cubin *is* an ELF object file. Do we really need a new type here or can we get by with TY_Object?
Got rid of the extra type.  Note that this means that our intermediate objs will be named foo.o instead of foo.cubin, which isn't optimal, but I agree that it's simpler without the extra type.

================
Comment at: lib/Driver/Driver.cpp:1880
@@ +1879,3 @@
+    // retrieve the Input's GPU arch.
+    II.setAction(A);
+    return II;
----------------
echristo wrote:
> Weren't you just adding it as part of the InputInfo constructor in 16078?
Yes, but we're doing something slightly different here.  Suppose you have a CudaDeviceAction --> BackendAction.  What do you want the InputInfo's Action to be?  Without this change, it's the BackendAction.  But we really want it to be the CudaDeviceAction.  Thus this line.

I updated the comment in an attempt to clarify.

================
Comment at: lib/Driver/ToolChains.cpp:4230
@@ -4224,3 +4229,3 @@
       // Skip this argument unless the architecture matches BoundArch
-      if (A->getValue(0) != StringRef(BoundArch))
+      if (!BoundArch || A->getValue(0) != StringRef(BoundArch))
         continue;
----------------
tra wrote:
> You may as well move it out of the loop and return early if BoundArch is nullptr.
Tried this and discovered it's actually subtly different in a way that, thankfully, affects one of the tests I added.

if (!BoundArch) continue applies only if A->getOption().matches(options::OPT_Xarch__).  So we can't hoist it into an early return.

================
Comment at: lib/Driver/Tools.cpp:10677-10678
@@ +10676,4 @@
+    std::string Arch = A->getGpuArchName();
+    // We need to name pass an Arch of the form "sm_XX" for cubin files and
+    // "compute_XX" for ptx.  CudaDeviceAction's getGpuArchName() is guaranteed
+    // to match "sm_XX".
----------------
tra wrote:
> First line does not parse.
> 
> In general compute_XX does not necessarily match sm_XX.
> [[ http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options | ptxas options ]] says that 
> 
> > Allowed values for this option: compute_20, compute_30, compute_35, compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and sm_52
> 
> Note that there's no compute_21, compute_32. You'll need sm_XX -> compute_YY map.
> 
> 
Oh goodness, how awful.  Thank you for catching that.  Fixed.


http://reviews.llvm.org/D16082