[PATCH] D16082: [CUDA] Invoke ptxas and fatbinary during compilation.

Mon Jan 11 14:32:10 PST 2016

tra added a comment.

Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. They tend to throw wrenches into pipeline construction.


================
Comment at: lib/Driver/Driver.cpp:1380
@@ +1379,3 @@
+      C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN),
+      /* GpuArchName = */ nullptr,
+      /* AtTopLevel = */ false);
----------------
So, you're treating GpuArchName==nullptr as a special case of DeviceAction for fatbin?
Perhaps it warrants its own CudaDeviceLinkAction?

================
Comment at: lib/Driver/Driver.cpp:1620-1625
@@ -1602,3 +1619,8 @@
   case phases::Assemble:
-    return C.MakeAction<AssembleJobAction>(Input, types::TY_Object);
+    // Assembling generates an object file, except when we're assembling CUDA,
+    // in which case we get a cubin file.
+    return C.MakeAction<AssembleJobAction>(
+        std::move(Input), TC.getTriple().getOS() == llvm::Triple::CUDA
+                              ? types::TY_CUDA_CUBIN
+                              : types::TY_Object);
   }
----------------
cubin *is* an ELF object file. Do we really need a new type here or can we get by with TY_Object?

================
Comment at: lib/Driver/ToolChains.cpp:4193
@@ +4192,3 @@
+    : Linux(D, Triple, Args) {
+  if (CudaInstallation.isValid()) {
+    getProgramPaths().push_back(CudaInstallation.getBinPath());
----------------
unneeded {} here and in few more places throughout the patch.

================
Comment at: lib/Driver/ToolChains.cpp:4230
@@ -4224,3 +4229,3 @@
       // Skip this argument unless the architecture matches BoundArch
-      if (A->getValue(0) != StringRef(BoundArch))
+      if (!BoundArch || A->getValue(0) != StringRef(BoundArch))
         continue;
----------------
You may as well move it out of the loop and return early if BoundArch is nullptr.

================
Comment at: lib/Driver/Tools.cpp:10621-10625
@@ +10620,7 @@
+  ArgStringList CmdArgs;
+  if (TC.getTriple().isArch64Bit()) {
+    CmdArgs.push_back("-m64");
+  } else {
+    CmdArgs.push_back("-m32");
+  }
+
----------------
CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-m64" : "-m32");

or, even

ArgStringList CmdArgs = {TC.getTriple().isArch64Bit() ? "-m64" : "-m32"};

Same in Linker::ConstructJob below.

================
Comment at: lib/Driver/Tools.cpp:10677-10678
@@ +10676,4 @@
+    std::string Arch = A->getGpuArchName();
+    // We need to name pass an Arch of the form "sm_XX" for cubin files and
+    // "compute_XX" for ptx.  CudaDeviceAction's getGpuArchName() is guaranteed
+    // to match "sm_XX".
----------------
First line does not parse.

In general compute_XX does not necessarily match sm_XX.
[[ http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options | ptxas options ]] says that 

> Allowed values for this option: compute_20, compute_30, compute_35, compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and sm_52

Note that there's no compute_21, compute_32. You'll need sm_XX -> compute_YY map.



================
Comment at: lib/Driver/Tools.h:923
@@ +922,3 @@
+
+// Runs fatbinary, which is sort of a linker for NVPTX.
+class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
----------------
Please add more details about what fatbin does.
".. which combines GPU object files and, optionally, PTX assembly into a single output file."


http://reviews.llvm.org/D16082