[PATCH] D16082: [CUDA] Invoke ptxas and fatbinary during compilation.
Justin Lebar via cfe-commits
cfe-commits at lists.llvm.org
Mon Jan 11 16:55:40 PST 2016
jlebar added a comment.
In http://reviews.llvm.org/D16082#324138, @tra wrote:
> Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. They tend to throw wrenches into pipeline construction.
Thanks. All of them worked except -fintegrated-as, which was causing us not to invoke ptxas. Fixed, and added a test.
Comment at: lib/Driver/Driver.cpp:1380
@@ +1379,3 @@
+ C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN),
+ /* GpuArchName = */ nullptr,
+ /* AtTopLevel = */ false);
> So, you're treating GpuArchName==nullptr as a special case of DeviceAction for fatbin?
> Perhaps it warrants its own CudaDeviceLinkAction?
It's a special case either way, but there are enough places that look for a CudaDeviceAction that I *think* this special case is cleaner than having a bunch of if (CudaDeviceAction || CudaDeviceLinkAction)s floating around.
Comment at: lib/Driver/Driver.cpp:1620-1625
@@ -1602,3 +1619,8 @@
- return C.MakeAction<AssembleJobAction>(Input, types::TY_Object);
+ // Assembling generates an object file, except when we're assembling CUDA,
+ // in which case we get a cubin file.
+ return C.MakeAction<AssembleJobAction>(
+ std::move(Input), TC.getTriple().getOS() == llvm::Triple::CUDA
+ ? types::TY_CUDA_CUBIN
+ : types::TY_Object);
> cubin *is* an ELF object file. Do we really need a new type here or can we get by with TY_Object?
Got rid of the extra type. Note that this means that our intermediate objs will be named foo.o instead of foo.cubin, which isn't optimal, but I agree that it's simpler without the extra type.
Comment at: lib/Driver/Driver.cpp:1880
@@ +1879,3 @@
+ // retrieve the Input's GPU arch.
+ return II;
> Weren't you just adding it as part of the InputInfo constructor in 16078?
Yes, but we're doing something slightly different here. Suppose you have a CudaDeviceAction --> BackendAction. What do you want the InputInfo's Action to be? Without this change, it's the BackendAction. But we really want it to be the CudaDeviceAction. Thus this line.
I updated the comment in an attempt to clarify.
Comment at: lib/Driver/ToolChains.cpp:4230
@@ -4224,3 +4229,3 @@
// Skip this argument unless the architecture matches BoundArch
- if (A->getValue(0) != StringRef(BoundArch))
+ if (!BoundArch || A->getValue(0) != StringRef(BoundArch))
> You may as well move it out of the loop and return early if BoundArch is nullptr.
Tried this and discovered it's actually subtly different in a way that, thankfully, affects one of the tests I added.
if (!BoundArch) continue applies only if A->getOption().matches(options::OPT_Xarch__). So we can't hoist it into an early return.
Comment at: lib/Driver/Tools.cpp:10677-10678
@@ +10676,4 @@
+ std::string Arch = A->getGpuArchName();
+ // We need to name pass an Arch of the form "sm_XX" for cubin files and
+ // "compute_XX" for ptx. CudaDeviceAction's getGpuArchName() is guaranteed
+ // to match "sm_XX".
> First line does not parse.
> In general compute_XX does not necessarily match sm_XX.
> [[ http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options | ptxas options ]] says that
> > Allowed values for this option: compute_20, compute_30, compute_35, compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and sm_52
> Note that there's no compute_21, compute_32. You'll need sm_XX -> compute_YY map.
Oh goodness, how awful. Thank you for catching that. Fixed.
More information about the cfe-commits