[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

Tue Apr 26 12:09:18 PDT 2022

jhuber6 added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224
   if (IsCuda || IsHIP) {
-    if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))
+    if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) ||
+        Args.hasArg(options::OPT_foffload_new_driver))
       CmdArgs.push_back("-fgpu-rdc");
----------------
rnk wrote:
> tra wrote:
> > jhuber6 wrote:
> > > tra wrote:
> > > > tra wrote:
> > > > > jhuber6 wrote:
> > > > > > tra wrote:
> > > > > > > If user specifies both `-fno-gpu-rdc` and `-foffload-new-driver` we would still enable RDC compilation.
> > > > > > > We may want to at least issue a warning. 
> > > > > > > 
> > > > > > > Considering that  we have multiple places where we may check for `-f[no]gpu-rdc` we should make sure we don't get different ideas whether RDC has been enabled.
> > > > > > > I think it may make sense to provide a common way to figure it out. Either via a helper function that would process CLI arguments or calculate it once and save it somewhere.
> > > > > > I haven't quite finalized how to handle this. The new driver should be compatible with a non-RDC build since we simply wouldn't embed the device image or create offloading entries. It's a little bit more difficult here since the new method is opt-in so it requires a flag. We should definitely emit a warning if both are enabled (I'm assuming there's one for passing both `fgpu-rdc` and `fno-gpu-rdc`). I'll add one in.
> > > > > > 
> > > > > > Also we could consider the new driver *the* RDC in the future which would be the easiest. The problem is if we want to support CUDA's method of RDC considering how other build systems seem to expect it. I could see us embedding the fatbinary in the object file, even if unused, just so that cuobjdump works. However we couldn't support the generation of `__cudaRegisterFatBinary_nv....` functions because then those would cause linker errors. WDYT?
> > > > > > I'm assuming there's one for passing both fgpu-rdc and fno-gpu-rdc
> > > > > 
> > > > > This is not a valid assumption. The whole idea behind `-fno-something` is that the options can be overridden. E.g. if the build specifies a standard set of compiler options, but we need to override some of them when building a particular file. We can only do so by appending to the standard options. Potentially we may end up having those options overridden again. While it's not very common, it's certainly possible. It's also possible to start with '-fno-gpu-rdc' and then override it with `-fgpu-rdc`.
> > > > > 
> > > > > In this case, we care about the final state of RDC specified by -f*gpu-rdc options, not by the fact that `-fno-gpu-rdc` is present. `Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)` does exactly that -- gives you the final state. If it returns false, but we have `-foffload-new-driver`, then we need a warning as these options are contradictory.
> > > > > 
> > > > > The new driver should be compatible with a non-RDC build 
> > > > 
> > > > In that case, we don't need a warning, but we do need a test verifying this behavior.
> > > > 
> > > > > Also we could consider the new driver *the* RDC in the future which would be the easiest. 
> > > > 
> > > > SGTM. I do not know how it all will work out in the end. Your proposed model makes a lot of sense, and I'm guardedly optimistic about it.
> > > > Eventually we would deprecate RDC options, but we still need to work sensibly when user specifies a mix of these options. 
> > > > 
> > > > 
> > > > In that case, we don't need a warning, but we do need a test verifying this behavior.
> > > > 
> > > It's possible but I don't have the logic here to do it, figured we can cross that bridge later.
> > > 
> > > > SGTM. I do not know how it all will work out in the end. Your proposed model makes a lot of sense, and I'm guardedly optimistic about it.
> > > >
> > > So the only downsides I know of, is that we don't currently replicate CUDA's magic to JIT RDC code (We can do this with LTO anyway), and that registering offload entries relies on the linker defining `__start / __stop` variables, which I don't know if linkers on Windows / MacOS provide. I'd be really interested if someone on the LLD team knew the answer to that.
> > > relies on the linker defining __start / __stop variables, which I don't know if linkers on Windows / MacOS provide. I'd be really interested if someone on the LLD team knew the answer to that.
> > 
> > @MaskRay, @rnk - would you happen to know the answer?
> I believe MachO has an equivalent mechanism, but I'm not familiar with it. For PE/COFF, you can search the ASan code to see how the __start / __stop symbols are defined on Windows using various pragmas and `__declspec(allocate)` to set up sections and sort them accordingly.
> 
> I would love to have a doc that writes up how to implement this array registration mechanism portably for all major platforms, given that we believe it is possible everywhere.
> I believe MachO has an equivalent mechanism, but I'm not familiar with it. For PE/COFF, you can search the ASan code to see how the start / stop symbols are defined on Windows using various pragmas and __declspec(allocate) to set up sections and sort them accordingly.
> I would love to have a doc that writes up how to implement this array registration mechanism portably for all major platforms, given that we believe it is possible everywhere.
Thanks for the information, I was having a hard to figuring out if it was possible to implement this on other platforms. Some documentation for handling this on each platform would definitely be useful as I am hoping this can become the standard way to compile / register offloading languages in LLVM. Let me know if I can do anything to help on that front.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120272/new/

https://reviews.llvm.org/D120272