[PATCH] D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver.

Tue May 10 15:09:36 PDT 2022

jhuber6 added inline comments.

================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:726
+
+  if (Error Err = executeCommands(*FatBinaryPath, CmdArgs))
+    return std::move(Err);
----------------
tra wrote:
> We should have a way to pass extra options to fatbinary, too. E.g. we may want to use `--compress-all`. Also, we may need to pass through `-g` for debug builds.
> 
> Oh. Debug builds. Makes me wonder if cuda-gdb will be able to find GPU binaries packaged by the new driver. If it does not, it will be a rather serious problem. It would likely affect various profiling tools the same way, too. Can you give it a try?
I was planning on implementing this stuff more generally when we get the new binary tool in D125165 landed. That will allow me to more generally put any number of command line arguments into the binary itself and fish it out here. We already support a janky version for the -Xcuda-ptxas option here, but it's a mess and I'm planning on getting rid of it. Is it okay to punt that into the future? Debug builds are another sore point I don't handle super well right now but will be addressed better with D125165. 

I haven't tested cuda-gdb, but I embed the fatbinary the same way that we do in non-rdc mode. I can read them with cuobjdump in the final executable so I'm assuming it's compatible.

================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1192
     // Run LTO on any bitcode files and replace the input with the result.
-    if (Error Err = linkBitcodeFiles(LinkerInput.getSecond(), TheTriple,
-                                     File.Arch, WholeProgram))
+    if (Error Err = linkBitcodeFiles(LinkerInputFiles, TheTriple, File.Arch,
+                                     WholeProgram))
----------------
tra wrote:
> Nit. We should think of changing `linkBitcodeFiles` to return `llvm::ErrorOr<bool>` so we can return `WholeArchive` value, instead of modifying it as an argument.
That's a good idea, I'm planning on cleaning a lot of this stuff up later.

================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1200
+      LinkedImages.emplace_back(OFK_OpenMP, TheTriple.getTriple(), File.Arch,
+                                LinkerInputFiles.front());
       continue;
----------------
tra wrote:
> What's expected to happen if we have more than one input?
> If only one is ever expected, I'd add an assert.
This isn't used right now since JIT hasn't made it in. It should probably be a proper error honestly.

================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1206
     // CUDA in non-RDC mode.
     if (WholeProgram && TheTriple.isNVPTX()) {
+      assert(!LinkerInputFiles.empty() && "No non-RDC image to embed");
----------------
tra wrote:
> Should `EmbedBitcode` be mutually exclusive vs `WholeArchive`? 
> 
> Can we ever end up with both unset? 
`EmbedBitcode` might need to require `WholeArchive` considering that it's supposed to be a completely linked image that can be sent to a JIT engine. 

================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1347
+
+    auto FileOrErr = compileModule(M);
+    if (!FileOrErr)
----------------
tra wrote:
> The `M` contains generated registration glue at this point, right? It may be worth a comment to explain what is it that we're compiling here.
Sure.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123810/new/

https://reviews.llvm.org/D123810