[PATCH] D123810: [Cuda] Add initial support for wrapping CUDA images in the new driver.
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue May 10 14:52:42 PDT 2022
tra added inline comments.
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:188
+ static inline OffloadKind getEmptyKey() {
+ return static_cast<OffloadKind>(0xFFFF);
+ }
----------------
Extend `enum OffloadKind` to include these special kinds.
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:201
static DeviceFile getEmptyKey() {
- return {DenseMapInfo<StringRef>::getEmptyKey(),
+ return {static_cast<OffloadKind>(DenseMapInfo<uint16_t>::getEmptyKey()),
DenseMapInfo<StringRef>::getEmptyKey(),
----------------
Why do we limit ourselves to uint16_t here? Can't we just use `OffloadKind` itself and get rid of these casts?
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:726
+
+ if (Error Err = executeCommands(*FatBinaryPath, CmdArgs))
+ return std::move(Err);
----------------
We should have a way to pass extra options to fatbinary, too. E.g. we may want to use `--compress-all`. Also, we may need to pass through `-g` for debug builds.
Oh. Debug builds. Makes me wonder if cuda-gdb will be able to find GPU binaries packaged by the new driver. If it does not, it will be a rather serious problem. It would likely affect various profiling tools the same way, too. Can you give it a try?
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1192
// Run LTO on any bitcode files and replace the input with the result.
- if (Error Err = linkBitcodeFiles(LinkerInput.getSecond(), TheTriple,
- File.Arch, WholeProgram))
+ if (Error Err = linkBitcodeFiles(LinkerInputFiles, TheTriple, File.Arch,
+ WholeProgram))
----------------
Nit. We should think of changing `linkBitcodeFiles` to return `llvm::ErrorOr<bool>` so we can return `WholeArchive` value, instead of modifying it as an argument.
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1200
+ LinkedImages.emplace_back(OFK_OpenMP, TheTriple.getTriple(), File.Arch,
+ LinkerInputFiles.front());
continue;
----------------
What's expected to happen if we have more than one input?
If only one is ever expected, I'd add an assert.
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1206
// CUDA in non-RDC mode.
if (WholeProgram && TheTriple.isNVPTX()) {
+ assert(!LinkerInputFiles.empty() && "No non-RDC image to embed");
----------------
Should `EmbedBitcode` be mutually exclusive vs `WholeArchive`?
Can we ever end up with both unset?
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1210
+ LinkedImages.emplace_back(Kind, TheTriple.getTriple(), File.Arch,
+ LinkerInputFiles.front());
continue;
----------------
ditto about the assert on the number of inputs.
================
Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1347
+
+ auto FileOrErr = compileModule(M);
+ if (!FileOrErr)
----------------
The `M` contains generated registration glue at this point, right? It may be worth a comment to explain what is it that we're compiling here.
================
Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:271
+llvm::Error wrapCudaBinary(llvm::Module &M, llvm::ArrayRef<char> Images) {
+ return Error::success();
+}
----------------
A "not-implemented-yet/TODO" comment would be appropriate here.
================
Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.h:20
+
+/// Wrap the input fatbinary image into the module \p M as global symbols and
+/// registers the images with the CUDA runtime.
----------------
It should be either "Wraps/registers" or "Wrap/register".
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123810/new/
https://reviews.llvm.org/D123810
More information about the cfe-commits
mailing list