[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

Wed Jun 15 12:57:01 PDT 2022

jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, JonChesterfield, tra, yaxunl.
Herald added subscribers: mattd, gchakrabarti, asavonic, inglorion.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

One current downside of LLVM's support for CUDA in RDC-mode is that we
cannot JIT from the PTX image, so the user must specify the exact target
architecture when offloading. CUDA's runtime uses a special method to
link the separate PTX files in RDC-mode, which LLVM cannot do with its
chosen approach to RDC-mode compilation. However, if we embed bitcode
via LTO, we can take the single linked PTX image for the whole module
and include it in the fatbinary. This allows the following to execute
even when the correct architecture is not specified.

  clang foo.cu -foffload-lto -fgpu-rdc --offload-new-driver -lcudart
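
To illustrate why a PTX entry in the fatbinary helps, here is a toy model
of image selection at load time. It is only a sketch under stated
assumptions: the Image struct, selectImage, and describe are hypothetical
names, and the rule shown (exact-match cubin first, otherwise the newest
PTX that is not newer than the device) simplifies what the CUDA runtime
actually does. It needs C++17 for std::optional.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified description of one entry in a fatbinary.
struct Image {
  enum Kind { Cubin, PTX } kind;
  int arch; // e.g. 70 for sm_70 / compute_70
};

// Toy selection rule (an assumption, not the real CUDA loader):
// prefer a cubin built for exactly the device architecture; otherwise
// fall back to the newest PTX whose virtual architecture does not
// exceed the device's, which the driver could JIT-compile.
std::optional<Image> selectImage(const std::vector<Image> &fatbin,
                                 int deviceArch) {
  std::optional<Image> bestPTX;
  for (const Image &img : fatbin) {
    if (img.kind == Image::Cubin && img.arch == deviceArch)
      return img;
    if (img.kind == Image::PTX && img.arch <= deviceArch &&
        (!bestPTX || img.arch > bestPTX->arch))
      bestPTX = img;
  }
  return bestPTX;
}

// Human-readable summary of the outcome of selectImage.
std::string describe(const std::optional<Image> &img) {
  if (!img)
    return "no usable image";
  return img->kind == Image::Cubin ? "run cubin" : "JIT from PTX";
}
```

In this model, a fatbinary holding only an sm_70 cubin yields no usable
image on an sm_80 device, while one that also carries compute_70 PTX
falls back to JIT compilation.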

It is also worth noting that in full-LTO mode, RDC-mode will behave
exactly like non-RDC mode after linking.

Depends on D127246 <https://reviews.llvm.org/D127246>

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D127901

Files:
  clang/test/Driver/linker-wrapper.c
  clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
