[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

Wed Jun 22 14:38:09 PDT 2022

tra added a comment.

In D127901#3602771 <https://reviews.llvm.org/D127901#3602771>, @jdoerfert wrote:

> Do we want/need PTX? I do not, but I don't mind having it. Someone will ask for it eventually.

Fair enough.

> However, if we embed bitcode via LTO, we can use the
> single-linked PTX image for the whole module and include it in the
> fatbinary. This allows us to do the following and have it execute even
> without the correct architecture specified.
> `clang foo.cu -foffload-lto -fgpu-rdc --offload-new-driver -lcudart`
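
The following simplified model shows why one embedded PTX image lets the binary run even when the right GPU architecture was not specified. It follows the compatibility rules in NVIDIA's documentation: SASS built for `sm_XY` runs only on devices with the same major version and an equal or higher minor version, while PTX for `compute_XY` can be JIT-compiled for any device of equal or higher compute capability. This is a hypothetical sketch, not the CUDA driver's actual logic. The `pick_image`/`parse_cc` helpers and the plain `sm_XY`/`compute_XY` image naming are assumptions, and architecture-specific variants are ignored.

```python
from typing import List, Optional, Tuple


def parse_cc(arch: str) -> Tuple[int, int]:
    # Hypothetical helper: "sm_70" / "compute_70" -> (7, 0).
    # Assumes purely numeric suffixes; the last digit is the minor version.
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def pick_image(device: Tuple[int, int], images: List[str]) -> Optional[str]:
    """Pick the fatbinary image a device of compute capability `device` could use."""
    sass = [img for img in images if img.startswith("sm_")]
    ptx = [img for img in images if img.startswith("compute_")]

    # SASS is only compatible within the same major version, minor <= device minor.
    ok_sass = [img for img in sass
               if parse_cc(img)[0] == device[0] and parse_cc(img)[1] <= device[1]]
    if ok_sass:
        return max(ok_sass, key=parse_cc)

    # Otherwise fall back to PTX that the driver can JIT for this (newer) device.
    ok_ptx = [img for img in ptx if parse_cc(img) <= device]
    if ok_ptx:
        return max(ok_ptx, key=parse_cc)

    return None  # no usable image: the kernel cannot be launched


print(pick_image((8, 0), ["sm_70"]))                # None: no compatible image
print(pick_image((8, 0), ["sm_70", "compute_70"]))  # compute_70, JIT-compiled
```

Under this model, embedding PTX for the linked module is what turns the first case (no usable image) into the second.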

Then we do need a knob controlling whether or not we embed PTX. The default should be "off" IMO.
We already have `--[no-]cuda-include-ptx=`, which we may be able to reuse for that purpose.

This brings up another question: which GPU variant(s) will we generate PTX for? One? All of them (if more than one is specified)? Only the ones specified by `--[no-]cuda-include-ptx=`?
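
If the answer is "the ones selected by `--[no-]cuda-include-ptx=`", the flags could be resolved per GPU variant, with "off" as the default as suggested above. The sketch below is hypothetical: the `ptx_archs` name and the representation of flags as `(include, value)` pairs in command-line order are illustrative, not driver code. It assumes that the last matching flag wins and that `all` matches every variant. The `default` parameter makes it easy to compare an "off" default with an "on" one.

```python
from typing import List, Tuple


def ptx_archs(offload_archs: List[str],
              ptx_flags: List[Tuple[bool, str]],
              default: bool = False) -> List[str]:
    """Return the offload archs that would get PTX embedded.

    ptx_flags holds (include, value) pairs in command-line order, e.g.
    (True, "sm_80") for --cuda-include-ptx=sm_80 and
    (False, "all") for --no-cuda-include-ptx=all.
    """
    selected = []
    for arch in offload_archs:
        include = default  # "off" unless some flag says otherwise
        for enable, value in ptx_flags:
            # A later matching flag overrides any earlier one.
            if value == "all" or value == arch:
                include = enable
        if include:
            selected.append(arch)
    return selected


archs = ["sm_70", "sm_80"]
print(ptx_archs(archs, []))                                # []
print(ptx_archs(archs, [(True, "all"), (False, "sm_70")]))  # ['sm_80']
```

Per-variant resolution also covers the "all" case (`--cuda-include-ptx=all`) without a separate code path.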

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D127901