[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jun 22 14:38:09 PDT 2022
tra added a comment.
In D127901#3602771 <https://reviews.llvm.org/D127901#3602771>, @jdoerfert wrote:
> Do we want/need PTX? I do not, but I don't mind having it. Someone will ask for it eventually.
> However, if we embed bitcode via LTO we can use the
> single-linked PTX image for the whole module and include it in the
> fatbinary. This allows us to do the following and have it execute even
> without the correct architecture specified.
> `clang foo.cu -foffload-lto -fgpu-rdc --offload-new-driver -lcudart`
Then we do need a knob controlling whether to embed PTX or not. The default should be "off" IMO.
We currently have `--[no-]cuda-include-ptx=`, which we may be able to reuse for that purpose.
This raises another question -- which GPU variant will we generate PTX for? One? All (if more than one is specified)? Only the ones specified by `--[no-]cuda-include-ptx=`?
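
To make the last option concrete, here is a rough sketch of how the variants could be picked if the existing flag were reused. It is only an illustration: `selectPTXArchs` is a hypothetical helper that is not part of this patch or of Clang, it assumes the "off by default" behaviour suggested above, and it lets exclusions always win, which is simpler than resolving the options in command-line order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not part of D127901 or Clang.
// Requested: GPU architectures being compiled for (e.g. "sm_70").
// Include:   values collected from --cuda-include-ptx= (may contain "all").
// Exclude:   values collected from --no-cuda-include-ptx= (may contain "all").
// Assumption: PTX is embedded only for explicitly included architectures
// (default "off"), and an exclusion always overrides an inclusion.
std::vector<std::string>
selectPTXArchs(const std::vector<std::string> &Requested,
               const std::vector<std::string> &Include,
               const std::vector<std::string> &Exclude) {
  // An architecture matches a list if it is named there or the list has "all".
  auto Matches = [](const std::vector<std::string> &List,
                    const std::string &Arch) {
    return std::any_of(List.begin(), List.end(), [&](const std::string &E) {
      return E == "all" || E == Arch;
    });
  };

  std::vector<std::string> Result;
  for (const std::string &Arch : Requested)
    if (Matches(Include, Arch) && !Matches(Exclude, Arch))
      Result.push_back(Arch);
  return Result;
}
```

With no flags this embeds no PTX at all; with an include list of `all` it would embed one PTX image per requested architecture, which is the "All" case from the question above.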