[PATCH] D101630: [HIP] Fix device-only compilation

Tue Jun 1 11:57:20 PDT 2021

tra added a comment.

In D101630#2787714 <https://reviews.llvm.org/D101630#2787714>, @yaxunl wrote:

> How does nvcc --genco behave when there are multiple GPU arch's? Does it output a fat binary containing multiple ISA's? Also, does it support device-only compilation for intermediate outputs?

It does not allow multiple outputs for `-ptx` and `-cubin` compilations, which matches clang's current behavior:

  ~/local/cuda-11.3/bin/nvcc -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -ptx foo.cu
  nvcc fatal   : Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures

NVCC does allow `-E` with multiple targets, but it produces output for only *one* of them.

NVCC does bundle outputs for multiple GPU variants if `-fatbin` is used.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630