[PATCH] D101630: [HIP] Fix device-only compilation

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 1 13:21:25 PDT 2021


yaxunl added a comment.

In D101630#2791734 <https://reviews.llvm.org/D101630#2791734>, @tra wrote:

> In D101630#2787714 <https://reviews.llvm.org/D101630#2787714>, @yaxunl wrote:
>
>> How does nvcc --genco behave when there are multiple GPU archs? Does it output a fat binary containing multiple ISAs? Also, does it support device-only compilation for intermediate outputs?
>
> It does not allow multiple outputs for `-ptx` and `-cubin` compilations, same as clang behaves now:
>
>   $ ~/local/cuda-11.3/bin/nvcc -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -ptx foo.cu
>   nvcc fatal   : Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures
>
> NVCC does allow `-E` with multiple targets, but it does produce output for only *one* of them.
>
> NVCC does bundle outputs for multiple GPU variants if `-fatbin` is used.

I think that for intermediate outputs (preprocessor expansion, IR, and assembly) it probably makes sense not to bundle by default. For the default action (emitting an object), however, we need to bundle by default, since that was the old behavior and existing HIP apps depend on it. We can then allow `-fhip-bundle-device-output` to override the default behavior.
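
To make the proposed policy concrete, here is a minimal sketch of the decision. It is not the driver code in this patch: `OutputKind`, `shouldBundleDeviceOutput`, and the `userChoice` parameter are hypothetical names used only for illustration, and an empty `userChoice` stands for "no explicit bundling option was given" (how an explicit "off" would be spelled on the command line is not settled here).

```cpp
#include <optional>

// Hypothetical model of what a device-only compilation is asked to emit.
enum class OutputKind { Preprocessed, IR, Assembly, Object };

// Sketch of the proposed policy (illustrative only, not clang's API):
// an explicit user choice always wins; otherwise only the default
// action (emitting an object) bundles, and intermediate outputs do not.
bool shouldBundleDeviceOutput(OutputKind kind,
                              std::optional<bool> userChoice) {
  if (userChoice.has_value())
    return *userChoice;               // explicit option overrides the default
  return kind == OutputKind::Object;  // old behavior kept for objects only
}
```

With no explicit option, `shouldBundleDeviceOutput(OutputKind::Object, std::nullopt)` is true and the same call with `OutputKind::Assembly` is false; passing `true` or `false` as the second argument forces the result for any output kind.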


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630


