[PATCH] D101630: [HIP] Fix device-only compilation

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 30 10:26:55 PDT 2021


yaxunl added a comment.

In D101630#2729573 <https://reviews.llvm.org/D101630#2729573>, @tra wrote:

> CUDA compilation currently errors out if `-o` is used when more than one output would be produced.
> E.g.
>
>   % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c -E 
>   #... preprocessed output from host and 2 GPU compilations is printed out
>   
>   % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c -E  -o foo.out
>   clang-13: error: cannot specify -o when generating multiple output files
>   
>   % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c --cuda-device-only -E  -o foo.out
>   clang-13: error: cannot specify -o when generating multiple output files
>   
>   % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c --cuda-device-only -S  -o foo.out
>   clang-13: error: cannot specify -o when generating multiple output files
>
> I think I borrowed that behavior from some of the macOS-related functionality, so we do have a somewhat established model of how to handle multiple outputs.
> Wrapping multiple outputs into a single bundle could be an option too.
>
> The question is what would make the most sense.
> Are bundles useful in cases where the user passes options that give us intermediate compiler outputs?
>
> In my experience, most such use cases are intended for manual examination of compiler output, so I'd prefer to keep the results immediately usable, without having to unbundle them. In such cases we're already changing command-line options, and adjusting them to produce the output from the specific sub-compilation I want is trivial. Having to unbundle things is more complicated, because the bundler/unbundler tool as it stands is poorly documented and not particularly user-friendly. If it is to become a user-facing tool like ar/nm/objdump, it would need some improvements.
>
> If you do have use cases where you need to bundle intermediate results, are they for human consumption or for tooling? Perhaps we should make the "bundle the outputs" behavior controllable by a flag, and keep enforcing "one output only" as the default.

We use ccache and need a single output for -E with device compilation. There are also use cases that emit bitcode for device compilation and link it later. Both require the output to be bundled.
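
As a rough illustration of the ccache constraint, here is a toy model in Python. It assumes a cache that hashes exactly one preprocessed stream per compiler invocation to form its key. The `bundle` and `cache_key` helpers and the `### BEGIN`/`### END` markers are hypothetical, made up for this sketch; they are neither ccache's implementation nor the clang-offload-bundler file format.

```python
import hashlib


def bundle(outputs):
    """Join per-target preprocessed text into one blob.

    Toy format for illustration only; targets are sorted so the
    result does not depend on dictionary insertion order.
    """
    parts = []
    for target in sorted(outputs):
        parts.append(f"### BEGIN {target}\n{outputs[target]}\n### END {target}\n")
    return "".join(parts)


def cache_key(preprocessed):
    """Hash a single preprocessed stream, as a one-output cache would."""
    return hashlib.sha256(preprocessed.encode("utf-8")).hexdigest()


# Two device sub-compilations produce two preprocessed outputs.
outputs = {
    "gfx906": "__global__ void k() {}",
    "gfx908": "__global__ void k() {}",
}

# Bundling turns them into the single stream the cache can hash.
key = cache_key(bundle(outputs))
```

With separate per-target outputs there is no single stream to hash, which is why one bundled output from -E fits this kind of tool.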

If users want the unbundled output, they can use -save-temps. Would that be sufficient?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630


