[PATCH] D101630: [HIP] Fix device-only compilation

Fri Apr 30 09:35:46 PDT 2021

tra added a comment.

CUDA compilation currently errors out if `-o` is used when more than one output would be produced.
E.g.

  % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c -E 
  #... preprocessed output from host and 2 GPU compilations is printed out

  % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c -E  -o foo.out
  clang-13: error: cannot specify -o when generating multiple output files

  % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c --cuda-device-only -E  -o foo.out
  clang-13: error: cannot specify -o when generating multiple output files

  % bin/clang++ -x cuda --offload-arch=sm_60 --offload-arch=sm_70 --cuda-path=$HOME/local/cuda-10.2  zz.cu -c --cuda-device-only -S  -o foo.out
  clang-13: error: cannot specify -o when generating multiple output files
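The failures above follow from a simple output count. Without `--cuda-device-only`, the compilation produces one host output plus one output per GPU architecture. `--cuda-device-only` drops only the host output, so two `--offload-arch` values still give two files and `-o` is rejected. Below is a minimal C++ model of that count. It assumes the rule is just "reject `-o` when more than one output would be produced". `OffloadMode`, `countOutputs`, and `checkOutputFlag` are hypothetical names for illustration, not clang's actual driver code.

```cpp
#include <cassert>
#include <string>

// Hypothetical model (not clang's Driver code) of how many top-level
// outputs a CUDA/HIP compilation produces for a given set of flags.
enum class OffloadMode { Default, HostOnly, DeviceOnly };

// Default:    one host output + one output per --offload-arch.
// HostOnly:   only the host compilation runs.
// DeviceOnly: one output per --offload-arch, no host output.
int countOutputs(OffloadMode mode, int num_arches) {
  assert(num_arches >= 1 && "expect at least one --offload-arch");
  switch (mode) {
    case OffloadMode::HostOnly:
      return 1;
    case OffloadMode::DeviceOnly:
      return num_arches;
    case OffloadMode::Default:
      return 1 + num_arches;
  }
  return 0;  // unreachable; silences -Wreturn-type
}

// Returns the error text if -o cannot be honored, or an empty string.
std::string checkOutputFlag(bool has_o, int num_outputs) {
  if (has_o && num_outputs > 1)
    return "cannot specify -o when generating multiple output files";
  return "";
}
```

In this model, `--cuda-device-only` with a single `--offload-arch` yields exactly one output. This is why narrowing the command line to one sub-compilation sidesteps the error.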

I think I borrowed that behavior from some of the macOS-related functionality, so we already have a somewhat established model for handling multiple outputs.
Wrapping multiple outputs into a single bundle could be an option too.

The question is: what would make the most sense?
Are bundles useful when the user passes options that produce intermediate compiler outputs?

In my experience, most such use cases are for manual examination of the compiler output, so I'd prefer to keep the results immediately usable, without having to unbundle them. In these cases we're already changing the command-line options, and adjusting them to produce the output of the specific sub-compilation I want is trivial. Unbundling is more complicated: the bundler/unbundler tool, as it stands, is poorly documented and not particularly user-friendly. If it is to become a user-facing tool like ar/nm/objdump, it will need some improvements.

If you do have use cases where you need to bundle intermediate results, are they for human consumption or for tooling? Perhaps we should make the "bundle the outputs" behavior controllable by a flag, and keep enforcing "one output only" as the default.
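The proposal can be sketched as a small decision function. Under this policy, "one output only" remains the default, and multiple outputs are bundled into the single `-o` file only when the user opts in. This is a C++ sketch under stated assumptions. The opt-in flag (`bundle_requested` here) is hypothetical, since no flag name is proposed. `OutputAction`, `OutputRequest`, and `decide` are illustrative names, not existing clang APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the proposed policy: keep "one output only" as the
// default and bundle multiple outputs only when the user opts in via a flag.
// The opt-in flag is an assumption, not an existing clang option.
enum class OutputAction { WriteDirectly, Bundle, Error };

struct OutputRequest {
  std::vector<std::string> outputs;  // one entry per sub-compilation output
  bool has_o = false;                // user passed -o
  bool bundle_requested = false;     // hypothetical opt-in bundling flag
};

OutputAction decide(const OutputRequest &req) {
  assert(!req.outputs.empty());
  // A single output, or no -o: each output keeps its own default destination.
  if (req.outputs.size() == 1 || !req.has_o)
    return OutputAction::WriteDirectly;
  // Multiple outputs into one -o file: bundle only on explicit request.
  if (req.bundle_requested)
    return OutputAction::Bundle;
  // Default: keep the current "cannot specify -o" error.
  return OutputAction::Error;
}
```

Keeping `Error` as the fall-through means existing command lines behave exactly as before. Bundling becomes an explicit choice for tooling rather than a surprise for someone inspecting the output by hand.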


https://reviews.llvm.org/D101630