[PATCH] D101630: [HIP] Fix device-only compilation
Yaxun Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Apr 30 13:39:00 PDT 2021
yaxunl added a comment.
In D101630#2729975 <https://reviews.llvm.org/D101630#2729975>, @tra wrote:
> What will happen with this patch in the following scenarios:
>
> - `--offload-arch=A -S -o out.s`
> - `--offload-arch=A --offload-arch=B -S -o out.s`
>
> I would expect the first case to produce a plain text assembly file. With this patch the second case will produce a bundle. With some build tools, users can only add to the compiler options provided by the system. Depending on whether those system-provided options include an `--offload-arch`, the format of the output in the first example becomes unstable. So the consistent way would be to always bundle everything, but that breaks (or at least complicates) the normal single-output case and makes it deviate from what users expect from a regular C++ compilation.
>
> In D101630#2729768 <https://reviews.llvm.org/D101630#2729768>, @yaxunl wrote:
>
>> We use ccache and need one output for -E with device compilation. Also there are use cases to emit bitcode for device compilation and link them later. These use cases require output to be bundled.
>
> This is a good point. I don't think I've ever used ccache on a CUDA compilation, but I see how ccache may get surprised.
>
> Considering the scenario above, I think a better way to handle it would be to teach ccache about CUDA/HIP compilation. It's a similar situation to the support for split DWARF, where the compiler does something beyond the expected one-input to one-output transformation.
> E.g. we could tell it to use stdout for `-E`. Or implement the `bundle-everything` flag in clang and let ccache use it if it needs to have a single output.
>
>> If users want to get the unbundled output, they can use -save-temps. Is it sufficient?
>
> In terms of saving intermediate outputs - yes. In terms of usability - no. Sometimes I want one particular intermediate result saved with an exact filename (or piped to stdout), and saving a bunch and then picking one would be a pretty annoying usability regression for me.
How about an option -fhip-bundle-device-output? If it is on, device output is bundled no matter how many GPU archs there are. By default it is on.
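To make the difference concrete, here is a minimal sketch of the two rules under discussion. It is not clang driver code: the function names are hypothetical, and the behavior with the option turned off (plain, unbundled output) is an assumption, since the proposal only states what happens when it is on.

```cpp
#include <cstddef>

// Rule described in the quoted comment: the output format follows the number
// of --offload-arch values, so one arch gives plain output and two or more
// give a bundle.
inline bool bundlesByArchCount(std::size_t numOffloadArchs) {
  return numOffloadArchs > 1;
}

// Proposed rule: -fhip-bundle-device-output (on by default) decides alone,
// and the arch count is deliberately ignored.
// ASSUMPTION: with the option off, the output is not bundled.
inline bool bundlesWithProposedFlag(std::size_t /*numOffloadArchs*/,
                                    bool bundleDeviceOutput = true) {
  return bundleDeviceOutput;
}
```

Under the first rule, adding a second `--offload-arch` to the command line changes the format of `out.s`. Under the second, the format changes only when the option itself is toggled.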
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101630/new/
https://reviews.llvm.org/D101630