[PATCH] D101630: [HIP] Fix device-only compilation

Mon May 24 10:41:51 PDT 2021

tra added a subscriber: echristo.
tra added a comment.

In D101630#2777346 <https://reviews.llvm.org/D101630#2777346>, @yaxunl wrote:

> In D101630#2748513 <https://reviews.llvm.org/D101630#2748513>, @tra wrote:
>
>> How about this:
>> If the user explicitly specified `--cuda-host-only` or `--cuda-device-only`, then by default only allow producing the natural output format, unless a bundled output is requested by an option. This should keep existing users working.
>> If the compilation is done without explicitly requested sub-compilation(s), then bundle the output by default. This should keep the GPU-unaware tools like ccache happy as they would always get the single output they expect.
>>
>> WDYT?
>
> `--cuda-host-only` always have one output, therefore there is no point of bundle its output. We only need to decide the proper behavior of `--cuda-device-only`.

It still fits my proposal of requiring a single sub-compilation and not bundling the output.
The point was that such behavior is consistent regardless of whether we're compiling CUDA or HIP for the host or for device.

> How about keeping the original default behavior of not bundling if users do not specify output file, 
> whereas bundle the output if users specifying output file.

I think it will make things worse. Compiler output should not change depending on whether `-o` is used.

> Since specifying output file indicates users  requesting one output. -f[no-]hip-bundle-device-output override the default behavior.

I disagree. When user specifies the output, the intent is to specify the **location** of the outputs, not its contents or format.

Telling compiler to produce a different output format should not depend on specifying (or not) the output location.

I think our options are:

- Always bundle --cuda-device-only outputs by default. This is consistent for HIP compilation, but deviates from CUDA, which can't do bundling. Also, single-target subcompilation deviates from both CUDA and regular C++ compilation, which is what most users would be familiar with and which would probably be the most sensible default for a single sub-compilation. It can be overridden with an option, but it goes against the principle that it's specialized use case that should need extra options. The most common use case should not need them.

- Only bundle multiple sub-compilations' output by default. This would preserve the sensible single sub-compilation behavior. The downside is that it makes the output format depend on whether compiler ends up doing one or many sub-compilations. E.g. `--offload-arch=A -S` would produce ASM and `--offload-arch=A --offload-arch=B -S` would produce a bundle. If the user can't control some of the compiler options, Such approach would make output format unpredictable. E.g. passing `--offload-arch=foo` to compiler on godbolt would all of a sudden produce bundled output instead of assembly text or a sensible error message that you're trying to produce multiple outputs.

- Keep the current behavior (insist on single sub-compilation) as the default, allow overriding it for HIP with the flag. IMO that's the most consistent option and I still think it's the one most suitable to keep as the default.

I can see the benefit of always bundling for HIP, but I also believe that keeping things simple, consistent and predictable is important. Considering that we're tinkering in a relatively obscure niche of the compiler, it probably does not matter all that much, but it should not stop us from trying to figure out the best approach in a principled way.

I think we could benefit from a second opinion on which approach would make more sense for clang. 
Summoning @jdoerfert and @echristo.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630