[PATCH] D101630: [HIP] Fix device-only compilation

Fri Jun 4 06:36:56 PDT 2021

yaxunl added a comment.

In D101630#2792160 <https://reviews.llvm.org/D101630#2792160>, @tra wrote:

> In D101630#2792052 <https://reviews.llvm.org/D101630#2792052>, @yaxunl wrote:
>
>> I think for intermediate outputs e.g. preprocessor expansion, IR, and assembly, probably it makes sense not to bundle by default.
>
> Agreed.
>
>> However, for default action (emitting object), we need to bundle by default since it was the old behavior and existing HIP apps depend on that.
>
> Existing use is a valid point.
> As a counterargument, I would suggest that in a compilation pipeline which does include bundling, an object file for one GPU variant *is* an intermediate output, similar to the ones you've listed above.
>
> The final product of device-side subcompilations is a bundle. The question is: what does `-c` mean? Is it `produce an object file` or `compile till the end of the pipeline`?
> For CUDA and HIP compilation it's ambiguous. When we target just one GPU, it would be closer to the former. In general, it would be closer to the latter. NVCC side-steps the issue by using different flags, `-cubin` and `-fatbin`, to disambiguate between the two cases and to avoid bolting CUDA-related semantics onto compiler flags that were not designed for that.
>
>> Then we allow -fhip-bundle-device-output to override the default behavior.
>
> OK. Bundling objects for HIP by default looks like a reasonable compromise. 
> It would be good to generalize the flag to `-fgpu-bundle...`, since that would help if/when we want to produce a fatbin during CUDA compilation. I'd still keep no-bundling as the default for CUDA objects.
>
> Now that we are in agreement on what we want, the next question is *how* we want to do it.
>
> It appears that there's a fair bit of similarity between what the proposed `-fgpu-bundle` flag does and the handful of `--emit-...` options clang has now.
> Using something like `--emit-gpu-object` and `--emit-gpu-bundle` would mirror NVCC's `-cubin`/`-fatbin`, and it would decouple the default behavior of `-c --cuda-device-only` from the user's ability to specify what they want, without burdening `-c` with additional flags whose defaults differ depending on circumstances.
>
> Compilation with `-c` would remain "compile till the end", whatever that happens to mean for a particular language, and `--emit-gpu-object`/`--emit-gpu-bundle` would tell the compiler how far we want it to proceed and what kind of output we want. This would probably be easier to explain to users, as they are already familiar with flags like `-emit-llvm`; only now we are dealing with an extra bundling step in the compilation pipeline. It would also behave consistently across CUDA and HIP, even though they have different bundling defaults for device-side compilation. E.g. `-c --cuda-device-only --emit-gpu-bundle` will always produce a bundle with the object files for both CUDA and HIP, and `-c --cuda-device-only --emit-gpu-object` will always require a single `-o` output.
>
> WDYT? Does it make sense?

We will certainly need `-fgpu-bundle-device-output` to control bundling of intermediate files. Adding `-emit-gpu-object` and `-emit-gpu-bundle` on top of that may be redundant and can cause confusion: what should happen if users specify `-c -fgpu-bundle-device-output -emit-gpu-object` or `-c -fno-gpu-bundle-device-output -emit-gpu-bundle`? To me, a single option, `-fgpu-bundle-device-output`, that controls all device output seems cleaner.
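A small model may help pin down the defaults agreed above under the single-option approach. The sketch below is hypothetical Python, not clang driver code: `DeviceOnlyCompile`, `Stage`, `should_bundle` and `num_outputs` are illustrative names, and `-fgpu-bundle-device-output` is treated as the flag proposed in this review rather than a final interface. It assumes that an explicit flag always overrides the default, that intermediate outputs are unbundled by default, and that `-c` objects are bundled by default for HIP but not for CUDA.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Lang(Enum):
    CUDA = auto()
    HIP = auto()


class Stage(Enum):
    # Last pipeline step requested for the device-side compilation.
    PREPROCESSED = auto()
    IR = auto()
    ASSEMBLY = auto()
    OBJECT = auto()


@dataclass
class DeviceOnlyCompile:
    """Hypothetical model of a device-only (--cuda-device-only) compile."""
    lang: Lang
    stage: Stage
    num_gpu_archs: int
    # None: flag absent; True/False: -fgpu-bundle-device-output /
    # -fno-gpu-bundle-device-output (proposed flag, assumed semantics).
    bundle_flag: Optional[bool] = None


def should_bundle(cmd: DeviceOnlyCompile) -> bool:
    # An explicit flag always wins over the defaults.
    if cmd.bundle_flag is not None:
        return cmd.bundle_flag
    # Intermediate outputs (preprocessed source, IR, assembly) are not
    # bundled by default.
    if cmd.stage is not Stage.OBJECT:
        return False
    # Objects: HIP keeps its old bundling behavior; CUDA does not bundle.
    return cmd.lang is Lang.HIP


def num_outputs(cmd: DeviceOnlyCompile) -> int:
    # A bundle packs every GPU arch into one file; otherwise each arch
    # gets its own file, so a single -o only fits a single arch.
    return 1 if should_bundle(cmd) else cmd.num_gpu_archs


if __name__ == "__main__":
    hip_obj = DeviceOnlyCompile(Lang.HIP, Stage.OBJECT, num_gpu_archs=2)
    cuda_obj = DeviceOnlyCompile(Lang.CUDA, Stage.OBJECT, num_gpu_archs=2)
    hip_ir = DeviceOnlyCompile(Lang.HIP, Stage.IR, num_gpu_archs=2)
    print(num_outputs(hip_obj), num_outputs(cuda_obj), num_outputs(hip_ir))
    # prints: 1 2 2
```

Under this rule, a two-arch HIP object compile yields one bundle, while the same CUDA compile yields two objects. Adding `-emit-gpu-object`/`-emit-gpu-bundle` alongside the flag would give `should_bundle` a second input, and combinations like the two listed above would then need an explicit conflict rule.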


https://reviews.llvm.org/D101630