[PATCH] D101630: [HIP] Fix device-only compilation

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 1 14:24:14 PDT 2021


tra added a comment.

In D101630#2792052 <https://reviews.llvm.org/D101630#2792052>, @yaxunl wrote:

> I think for intermediate outputs e.g. preprocessor expansion, IR, and assembly, probably it makes sense not to bundle by default.

Agreed.

> However, for default action (emitting object), we need to bundle by default since it was the old behavior and existing HIP apps depend on that.

Existing use is a valid point.
As a counterargument, I would suggest that in a compilation pipeline which does include bundling, an object file for one GPU variant *is* an intermediate output, similar to the ones you've listed above.

The final product of device-side subcompilations is a bundle. The question is `what does "-c" mean?` Is it `produce an object file` or `compile till the end of the pipeline`?
For CUDA and HIP compilation it's ambiguous. When we target just one GPU, it would be closer to the former. In general, it would be closer to the latter. NVCC side-steps the issue by using different flags, `-cubin/-fatbin`, to disambiguate between the two cases and to avoid bolting CUDA-related semantics onto compiler flags that were not designed for that.

> Then we allow -fhip-bundle-device-output to override the default behavior.

OK. Bundling objects for HIP by default looks like a reasonable compromise. 
It would be useful to generalize the flag to `-fgpu-bundle...`, since we would need the same thing if/when we want to produce a fatbin during CUDA compilation. I'd still keep no-bundling as the default for CUDA's objects.

Now that we are in agreement on what we want, the next question is *how* we want to do it.

It appears that there's a fair bit of similarity between what the proposed `-fgpu-bundle` flag does and the handful of `--emit-...` options clang has now.
If we were to use something like `--emit-gpu-object` and `--emit-gpu-bundle`, it would be similar to NVCC's `-cubin/-fatbin`. It would also decouple the default behavior of `-c --cuda-device-only` from the user's ability to specify what they want, without burdening `-c` with additional flags that would have different defaults under different circumstances.

Compilation with "-c" would remain "compile till the end", whatever that happens to mean for a particular language, and `--emit-gpu-object/bundle` would tell the compiler how far we want it to proceed and what kind of output we want. This would probably be easier to explain to the users, as they are already familiar with flags like `-emit-llvm`; the only difference is that we are now dealing with an extra bundling step in the compilation pipeline. It would also behave consistently across CUDA and HIP even though they have different defaults for bundling in device-side compilation. E.g. `-c --cuda-device-only --emit-gpu-bundle` will always produce a bundle with the object files for both CUDA and HIP, and `-c --cuda-device-only --emit-gpu-object` will always require a single '-o' output.
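
To make the proposed semantics concrete, here is a minimal Python sketch of how the bundling decision for `-c --cuda-device-only` could be resolved. It is an illustration only, not driver code. `--emit-gpu-object` and `--emit-gpu-bundle` are the hypothetical flags suggested above, and the function name `bundles_device_output` and its string arguments are made up for this example. The defaults are the ones discussed earlier in this thread: bundle for HIP objects, no bundle for CUDA objects.

```python
from typing import Optional


def bundles_device_output(language: str, emit: Optional[str] = None) -> bool:
    """Return True if `-c --cuda-device-only` would produce a bundle.

    `language` is "cuda" or "hip". `emit` models the hypothetical flags:
    "gpu-object" for --emit-gpu-object, "gpu-bundle" for --emit-gpu-bundle,
    or None when neither flag is given.
    """
    if language not in ("cuda", "hip"):
        raise ValueError(f"unsupported language: {language!r}")

    # An explicit (hypothetical) --emit-gpu-* flag always wins, so the
    # result is the same for CUDA and HIP.
    if emit == "gpu-bundle":
        return True
    if emit == "gpu-object":
        return False
    if emit is not None:
        raise ValueError(f"unknown emit kind: {emit!r}")

    # No explicit flag: fall back to the per-language default.
    # HIP bundles objects by default (existing apps depend on it);
    # CUDA keeps unbundled objects by default.
    return language == "hip"
```

With this model, only the no-flag case differs between the two languages, and either explicit flag gives the same answer for both. The single '-o' output requirement for `--emit-gpu-object` is not modelled here.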

WDYT? Does it make sense?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630


