[PATCH] D132248: [CUDA][OpenMP] Fix the new driver crashing on multiple device-only outputs

Joseph Huber via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Aug 19 11:50:32 PDT 2022


jhuber6 added a comment.

In D132248#3735943 <https://reviews.llvm.org/D132248#3735943>, @tra wrote:

> In D132248#3735900 <https://reviews.llvm.org/D132248#3735900>, @jhuber6 wrote:
>
>> Is this an architectural limitation? I'd imagine they'd just behave the same way here in my implementation.
>
> The constraint here is that we have to stick with a single output per compiler invocation, and the format of that output should be consistent. E.g., for C++ we'd expect to see an ELF file when we compile with `-c` and text assembly when we compile with `-S`.
>
> We could pack GPU objects into a fat binary, but for consistency it would have to be done for single-target compilations, too. Packing a single object into a fat binary would make little sense, but producing an object file or a fat binary depending on the number of targets would be inconsistent.
> Similarly, compilation with `-S` also gets tricky -- do you bundle the text output? That would not be particularly useful, as presumably one would want to examine the assembly. We could concatenate the ASM files, but that would produce an assembly source we can't really assemble.
> On top of that, CUDA compilation has been around for a while and changing the output format would be somewhat disruptive.
>
> In the end, CUDA stuck with erroring out when `-o` is specified but multiple outputs would need to be produced.
> HIP grew a `--[no-]gpu-bundle-output` option to control whether to bundle outputs of device-only compilation.

Thanks for the background. I'm assuming HIP did this because they use the old `clang-offload-bundler`, which supported bundling multiple file types, while my new method relies on having some LLVM-IR to embed things in. I wasn't a huge fan of outputting bundles because it meant we couldn't do things like `clang -o - | opt` or similar. For my implementation I will probably make HIP do what CUDA does, as I feel that is more reasonable, unless someone has a major objection.
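The policy being discussed can be sketched as a small decision function. This is a hypothetical, simplified illustration of the behavior described above (not the actual Clang driver code, and the names and error text are assumptions): with device-only compilation, an explicit `-o` and multiple offload targets either error out (the CUDA behavior) or bundle into one file (the HIP behavior under `--gpu-bundle-output`).

```python
# Hypothetical sketch of the single-output constraint discussed above.
# Not the real Clang driver logic; arch names and messages are illustrative.

def resolve_device_outputs(offload_archs, explicit_output=None,
                           bundle_output=False):
    """Return the list of output file names, or raise if a single
    explicit -o cannot name the outputs of a multi-target compile."""
    if explicit_output is None:
        # No -o: the driver can emit one file per target, suffixed by arch.
        return [f"out-{arch}.o" for arch in offload_archs]
    if len(offload_archs) == 1:
        # One target, one output: -o is unambiguous.
        return [explicit_output]
    if bundle_output:
        # HIP-style: pack all device objects into a single bundle file.
        return [explicit_output]
    # CUDA-style: refuse, since one -o cannot name multiple outputs.
    raise ValueError("cannot specify -o when generating multiple output files")
```

Under this sketch, `resolve_device_outputs(["sm_70", "sm_80"], "a.o")` raises, while passing `bundle_output=True` yields the single bundled `a.o`.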


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132248/new/

https://reviews.llvm.org/D132248
