[PATCH] D125165: [Clang] Introduce clang-offload-packager tool to bundle device files

Tue May 10 08:33:28 PDT 2022

jhuber6 added inline comments.

================
Comment at: clang/docs/ClangOffloadBinary.rst:15
+metadata. We use a custom binary format for bundling all the device images
+together. The image format is a small header wrapping around a string map. This
+tool creates bundled binaries so that they can be embedded into the host to
----------------
yaxunl wrote:
> It would help if more details are given, e.g, offset and size of members of the header and layout of the string map.
I can probably add some more documentation on that; it would definitely help people inspecting these binaries. Later I intend to let `llvm-objdump` extract these as well.

================
Comment at: clang/test/Frontend/embed-object.c:3

-// CHECK: @[[OBJECT:.+]] = private constant [120 x i8] c"\10\FF\10\AD{{.*}}", section ".llvm.offloading", align 8
+// CHECK: @[[OBJECT:.+]] = private constant [0 x i8] zeroinitializer, section ".llvm.offloading", align 8
 // CHECK: @llvm.compiler.used = appending global [1 x ptr] [ptr @[[OBJECT]]], section "llvm.metadata"
----------------
yaxunl wrote:
> Is this due to the embedded object being empty?
> 
Yes. We used to create the binary format in Clang itself, so the embedded object contained the binary header along with the empty file. Now this flag simply embeds a file in a section; the file is empty, so we get a `zeroinitializer`. What matters in this test is just that the option puts the file's contents in the IR.
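
To illustrate why the CHECK line changed, here is a rough Python sketch that renders a byte string as an IR global in the shape the test expects. It is not Clang's code: `embed_global` and the global name are hypothetical, and the escaping only approximates how LLVM prints `c"..."` strings.

```python
def embed_global(name, data, section=".llvm.offloading"):
    """Render `data` as an LLVM IR byte-array global (illustrative only)."""
    if not any(data):
        # An empty (or all-zero) array is printed as zeroinitializer.
        init = "zeroinitializer"
    else:
        # Printable ASCII is kept as-is; everything else, plus the quote
        # and backslash characters, becomes a two-digit hex escape.
        body = "".join(
            chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\{b:02X}"
            for b in data
        )
        init = f'c"{body}"'
    return (
        f"@{name} = private constant [{len(data)} x i8] {init}, "
        f'section "{section}", align 8'
    )


# An empty input file, as in the updated test.
print(embed_global("obj", b""))
# A file that starts with the magic bytes seen in the old CHECK line.
print(embed_global("obj", b"\x10\xff\x10\xad"))
```

With an empty file there are no bytes to print, so the array has length 0 and the initializer collapses to `zeroinitializer`.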

> So now the bitcode for different targets are bundled by clang-offload-packager then embedded as one file in the relocatable object file?
> 
Yes, this is basically what `fatbinary` does for CUDA. We take all the files and put them into a single binary. The binary then contains metadata that lets us find these files later at link time.
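
As a rough illustration of the idea (a header wrapping a string map, followed by the image), here is a toy Python sketch. The field layout, the `pack_images`/`unpack_images` helpers, and the metadata keys are assumptions made up for this example; only the magic bytes are taken from the old CHECK line, and this is not the actual on-disk format.

```python
import struct

MAGIC = b"\x10\xff\x10\xad"  # magic bytes visible in the old CHECK line
# Hypothetical header: magic, number of metadata pairs, image size in bytes.
HEADER = struct.Struct("<4sIQ")


def _pack_str(s):
    """Length-prefixed UTF-8 string."""
    data = s.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def pack_images(entries):
    """Concatenate (metadata, image) pairs into one blob."""
    blob = bytearray()
    for metadata, image in entries:
        blob += HEADER.pack(MAGIC, len(metadata), len(image))
        for key, value in metadata.items():
            blob += _pack_str(key) + _pack_str(value)
        blob += image
    return bytes(blob)


def unpack_images(blob):
    """Walk the blob and recover every (metadata, image) pair."""
    entries, offset = [], 0
    while offset < len(blob):
        magic, n_pairs, image_size = HEADER.unpack_from(blob, offset)
        if magic != MAGIC:
            raise ValueError(f"bad magic at offset {offset}")
        offset += HEADER.size
        strings = []
        for _ in range(2 * n_pairs):  # each pair is a key then a value
            (length,) = struct.unpack_from("<I", blob, offset)
            offset += 4
            strings.append(blob[offset:offset + length].decode("utf-8"))
            offset += length
        metadata = dict(zip(strings[0::2], strings[1::2]))
        entries.append((metadata, blob[offset:offset + image_size]))
        offset += image_size
    return entries


# Bundle two placeholder device images, then find them again by metadata.
blob = pack_images([
    ({"triple": "nvptx64-nvidia-cuda", "arch": "sm_70"}, b"device image A"),
    ({"triple": "amdgcn-amd-amdhsa", "arch": "gfx908"}, b"device image B"),
])
for metadata, image in unpack_images(blob):
    print(metadata["triple"], metadata["arch"], len(image))
```

Because every entry carries its own header and sizes, a consumer can walk the blob from the start of the section and pick out the images it needs without any outside index.
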
> In the old scheme the bitcode for different targets are bundled by clang-offload-bundler then embedded in the relocatable object file, right?
> 
> What's the advantage of clang-offload-packager compared with clang-offload-bundler?
> 
The old clang-offload-bundler did something similar, namely embedding multiple files into the host. It likewise used an ELF section when the target was an object file. Conceptually, this tool only creates the actual binary that's being embedded and puts everything in one big blob, which then gets embedded directly in the IR. The benefit of this approach, in my mind, is that the host and device phases are more distinct: we don't need to call `clang-offload-bundler` on the host files as well. I could have worked around the current clang-offload-bundler to make it do something similar, but I didn't see the utility when I'm doing different things with a different binary format.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D125165