[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 21 14:42:45 PDT 2022


tra added a comment.

In D122069#3397560 <https://reviews.llvm.org/D122069#3397560>, @jhuber6 wrote:

> I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.
>
> 1. This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work.

I didn't realize that content of the file is part of your packagin scheme. I've interpreted `embed certain metadata along with a binary image ` as literally keeping the binaries as binaries and just adding a small blob with additional metadata. It was that data I meant to encode as JSON. Encoding the offload binaries themselves as JSON would indeed be wasteful.

We could keep the header as a binary (never changes on-disk format) and use JSON representation for the array of the entries (0-terminated string or string + length stored in the header) which also has fixed (as in version-agnostic) on-disk format, though of variable length.
Versioning still has to be dealt with, but now it would be independent of the on-disk format. The variable length of JSON is both a plus and a minus. On the positive side is that content is open. Some tool may add whatever is relevant for its own use, comments, provenance info, checksum, etc.
The downside is that it has variable length, so it would have to be written after the image binary. We would also need to deal with potential errors parsing JSON.

I don't have a strong preference either way. I think JSON may have few minor benefits, but the proposed binary format has the advantage of simplicity. We can always switch to json-encoded entries later by bumping the header version.



================
Comment at: clang/include/clang/Basic/Offloading.h:87-104
+  struct Header {
+    uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD};
+    uint32_t Version;
+    uint64_t Size;
+    uint64_t EntryOffset;
+  };
+
----------------
jhuber6 wrote:
> jhuber6 wrote:
> > tra wrote:
> > > tra wrote:
> > > > Given that it's an on-disk format I think these should be explicitly packed and carry a comment that it's an on-disk format which needs extra caution during future changes.
> > > > 
> > > > 
> > > On-disk format could use more comments.
> > What do you mean by explicitly packed? And I added the "Version" field in the header so we can warn on old versions.
> Noted, will add more if we settle on this format.
`__attribute__((packed))`. Otherwise you depend on assumed natural alignment and that is target-dependent, IIRC.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122069/new/

https://reviews.llvm.org/D122069



More information about the cfe-commits mailing list