[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

Joseph Huber via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 21 15:39:52 PDT 2022


jhuber6 added a comment.

In D122069#3397867 <https://reviews.llvm.org/D122069#3397867>, @tra wrote:

> In D122069#3397560 <https://reviews.llvm.org/D122069#3397560>, @jhuber6 wrote:
>
>> I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.
>>
>> 1. This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work.
>
> I didn't realize that content of the file is part of your packagin scheme. I've interpreted `embed certain metadata along with a binary image ` as literally keeping the binaries as binaries and just adding a small blob with additional metadata. It was that data I meant to encode as JSON. Encoding the offload binaries themselves as JSON would indeed be wasteful.
>
> We could keep the header as a binary (never changes on-disk format) and use JSON representation for the array of the entries (0-terminated string or string + length stored in the header) which also has fixed (as in version-agnostic) on-disk format, though of variable length.
> Versioning still has to be dealt with, but now it would be independent of the on-disk format. The variable length of JSON is both a plus and a minus. On the positive side is that content is open. Some tool may add whatever is relevant for its own use, comments, provenance info, checksum, etc.
> The downside is that it has variable length, so it would have to be written after the image binary. We would also need to deal with potential errors parsing JSON.
>
> I don't have a strong preference either way. I think JSON may have few minor benefits, but the proposed binary format has the advantage of simplicity. We can always switch to json-encoded entries later by bumping the header version.

I like the idea of keeping the header, we could add an additional field to the header for the size of the entry and I feel like we'd be pretty future-proof if we want to change stuff. I think using this binary format for now is sufficient as long as we keep upgrading it to something more complex an open possibility.

I'm also not sure if I should extend this as a binary format inheriting from LLVM's Binary class. It would be a minimal amount of work but I'm not sure if this use-case warrants extending this to broader LLVM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122069/new/

https://reviews.llvm.org/D122069



More information about the cfe-commits mailing list