[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

Joseph Huber via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 21 13:21:05 PDT 2022


jhuber6 added a comment.

In D122069#3397373 <https://reviews.llvm.org/D122069#3397373>, @tra wrote:

>> I decided to use a custom format rather than using an existing format (ELF, JSON, etc) because of the specialty use-case of this.
>
> Will we ever need to process the files with tools built with a different LLVM version? E.g clang and lld may be built separately from different LLVM trees.
> If so, then maintaining compatibility of the binary format will become more painful over time, compared to using json.

As it stands now, this is only used by the linker-wrapper with is all clang. LLD is only called as far of the host so it's impossible to mix versions. That being said, eventually I want this functionality to be performed by the linker itself without needing the extra wrapper so it'll be possible in the future. I added the "Version" field specifically to emit a warning or error in the future if we change this. Even if we used JSON or some other form it would be a similar story if these get de-synced because we'd have different entries and keys. We could add to the struct without breaking the ABI since everything uses absolute offsets.

> Considering that we already have json parsing and serialization in LLVM, I don't see much benefit growing yet another on-disk format. AFAICT, JSON should do the job encoding relevant data just fine and would need less maintenance long term.

I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.

1. This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work. It's not a deal-breaker, but it's somewhat of a turn-off.
2. There could be multiple of these contained in a single section, I primarily wanted something with a known size and easy to identify magic bytes to find where these live in the section. I wasn't sure how nicely this would interact with the JSON parser. It could also be done, but I wasn't sure how much utility there was.



================
Comment at: clang/include/clang/Basic/Offloading.h:87
+private:
+  struct Header {
+    uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD};
----------------
jhuber6 wrote:
> tra wrote:
> > tra wrote:
> > > Given that it's an on-disk format I think these should be explicitly packed and carry a comment that it's an on-disk format which needs extra caution during future changes.
> > > 
> > > 
> > On-disk format could use more comments.
> What do you mean by explicitly packed? And I added the "Version" field in the header so we can warn on old versions.
Noted, will add more if we settle on this format.


================
Comment at: clang/include/clang/Basic/Offloading.h:87-104
+  struct Header {
+    uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD};
+    uint32_t Version;
+    uint64_t Size;
+    uint64_t EntryOffset;
+  };
+
----------------
tra wrote:
> tra wrote:
> > Given that it's an on-disk format I think these should be explicitly packed and carry a comment that it's an on-disk format which needs extra caution during future changes.
> > 
> > 
> On-disk format could use more comments.
What do you mean by explicitly packed? And I added the "Version" field in the header so we can warn on old versions.


================
Comment at: clang/include/clang/Basic/Offloading.h:90
+    uint32_t Version;
+    uint64_t Size;
+    uint64_t EntryOffset;
----------------
tra wrote:
> What does `Size` cover and what units it uses -- bytes, number of Entries, something else?
Size of the entire file, so you can do `Data[Header->Size]` and potentially get the next one in memory.


================
Comment at: clang/include/clang/Basic/Offloading.h:95-96
+  struct Entry {
+    uint16_t ImageKind;
+    uint16_t OffloadKind;
+    uint32_t Flags;
----------------
tra wrote:
> Should we use the matching enum types here? 
> We're using 16-bit enums, so it would save us on casts when we use those fields and would make it harder to set a wrong value by mistake.
Yeah, the casts are annoying but I elected to be explicit with the size constraints in the header itself. Wasn't really sure which was better.


================
Comment at: clang/include/clang/Basic/Offloading.h:97
+    uint16_t OffloadKind;
+    uint32_t Flags;
+    uint64_t TripleOffset;
----------------
tra wrote:
> Do these flags have defined meaning?
They're unused for now, I was planning on making it a bitfield so we could pass things like whether or not this file contains debug information or is optimized, kind of like fatbinary.


================
Comment at: clang/include/clang/Basic/Offloading.h:98
+    uint32_t Flags;
+    uint64_t TripleOffset;
+    uint64_t ArchOffset;
----------------
tra wrote:
> What are the offsets relative to?
Absolute from the start of the file, can comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122069/new/

https://reviews.llvm.org/D122069



More information about the llvm-commits mailing list