[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata
Joseph Huber via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Mar 21 13:21:05 PDT 2022
jhuber6 added a comment.
In D122069#3397373 <https://reviews.llvm.org/D122069#3397373>, @tra wrote:
>> I decided to use a custom format rather than using an existing format (ELF, JSON, etc) because of the specialty use-case of this.
>
> Will we ever need to process the files with tools built with a different LLVM version? E.g clang and lld may be built separately from different LLVM trees.
> If so, then maintaining compatibility of the binary format will become more painful over time, compared to using json.
As it stands now, this is only used by the linker-wrapper with is all clang. LLD is only called as far of the host so it's impossible to mix versions. That being said, eventually I want this functionality to be performed by the linker itself without needing the extra wrapper so it'll be possible in the future. I added the "Version" field specifically to emit a warning or error in the future if we change this. Even if we used JSON or some other form it would be a similar story if these get de-synced because we'd have different entries and keys. We could add to the struct without breaking the ABI since everything uses absolute offsets.
> Considering that we already have json parsing and serialization in LLVM, I don't see much benefit growing yet another on-disk format. AFAICT, JSON should do the job encoding relevant data just fine and would need less maintenance long term.
I was definitely thinking about a lot of alternatives to making yet another on-disk binary format. I had a few discussions with @JonChesterfield on the subject. There were two things that put me off of using JSON primarily, correct me if I have any misconceptions.
1. This is fundamentally a very thin metadata wrapper around a binary image. AFAIK if you want to stream binary data to a JSON you need to use a base64 or similar encoding, which makes the files bigger and adds some extra work. It's not a deal-breaker, but it's somewhat of a turn-off.
2. There could be multiple of these contained in a single section, I primarily wanted something with a known size and easy to identify magic bytes to find where these live in the section. I wasn't sure how nicely this would interact with the JSON parser. It could also be done, but I wasn't sure how much utility there was.
================
Comment at: clang/include/clang/Basic/Offloading.h:87
+private:
+ struct Header {
+ uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD};
----------------
jhuber6 wrote:
> tra wrote:
> > tra wrote:
> > > Given that it's an on-disk format I think these should be explicitly packed and carry a comment that it's an on-disk format which needs extra caution during future changes.
> > >
> > >
> > On-disk format could use more comments.
> What do you mean by explicitly packed? And I added the "Version" field in the header so we can warn on old versions.
Noted, will add more if we settle on this format.
================
Comment at: clang/include/clang/Basic/Offloading.h:87-104
+ struct Header {
+ uint8_t Magic[4] = {0x10, 0xFF, 0x10, 0xAD};
+ uint32_t Version;
+ uint64_t Size;
+ uint64_t EntryOffset;
+ };
+
----------------
tra wrote:
> tra wrote:
> > Given that it's an on-disk format I think these should be explicitly packed and carry a comment that it's an on-disk format which needs extra caution during future changes.
> >
> >
> On-disk format could use more comments.
What do you mean by explicitly packed? And I added the "Version" field in the header so we can warn on old versions.
================
Comment at: clang/include/clang/Basic/Offloading.h:90
+ uint32_t Version;
+ uint64_t Size;
+ uint64_t EntryOffset;
----------------
tra wrote:
> What does `Size` cover and what units it uses -- bytes, number of Entries, something else?
Size of the entire file, so you can do `Data[Header->Size]` and potentially get the next one in memory.
================
Comment at: clang/include/clang/Basic/Offloading.h:95-96
+ struct Entry {
+ uint16_t ImageKind;
+ uint16_t OffloadKind;
+ uint32_t Flags;
----------------
tra wrote:
> Should we use the matching enum types here?
> We're using 16-bit enums, so it would save us on casts when we use those fields and would make it harder to set a wrong value by mistake.
Yeah, the casts are annoying but I elected to be explicit with the size constraints in the header itself. Wasn't really sure which was better.
================
Comment at: clang/include/clang/Basic/Offloading.h:97
+ uint16_t OffloadKind;
+ uint32_t Flags;
+ uint64_t TripleOffset;
----------------
tra wrote:
> Do these flags have defined meaning?
They're unused for now, I was planning on making it a bitfield so we could pass things like whether or not this file contains debug information or is optimized, kind of like fatbinary.
================
Comment at: clang/include/clang/Basic/Offloading.h:98
+ uint32_t Flags;
+ uint64_t TripleOffset;
+ uint64_t ArchOffset;
----------------
tra wrote:
> What are the offsets relative to?
Absolute from the start of the file, can comment.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D122069/new/
https://reviews.llvm.org/D122069
More information about the cfe-commits
mailing list