[LLVMdev] LLVM Archive Format Extension Proposal

Wed Nov 21 12:09:05 PST 2012

On Wed, Nov 21, 2012 at 8:55 AM, Relph, Richard <Richard.Relph at amd.com> wrote:
> AMD would like to add new functionality to ranlib (and later ar and nm) and
> to the bits of LLVM Core that read (and later write) archives.
> Herewith a terse summary of the change, which we want to improve support of
> OpenCL for multiple GPUs in a single run-time.
>
> Conceptually, a serialized archive is really 2 pieces: a few header members
> and a set of normal file members. There are no constraints on the normal
> members in the 'pure' archive format. They could be text files, pictures,
> or, as we're all familiar with, object modules. Most object file archives
> are "libraries" and they have a special header member that is a global symbol
> table, associating global scope names with defining object module members in
> the archive body.
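For reference, here is the common ar layout (the generic format, not LLVM's own code). An archive is the 8-byte magic `!<arch>\n` followed by members. Each member has a fixed 60-byte ASCII header, and its data is padded to an even offset. The GNU variant names its symbol-table header member `/` and its long-name table `//`, while BSD uses `__.SYMDEF`. The minimal Python sketch below splits an in-memory archive into header members and normal members. The function names are illustrative only, and long BSD names (`#1/...`) and 64-bit symbol tables are deliberately ignored.

```python
from dataclasses import dataclass

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60
# Names of "header" members in the GNU and BSD variants (not exhaustive).
SPECIAL_NAMES = {"/", "//", "__.SYMDEF", "__.SYMDEF SORTED"}


@dataclass
class Member:
    name: str
    data: bytes


def member_header(name: str, size: int) -> bytes:
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) terminator(2)
    if len(name) > 16:
        raise ValueError("long names need a string table; not handled here")
    fields = (name.ljust(16) + "0".ljust(12) + "0".ljust(6) + "0".ljust(6)
              + "644".ljust(8) + str(size).ljust(10) + "`\n")
    return fields.encode("ascii")


def build_archive(members: list[tuple[str, bytes]]) -> bytes:
    out = bytearray(AR_MAGIC)
    for name, data in members:
        out += member_header(name, len(data)) + data
        if len(data) % 2:
            out += b"\n"  # pad member data to an even offset
    return bytes(out)


def parse_archive(buf: bytes) -> list[Member]:
    if not buf.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members, pos = [], len(AR_MAGIC)
    while pos < len(buf):
        hdr = buf[pos:pos + HEADER_SIZE]
        if len(hdr) < HEADER_SIZE or hdr[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        name = hdr[0:16].decode("ascii").rstrip()
        size = int(hdr[48:58].decode("ascii"))
        start = pos + HEADER_SIZE
        members.append(Member(name, buf[start:start + size]))
        pos = start + size + (size % 2)  # skip the padding byte, if any
    return members


def split_members(members: list[Member]) -> tuple[list[Member], list[Member]]:
    # Header members (symbol/name tables) versus ordinary file members.
    header = [m for m in members if m.name in SPECIAL_NAMES]
    normal = [m for m in members if m.name not in SPECIAL_NAMES]
    return header, normal
```

`parse_archive` takes a `bytes` object, so the same code works for an archive embedded in a DLL/SO image and for one read from disk.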
>
> We have N very large archives, defining essentially the same set of symbols.
> Many of the normal file members of each are duplicated in other archives,
> but not all. The goal is to produce a single "super-archive" that contains
> 1 copy of each unique object file member no matter how many archives it is
> part of, and N symbol table members representing each of the original N
> archives.
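The deduplication goal above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the proposed ranlib change. Two members count as identical only when their bytes are identical (compared via SHA-256), and the first occurrence's name is kept. `merge_archives` is a hypothetical helper. It returns the shared member list along with, for each original archive, the positions of the members that archive contained.

```python
import hashlib


def merge_archives(
    archives: dict[str, list[tuple[str, bytes]]],
) -> tuple[list[tuple[str, bytes]], dict[str, list[int]]]:
    """archives: original archive name -> its (member name, member bytes) list."""
    unique: list[tuple[str, bytes]] = []  # one copy of each distinct member
    index_of: dict[bytes, int] = {}       # content digest -> position in `unique`
    tables: dict[str, list[int]] = {}     # archive name -> positions it uses
    for arch_name, members in archives.items():
        positions = []
        for name, data in members:
            key = hashlib.sha256(data).digest()
            if key not in index_of:
                index_of[key] = len(unique)
                unique.append((name, data))
            positions.append(index_of[key])
        tables[arch_name] = positions
    return unique, tables
```

Because each original archive is reduced to a list of positions, N per-archive symbol tables can point into a single shared member list.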
>
> The symbol table for each original archive can properly index to the
> relevant members in the archive, even if other members in the super-archive
> (not referenced in this particular symbol table, of course) define the same
> symbols.
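The sketch below makes the indexing point concrete. It is hypothetical: the data layout and names are assumptions and are not part of the proposal. Each original archive gets its own symbol table, which maps symbol names to positions in the shared member list. Two members may define the same symbol, and which one a lookup finds depends only on the table the caller selects.

```python
from typing import Optional


def build_symbol_tables(
    tables: dict[str, list[int]],
    defined: dict[int, list[str]],
) -> dict[str, dict[str, int]]:
    """tables: archive name -> shared member positions it contains.
    defined: shared member position -> global symbols that member defines."""
    symtabs: dict[str, dict[str, int]] = {}
    for archive, positions in tables.items():
        symtab: dict[str, int] = {}
        for pos in positions:
            for sym in defined.get(pos, []):
                symtab.setdefault(sym, pos)  # first definition in this archive wins
        symtabs[archive] = symtab
    return symtabs


def resolve(symtabs: dict[str, dict[str, int]], archive: str,
            symbol: str) -> Optional[int]:
    """Find the member defining `symbol`, consulting only `archive`'s table."""
    return symtabs[archive].get(symbol)
```

Take `tables = {"gpuA.a": [0, 1], "gpuB.a": [0, 2]}` with members 1 and 2 both defining `bar`. In that case `resolve(symtabs, "gpuA.a", "bar")` returns 1, while the same lookup against `"gpuB.a"` returns 2.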
>
> I've considered 3 approaches to the problem so far. All involve a new
> archive member type.
>
> First, a new archive member type "up front" that describes each of the
> original archives and its symbol table.
> Second, a normal/default symbol table member "up front" and a new archive
> member type that describes alternate symbol tables contained in the archive.
> Third, a "hiding" archive member type that is essentially a way to "skip
> over" additional normal archive file headers to reach the first normal
> member, which (in all approaches) all archives share.
>
> The third, I think, requires the least changes to the existing
> implementation, so I'm leaning towards it. The "hiding" archive member would
> have the "file name" of the represented archive immediately following the
> member header, followed by a completely normal archive representation
> starting with "!<arch>\n" and optionally including an additional "hiding"
> archive member covering even more hidden archives.
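Below is a sketch of how the "hiding" member described above might be encoded and decoded. The proposal leaves several details open, so the following are assumptions:

- The member name `/HIDDEN/` is made up.
- The archive file name is taken to end at the first newline.
- Only the outer member header is shown.

The sketch illustrates two reader behaviors. A reader that does not know the new type skips the member by its size field. A reader that does know it splits off the name and treats the remainder as another archive, which may itself contain a further hiding member.

```python
AR_MAGIC = b"!<arch>\n"
HIDING_NAME = "/HIDDEN/"  # hypothetical; the proposal does not name the member


def member_header(name: str, size: int) -> bytes:
    # Standard 60-byte ar member header: name, mtime, uid, gid, mode, size, "`\n"
    return (name.ljust(16) + "0".ljust(12) + "0".ljust(6) + "0".ljust(6)
            + "644".ljust(8) + str(size).ljust(10) + "`\n").encode("ascii")


def hiding_member(archive_name: str, nested_archive: bytes) -> bytes:
    """Wrap a represented archive's name plus its (header-only) archive image."""
    if "\n" in archive_name:
        raise ValueError("archive name may not contain a newline")
    if not nested_archive.startswith(AR_MAGIC):
        raise ValueError("nested payload must itself be an ar archive")
    payload = archive_name.encode("utf-8") + b"\n" + nested_archive
    pad = b"\n" if len(payload) % 2 else b""  # keep the next header even-aligned
    return member_header(HIDING_NAME, len(payload)) + payload + pad


def decode_hiding_payload(payload: bytes) -> tuple[str, bytes]:
    """Split a hiding member's data into (archive name, nested archive bytes)."""
    name, sep, nested = payload.partition(b"\n")
    if not sep or not nested.startswith(AR_MAGIC):
        raise ValueError("not a hiding member payload")
    return name.decode("utf-8"), nested
```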
>
> The plan is to extend the Archive class to provide for a way for clients to
> select a desired archive. I also will enhance ranlib to accept multiple
> archive names on the command line and produce the "super-archive" from
> ranlib.
>
> A further need we have is to serialize the TOCs and the super-archive in a
> memory image (our archives are embedded in our DLL/SO, not stored separately
> on disk) and then provide an interface to the relevant LLVM classes (Linker,
> primarily) for accessing archives in memory rather than on disk, a feature
> absent from the current implementation.
>
> For our purposes, extending the Archive class to support specification of
> the archive using a memory object instead of a file, recognizing the
> "hiding" member type, and extending ranlib to produce the new super archives
> is all we really need.
>
> Any thoughts or suggestions would be welcome.
>
> Thanks,
> Richard

Note that I plan to remove llvm/Bitcode/Archive once Object/Archive is
capable of replacing it. The llvm tools that don't write archive
files have already been switched over to it. Object/Archive already
supports MemoryBuffer as a source for the data.

I support Nick's solution over extending the archive format
internally. I would also support adding a true wrapper. What I don't
like is not knowing that it's a special archive by just looking at the
file header.
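The concern about recognizing a special archive from the file header can be illustrated in a few lines. In this sketch, `!<arch>\n` (the regular format) and `!<thin>\n` (GNU thin archives) are real magic strings. `!<multi>\n` is a purely hypothetical magic for the kind of wrapper format mentioned above; nothing like it was agreed in this thread.

```python
# Real magic strings for existing ar flavors.
AR_MAGIC = b"!<arch>\n"
THIN_MAGIC = b"!<thin>\n"  # GNU thin archive
# Hypothetical magic for a wrapper around several archives; not a real format.
WRAPPER_MAGIC = b"!<multi>\n"


def classify(header: bytes) -> str:
    """Identify the container type from its leading bytes alone."""
    for magic, kind in ((WRAPPER_MAGIC, "wrapper"),
                        (THIN_MAGIC, "thin"),
                        (AR_MAGIC, "archive")):
        if header.startswith(magic):
            return kind
    return "unknown"
```

A super-archive built with the proposed hiding member would begin with the ordinary `!<arch>\n`, so `classify` would report it as a plain archive. Its special nature would only become visible once the members are scanned.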

- Michael Spencer