[LLVMdev] LLVM Archive Format Extension Proposal

Wed Nov 21 08:55:30 PST 2012

AMD would like to add new functionality to ranlib (and later ar and nm) and to the bits of LLVM Core that read (and later write) archives.
Herewith a terse summary of the change, which we want in order to improve support for OpenCL on multiple GPUs in a single run-time.

Conceptually, a serialized archive is really two pieces: a few header members and a set of normal file members. There are no constraints on the normal members in the 'pure' archive format. They could be text files, pictures, or, as we're all familiar with, object modules. Most object file archives are "libraries", and they have a special header member that is a global symbol table, associating global scope names with the defining object module members in the archive body.
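To make the layout concrete, here is a minimal sketch of walking the members of an in-memory archive. It assumes the common ar layout (an 8-byte "!<arch>\n" magic followed by members that each carry a 60-byte header) and the GNU/System V convention of naming the symbol table member "/". The helper names make_member and iter_members are mine, not LLVM's.

```python
AR_MAGIC = b"!<arch>\n"   # 8-byte global header of the common ar format
HEADER_SIZE = 60          # fixed-size header in front of every member


def make_member(name: bytes, data: bytes) -> bytes:
    """Serialize one member: 60-byte header, data, padding to an even length."""
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end marker(2)
    header = b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (name, 0, 0, 0, b"644", len(data))
    assert len(header) == HEADER_SIZE
    return header + data + (b"\n" if len(data) % 2 else b"")


def iter_members(blob: bytes):
    """Yield (name, data) for each member of an in-memory archive image."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(blob):
        header = blob[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[0:16].rstrip()
        size = int(header[48:58].decode("ascii"))
        start = pos + HEADER_SIZE
        yield name, blob[start:start + size]
        pos = start + size + (size % 2)   # members start on even offsets


# One header member (the symbol table) followed by two normal members.
archive = (AR_MAGIC
           + make_member(b"/", b"<symbol table>")
           + make_member(b"a.o/", b"abc")
           + make_member(b"b.o/", b"wxyz"))
members = list(iter_members(archive))
```

The symbol table contents here are a placeholder; a real table holds member offsets and symbol names.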

We have N very large archives, defining essentially the same set of symbols. Many of the normal file members of each are duplicated in other archives, but not all. The goal is to produce a single "super-archive" that contains one copy of each unique object file member, no matter how many archives it is part of, and N symbol table members, one for each of the original N archives.

The symbol table for each original archive can properly index to the relevant members in the archive, even if other members in the super-archive (not referenced in this particular symbol table, of course) define the same symbols.
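As an illustration of the intended result (not of any on-disk encoding), the following sketch merges N archives into one pool of unique members plus one symbol table per original archive. The input shape, the function name build_super_archive, and the use of a content hash to detect duplicates are all assumptions made for the example.

```python
import hashlib


def build_super_archive(archives):
    """Merge archives into (pool, tables).

    archives: {archive_name: {"members": {member_name: bytes},
                              "symbols": {symbol: member_name}}}
    pool:     list of unique member contents
    tables:   {archive_name: {symbol: index into pool}}
    """
    pool = []
    index_of = {}   # content digest -> index in pool
    tables = {}
    for arch_name, arch in archives.items():
        local = {}  # member name -> pool index, for this archive only
        for member_name, data in arch["members"].items():
            digest = hashlib.sha256(data).digest()
            if digest not in index_of:
                index_of[digest] = len(pool)
                pool.append(data)
            local[member_name] = index_of[digest]
        tables[arch_name] = {sym: local[m] for sym, m in arch["symbols"].items()}
    return pool, tables


# Two hypothetical per-GPU libraries: math.o is identical, init.o differs.
archives = {
    "gpu_a": {"members": {"math.o": b"shared code", "init.o": b"init for A"},
              "symbols": {"sin": "math.o", "init": "init.o"}},
    "gpu_b": {"members": {"math.o": b"shared code", "init.o": b"init for B"},
              "symbols": {"sin": "math.o", "init": "init.o"}},
}
pool, tables = build_super_archive(archives)
```

Both tables resolve "sin" to the same pooled member, while "init" resolves to a different member in each table, even though both members define the same symbol.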

I've considered 3 approaches to the problem so far. All involve a new archive member type.

First, a new archive member type "up front" that describes each of the original archives and its symbol table.
Second, a normal/default symbol table member "up front" and a new archive member type that describes alternate symbol tables contained in the archive.
Third, a "hiding" archive member type that is essentially a way to "skip over" additional normal archive file headers to reach the first normal member, which (in all approaches) all archives share.

The third, I think, requires the fewest changes to the existing implementation, so I'm leaning towards it. The "hiding" archive member would have the "file name" of the represented archive immediately following the member header, followed by a completely normal archive representation starting with "!<arch>\n" and optionally including an additional "hiding" archive member covering even more hidden archives.
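A rough sketch of how a reader might discover the nested archives under the third approach. Several details are assumptions, since nothing above fixes them: the member name "/HIDDEN/" is hypothetical, the archive name is assumed to be newline-terminated, and the hiding member's size field is assumed to cover the name plus the whole nested archive image.

```python
AR_MAGIC = b"!<arch>\n"
HIDING_NAME = b"/HIDDEN/"   # hypothetical name for the new member type


def member(name: bytes, data: bytes) -> bytes:
    """Serialize one ar member (60-byte header, data, even padding)."""
    header = b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (name, 0, 0, 0, b"644", len(data))
    return header + data + (b"\n" if len(data) % 2 else b"")


def hide(archive_name: bytes, inner_archive: bytes) -> bytes:
    """Wrap a complete archive image in a hiding member.

    Assumed payload: archive name, newline, then the inner archive.
    """
    return member(HIDING_NAME, archive_name + b"\n" + inner_archive)


def hidden_archives(blob: bytes):
    """Yield (archive_name, inner_archive) for each hiding member, outermost first."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + 60 <= len(blob):
        header = blob[pos:pos + 60]
        size = int(header[48:58].decode("ascii"))
        payload = blob[pos + 60:pos + 60 + size]
        if header[:16].rstrip() == HIDING_NAME:
            name, _, inner = payload.partition(b"\n")
            yield name, inner
            yield from hidden_archives(inner)   # further hidden archives
        pos += 60 + size + (size % 2)


# gpu_b.a is hidden inside gpu_a.a, which is hidden inside the super-archive.
inner_b = AR_MAGIC + member(b"/", b"symtab-B")
inner_a = AR_MAGIC + member(b"/", b"symtab-A") + hide(b"gpu_b.a", inner_b)
super_archive = (AR_MAGIC
                 + member(b"/", b"symtab-0")
                 + hide(b"gpu_a.a", inner_a)
                 + member(b"shared.o/", b"code"))
names = [name for name, _ in hidden_archives(super_archive)]
```

A reader that only wants the default archive can treat the hiding member as one opaque member and step over it using the size field, which is what keeps the change to existing code small.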

The plan is to extend the Archive class to provide a way for clients to select the desired archive. I will also enhance ranlib to accept multiple archive names on the command line and produce the "super-archive" from them.

A further need we have is to serialize the TOCs and the super-archive into a memory image (our archives are embedded in our DLL/SO, not stored separately on disk), and then to provide an interface that lets the relevant LLVM classes (Linker, primarily) access archives in memory rather than on disk. The current implementation lacks this feature.

For our purposes, extending the Archive class to accept a memory object instead of a file, recognizing the "hiding" member type, and extending ranlib to produce the new super-archives are all we really need.

Any thoughts or suggestions would be welcome.

Thanks,
Richard
