[LLVMdev] Using thin archives when building llvm

Wed Jul 22 17:32:07 PDT 2015

> Cool!
>
> So the thin archive is a divergence from the standard ar file (although it's
> compatible with GNU). Is there any room to push it further? Last time I ran
> the linker with profiling enabled, it spends a good amount of time just to
> find the terminating nul character in the archive file symbol table. If we
> store string length for each symbol, the linker can read archive files
> faster.

We can probably do it, yes.

Take a look at the BSD format (used on OS X; I just implemented it in llvm).

It is a bit better in that the symbol table is organized as a series
of offset pairs: one to the member, one into the string table.
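
A rough sketch of reading that layout, to make the offset pairs concrete.
It assumes the usual BSD arrangement (a 32-bit byte size, then the pairs
with the string table offset first and the member offset second, then a
32-bit string table size and the strings) and assumes little-endian
fields. The name parse_bsd_symtab and the member offsets 96 and 512 are
made up for illustration; this is not the llvm code.

```python
import struct

def parse_bsd_symtab(data: bytes):
    """Split a BSD-style symbol table into (pairs, string_table).

    Each pair is (string_table_offset, member_offset).
    Assumption: all fields are 32-bit little-endian.
    """
    # Leading word: size in bytes of the offset-pair array.
    (pairs_size,) = struct.unpack_from("<I", data, 0)
    count = pairs_size // 8
    pairs = [struct.unpack_from("<II", data, 4 + 8 * i) for i in range(count)]
    # After the pairs: the string table size, then the string table itself.
    (str_size,) = struct.unpack_from("<I", data, 4 + pairs_size)
    start = 4 + pairs_size + 4
    return pairs, data[start:start + str_size]

# Hand-built example with two symbols; offsets 96 and 512 are hypothetical.
strtab = b"foo\0barbaz\0"
example = (struct.pack("<I", 16)
           + struct.pack("<II", 0, 96)
           + struct.pack("<II", 4, 512)
           + struct.pack("<I", len(strtab))
           + strtab)
pairs, table = parse_bsd_symtab(example)
```

Note that the pairs can be walked without ever touching the string bytes.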

This already improves handling under the traditional unix linker model,
where we scan each member to see if it should be included in the link.
Once we find out it is to be included, it is really fast to scan past
the member without looking for nulls as one has to do in the GNU
format.
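
For contrast, a sketch of the GNU symbol table walk: a big-endian 32-bit
count, that many big-endian member offsets, then the names back to back,
each terminated by a nul. The only way to get from one name to the next
is to search for the terminator. The function name parse_gnu_symtab and
the sample offsets are hypothetical.

```python
import struct

def parse_gnu_symtab(data: bytes):
    """Return [(name, member_offset), ...] from a GNU-style symbol table."""
    (count,) = struct.unpack_from(">I", data, 0)
    offsets = struct.unpack_from(f">{count}I", data, 4)
    names = []
    pos = 4 + 4 * count
    for _ in range(count):
        # No stored length: we must search for the nul to find the next name.
        end = data.index(b"\0", pos)
        names.append(data[pos:end])
        pos = end + 1
    return list(zip(names, offsets))

# Hand-built example; member offsets 96 and 512 are made up.
gnu_example = (struct.pack(">I", 2)
               + struct.pack(">II", 96, 512)
               + b"foo\0barbaz\0")
symbols = parse_gnu_symtab(gnu_example)
```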

That doesn't help with COFF, where we do a single pass anyway, but there
is more we could get out of the BSD format. I think that in
practice the string table is in order, so we can compute each
string's size by looking at the next entry's offset. I will give that a try.
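
A sketch of that idea, under two assumptions that would need checking
against real archives: string offsets increase from one entry to the
next, and names are packed with exactly one nul between them. The last
entry has no successor (and the table may be padded), so it still needs
a scan. The helper name name_lengths is made up.

```python
def name_lengths(string_offsets, strtab: bytes):
    """Compute each symbol name's length from the string table offsets.

    Assumption: when offsets are strictly increasing, names are packed
    with a single nul between them. Otherwise fall back to scanning.
    """
    n = len(string_offsets)
    in_order = all(string_offsets[i] < string_offsets[i + 1]
                   for i in range(n - 1))
    lengths = []
    for i, off in enumerate(string_offsets):
        if in_order and i + 1 < n:
            # Next name starts right after this name's nul terminator.
            lengths.append(string_offsets[i + 1] - off - 1)
        else:
            # Last entry, or table not in order: scan for the nul.
            lengths.append(strtab.index(b"\0", off) - off)
    return lengths

# String table with trailing padding, as an archive writer might emit.
padded = b"foo\0barbaz\0\0\0"
lengths = name_lengths([0, 4], padded)
```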

Another reason to come up with a thin BSD format variant :-)

Cheers,
Rafael

P.S.: While testing the thin archive format I noticed that the thin
.lib files were a lot bigger than what I was getting on linux. It
turns out it was because cl.exe was producing .obj files with a *lot*
more symbols than clang on linux. Trying clang on windows showed that
cl.exe was not dropping what we call linkonce_odr, but clang on
windows still produces more symbols than clang on linux.