[llvm-dev] [RFC] lld: mostly-concurrent symbol resolution

Mon Apr 22 01:53:13 PDT 2019

Hi all,

This is a design change proposal to make lld's symbol resolution phase
mostly-concurrent without changing the existing semantics. The aim of this
change is to speed up the linker on multi-core machines.

*Background:*
Even though many parts of lld are multi-threaded, the symbol resolution
phase is single-threaded. In the symbol resolution phase, the linker does
the following:

 - Read one symbol at a time from each input file and insert it to a hash
table.
 - If we find an undefined symbol that can be resolved by an object file in
a static archive file, immediately pull that file out from the archive
(which may transitively pull out other files from other archives).

The output of this phase is a set of symbols merged by name and a set of
object files to be included in an output file.

We couldn't use threads in this phase because it was hard to guarantee the
determinism of the link result. To see why, assume that more than one
archive file define the same symbol (which is not an odd assumption, as it
is a common practice to write your own libc functions and override the
standard ones by inserting your file before `-lc` in the command line, for
example). If two or more threads simultaneously read input files, it is
hard to guarantee the link order. To simplify stuff, we chose to not use
multi-threading at all in this phase.

This phase takes roughly 1/3 of the total execution time for typical
programs.

*Analysis:*

Doing something for every symbol is not necessarily slow even if the number
of symbols is large. We have many loops that iterates over all symbol
objects (lld's internal representation of symbols) in lld, and the loops
don't take too much time. What is actually slow when reading input files is
to insert symbol strings into an in-memory hash table. This is particularly
worse when linking large programs. Large programs tend to be written in
C++, and C++ symbol names are particularly long due to name mangling.
Inserting hundreds of thousands of long strings into a hash table is a
computationally intensive work.

*Proposal:*

I propose we split the symbol resolution phase into two phases and use
multi-threading in the first phase. In the first phase, we visit *all*
files, including ones in archive files, to insert all symbol strings to
sharded hash tables. For each symbol string, we insert it to a symbol table
with a placeholder symbol object as a value.

Each file contains a member `std::vector<Symbol *> Symbols`. Once the first
phase is done, file A's N'th symbol and file B's M'th symbol have the same
name if and only if `A.Symbols[N] == B.Symbols[M]`. That means symbols are
merged by name, but "symbol resolution" in the regular linker's sense is
not done yet. This phase is highly parallelizable.

In the second phase, we visit each input file serially and do name
resolution just like we are currently doing. The only difference is, for
each symbol, instead of looking up a symbol table with a symbol string, we
just dereference a pointer to obtain a corresponding symbol object which
should be extremely fast. Since the symbol resolution algorithm is still
single-threaded and doesn't change, the output remains the same -- it is
deterministic and if two files define the same symbol, the first file would
be chosen.

One caveat is with this scheme we effectively ignore archive file's symbol
tables. We instead directly read symbols from archive members. This
shouldn't change the semantics because an archive file's symbol table
should be consistent with its member files. But this may be perceived as a
weird behavior because lld would work "correctly" even if an archive file's
symbol table is inconsistent, corrupted or not present.

*Implementation details:*

I don't expect too much change, but it looks like I need to move symbol
resolution code (which calls replaceSymbol to replace symbols) from
SymbolTable.cpp to somewhere else (perhaps to Symbols.cpp), because in the
second phase, we can determine if two symbols have the same name without
asking to the symbol table. Even in the current architecture, they don't
really have to belong to the symbol table, so it seems refactoring it is
generally a good idea. I'd do this first before implementing this proposal.

Rui
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190422/11d2d3cc/attachment.html>