[llvm-dev] [RFC] lld: mostly-concurrent symbol resolution

Mon Apr 22 11:53:11 PDT 2019

Sounds pretty good to me. An interesting case that occurs to me is
situations where there are many libraries specified, but only few are
used. In the current scheme, the scalability of the resolution
algorithm isn't dependent on the total number of libraries, is it? (It
stops once all symbols are resolved - potentially not visiting many
unused libraries)

I wonder if there are cases where many unused libraries are specified
(perhaps in a situation where one project builds multiple executables
and some of those executables only use few external libraries, while
others use many - but all of the external libraries are written once
in a list of external dependencies the project as a whole depends on -
or I guess a situation where there are two executables in a project,
one uses external libraries A, the other users B, but both just get a
link line that specifies A and B) - which would result in this
strategy doing potentially significantly more work than the current
implementation.

But it'll be great to see the data no doubt - can gauge how much this
costs (how many parallel threads is the break-even point for this
separation of work, how many unused libraries would be needed to
overwhelm the benefits, etc).

- Dave

On Mon, Apr 22, 2019 at 1:53 AM Rui Ueyama via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hi all,
>
> This is a design change proposal to make lld's symbol resolution phase mostly-concurrent without changing the existing semantics. The aim of this change is to speed up the linker on multi-core machines.
>
> Background:
> Even though many parts of lld are multi-threaded, the symbol resolution phase is single-threaded. In the symbol resolution phase, the linker does the following:
>
>  - Read one symbol at a time from each input file and insert it to a hash table.
>  - If we find an undefined symbol that can be resolved by an object file in a static archive file, immediately pull that file out from the archive (which may transitively pull out other files from other archives).
>
> The output of this phase is a set of symbols merged by name and a set of object files to be included in an output file.
>
> We couldn't use threads in this phase because it was hard to guarantee the determinism of the link result. To see why, assume that more than one archive file define the same symbol (which is not an odd assumption, as it is a common practice to write your own libc functions and override the standard ones by inserting your file before `-lc` in the command line, for example). If two or more threads simultaneously read input files, it is hard to guarantee the link order. To simplify stuff, we chose to not use multi-threading at all in this phase.
>
> This phase takes roughly 1/3 of the total execution time for typical programs.
>
> Analysis:
>
> Doing something for every symbol is not necessarily slow even if the number of symbols is large. We have many loops that iterates over all symbol objects (lld's internal representation of symbols) in lld, and the loops don't take too much time. What is actually slow when reading input files is to insert symbol strings into an in-memory hash table. This is particularly worse when linking large programs. Large programs tend to be written in C++, and C++ symbol names are particularly long due to name mangling. Inserting hundreds of thousands of long strings into a hash table is a computationally intensive work.
>
> Proposal:
>
> I propose we split the symbol resolution phase into two phases and use multi-threading in the first phase. In the first phase, we visit all files, including ones in archive files, to insert all symbol strings to sharded hash tables. For each symbol string, we insert it to a symbol table with a placeholder symbol object as a value.
>
> Each file contains a member `std::vector<Symbol *> Symbols`. Once the first phase is done, file A's N'th symbol and file B's M'th symbol have the same name if and only if `A.Symbols[N] == B.Symbols[M]`. That means symbols are merged by name, but "symbol resolution" in the regular linker's sense is not done yet. This phase is highly parallelizable.
>
> In the second phase, we visit each input file serially and do name resolution just like we are currently doing. The only difference is, for each symbol, instead of looking up a symbol table with a symbol string, we just dereference a pointer to obtain a corresponding symbol object which should be extremely fast. Since the symbol resolution algorithm is still single-threaded and doesn't change, the output remains the same -- it is deterministic and if two files define the same symbol, the first file would be chosen.
>
> One caveat is with this scheme we effectively ignore archive file's symbol tables. We instead directly read symbols from archive members. This shouldn't change the semantics because an archive file's symbol table should be consistent with its member files. But this may be perceived as a weird behavior because lld would work "correctly" even if an archive file's symbol table is inconsistent, corrupted or not present.
>
> Implementation details:
>
> I don't expect too much change, but it looks like I need to move symbol resolution code (which calls replaceSymbol to replace symbols) from SymbolTable.cpp to somewhere else (perhaps to Symbols.cpp), because in the second phase, we can determine if two symbols have the same name without asking to the symbol table. Even in the current architecture, they don't really have to belong to the symbol table, so it seems refactoring it is generally a good idea. I'd do this first before implementing this proposal.
>
> Rui
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev