[llvm-dev] Concurrent Hashmap?

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Thu Apr 8 12:44:09 PDT 2021


On Thu, Apr 8, 2021 at 9:13 AM Jez via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

> Alexandre Ganea <alexandre.ganea at ubisoft.com> 9:05 AM (2 hours ago) to
> me, llvm-dev at lists.llvm.org
> > When you say you'd like to parallelize LLD, which driver do you mean?
>
> I meant the Mach-O driver; sorry for omitting context.
>
> > There's already a specialized lock-free hashtable for the Debug Types
> merging the COFF driver
>
> Oh, that's cool! Let me see if I can build upon it...
>

If you are attempting to parallelize the symbol resolution phase, I can try
to share some thoughts on how to generalize that table over to parallel
symbol resolution.

Right now, symbol resolution in COFF, and presumably the other linkers, is
interleaved with input loading, since referencing a symbol can pull in
stuff from an archive. The first thing that needs to be done is to split
that up into phases, where the linker merges all the symbols of all of the
inputs it knows about, and then successively loads more inputs and symbols.
Some of this work is already done, but at least for COFF, more
restructuring needs to be done.

Once the restructuring is done, the merging algorithm should be pretty
simple. As inputs, there are:
- Symbol hash table from previous round of symbol resolution
- New object files to load, each with a symbol table

The output should be:
- The hashtable, composed of 64-bit cells indicating which symbol
prevailed, identified by a pair of 32-bit numbers, object file index, and
symbol table index

I recommend splitting up symbol resolution into a fully data parallel phase
and a "shuffle" phase that does the concurrent hash table insertion. This
makes rehashing very simple: if the table gets too full, simply abort the
concurrent insertions, sync up the worker threads, and retry with a bigger
table. The map phase should read the object file and pre-calculate a good
hash for every external symbol. After all symbols have been hashed,
the shuffle phase should attempt to concurrently insert into the table, and
CAS into the table if the current symbol has a higher precedence than the
symbol in the table. The algorithm for deciding which symbol prevails is
format-specific and implements the rules for things like weak symbols and
link line order. It should be deterministic.

That sketch leaves a lot to be desired, but that's the project I've been
planning for COFF for a while now.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210408/b7ede6ab/attachment.html>


More information about the llvm-dev mailing list