[llvm-dev] Concurrent Hashmap?

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Mon Apr 12 13:45:47 PDT 2021


On Fri, Apr 9, 2021 at 12:01 PM antlists via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Could you do a dynamic hash instead? In other words, grow the hash
> instead of starting over - what's going to be the cost of retrying as
> opposed to growing?
>

At a really high level, I believe starting the insertion over has the same
algorithmic complexity as rehashing the partially filled table: it's all
O(n). Eventually you build a table of size n, and along the way you build
tables of size n/2, n/4, n/8, and so on; if you sum up the work to build
all of those tables, you get roughly 2n, which is O(n).
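
To make that geometric series concrete, here is a throwaway snippet (my own
illustration, not anything from lld) that sums the doubling schedule and
shows the total work stays bounded by 2n:

#include <cstdio>

int main() {
  // Final table size n, reached by doubling from 1. The work to build (or
  // rebuild) a table is proportional to its size, so summing every size in
  // the doubling schedule approximates the total work across all retries.
  unsigned long long n = 1ULL << 20;
  unsigned long long total = 0;
  for (unsigned long long size = 1; size <= n; size *= 2)
    total += size;
  // Prints: n = 1048576, total = 2097151, i.e. 2n - 1, which is still O(n).
  std::printf("n = %llu, total = %llu\n", n, total);
  return 0;
}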

At a lower level, for common inputs, rehashing could be faster by a
constant factor. Suppose there are n input symbols and m unique output
symbols, and m is significantly smaller than n. Rehashing only takes O(m)
time, but reinserting takes O(n) time. The overall work is still O(n).
However, rehashing requires a much fancier concurrent data structure,
probably with a higher constant-factor overhead. You may gain as much as
you lose, and you open yourself up to concurrency bugs. Even if you take a
tested concurrent data structure off the shelf, it can be tricky to use
correctly.

So, at least for PDB global type hashing, I felt it was better to implement
a less general data structure that only does bulk inserts followed by
lookups. I also knew how many insertions I was going to do, so I could
conservatively overallocate a big hash table and sidestep rehashing. We
can't quite do the same for symbols because symbol resolution can discover
new inputs.
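
For the curious, here is the rough shape of that data structure. This is a
minimal sketch under assumptions of my own (keys are precomputed nonzero
64-bit hashes, zero marks an empty slot, and lookups only start after every
inserting thread has been joined); it is not the actual GHash code in lld:

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Presized, insert-only hash set of nonzero 64-bit keys. Because the table
// is conservatively overallocated up front, it never rehashes, so concurrent
// inserters only ever race on individual slots, never on a resize.
class BulkInsertSet {
  std::vector<std::atomic<uint64_t>> Slots; // value-initialized; 0 == empty

public:
  // Caller promises to insert at most MaxInsertions distinct keys; the 2x
  // overallocation keeps the load factor at or below 1/2.
  explicit BulkInsertSet(size_t MaxInsertions) : Slots(2 * MaxInsertions) {}

  // Phase 1: any number of threads may call insert() concurrently. Returns
  // true if this call claimed the key, false if it was already present.
  bool insert(uint64_t Key) {
    size_t N = Slots.size();
    for (size_t I = Key % N;; I = (I + 1) % N) {
      uint64_t Expected = 0;
      if (Slots[I].compare_exchange_strong(Expected, Key,
                                           std::memory_order_relaxed))
        return true;   // claimed an empty slot
      if (Expected == Key)
        return false;  // another thread already inserted this key
      // Slot holds a different key: keep probing linearly.
    }
  }

  // Phase 2: after all inserters have finished and been joined, lookups need
  // no extra synchronization.
  bool contains(uint64_t Key) const {
    size_t N = Slots.size();
    for (size_t I = Key % N;; I = (I + 1) % N) {
      uint64_t V = Slots[I].load(std::memory_order_relaxed);
      if (V == Key)
        return true;
      if (V == 0)
        return false;  // empty slot: the key was never inserted
    }
  }
};

The only concurrency primitive here is a compare-and-swap on a single slot,
which is why knowing the insertion count up front matters so much: it is
what lets you skip the resize, and the resize is where all the hard
coordination would otherwise live.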