[llvm-dev] Concurrent Hashmap?

Fri Apr 9 12:19:00 PDT 2021

On 09/04/2021 19:18, Neil Nelson via llvm-dev wrote:
> https://github.com/facebook/folly/blob/master/folly/AtomicHashMap.h
> 
>   * AtomicHashMap --
>   *
https://en.wikipedia.org/wiki/Linear_hashing

Dynamic / Linear Hashing

>   * A high-performance concurrent hash map with int32_t or int64_t keys. 

Fixed size keys - pick 32-bit 64-bit whatever.

> Supports
>   * insert, find(key), findAt(index), erase(key), size, and more.  

ditto

> Memory cannot
>   * be freed or reclaimed by erase.  Can grow to a maximum of about 18 
> times the
>   * initial capacity, but performance degrades linearly with growth. 

Bucket based, so you could allocate and free buckets as required. 
Performance is pretty much O(1) if I've got that right :-)

>Can 
> also be
>   * used as an object store with unique 32-bit references directly into the
>   * internal storage (retrieved with iterator::getIndex()).
>   *
>   * Advantages:
>   *    - High-performance (~2-4x tbb::concurrent_hash_map in heavily
>   *      multi-threaded environments).
>   *    - Efficient memory usage if initial capacity is not over estimated
>   *      (especially for small keys and values).
>   *    - Good fragmentation properties (only allocates in large slabs 
> which can
>   *      be reused with clear() and never move).
>   *    - Can generate unique, long-lived 32-bit references for efficient 
> lookup
>   *      (see findAt()).
>   *
>   * Disadvantages:
>   *    - Keys must be native int32_t or int64_t, or explicitly converted.
>   *    - Must be able to specify unique empty, locked, and erased keys
>   *    - Performance degrades linearly as size grows beyond initialization
>   *      capacity.
>   *    - Max size limit of ~18x initial size (dependent on max load factor).
>   *    - Memory is not freed or reclaimed by erase.
> 
>   * Copyright (c) Facebook, Inc. and its affiliates.
>   *
>   * Licensed under the Apache License, Version 2.0 (the "License");
>   * you may not use this file except in compliance with the License.
>   * You may obtain a copy of the License at
> 
> Neil Nelson
> 
Read and find are very fast - basically scan one bucket.

Write, update and delete obviously require locking a bucket, but as the 
table grows the chances of a collision drop.

And as the table fills, the cost of growing the table remains constant, 
namely splitting just one bucket.

Downsides, you need to avoid a pathological hash algorithm.

Cheers,
Wol