[PATCH] D125979: [ADT] add LockFreeDataPool class.

Mon May 23 11:31:49 PDT 2022

avl added a comment.

> The name is not descriptive. It misses the important words about "hash map". ConcurrentHashMap may be a better name.

The data structure which I need is a concurrent data pool of unique objects.
That pool should keep objects and matching objects should have the same entry.

The hash map is an implementation detail. The pool uses hash map to control the data entries. There might be used another(other than hash map) structure controlling pool entries.

Probably, it is possible to separate this LockFreeDataPool into two classes data pool and hash map.

> The slab system appears to be the modification to lld/COFF/DebugTypes.cpp.
> It provides limited growing capability when the capacity is hit.
> There is a hard limit of 6 slabs, and report_fatal_error may eventually be called. Then why is reportWarning needed?
> I think we should either implement a proper growing mechanism or just crash.

The best use case for that hash map is when it is allocated once and no growings are necessary. But, there are use cases when it is hard to estimate initial size. In such cases, without resizing,  application will crash or need to waste memory(to allocate a huge amount of memory covering exceptional cases).

Slab system in this patch allows to resize hash map by the price of slower search(that is the source of restriction by the number of resizes). This allows to allocate smaller hash map initially and reallocate it for the exceptionally large cases. The purpose of the warning is to inform user that an exceptional case is hit.

> Does your application have a reasonable way estimating the capacity?

I use an coefficient for incoming DWARF to calculate the size of hash map. In most cases it is quite accurate. But there are some exceptional cases when estimation is not accurate and then possibility to do resize helps.

> Does it help to use HyperLogLog?

Let me examine this, please.

> (There is a poor way to not implement a growing version: If we want to prevent a crash in a pathological case where the estimated capacity is insufficient and causes the concurrent hash map to crash, we can use something like lld::safeLldMain (llvm::CrashRecoveryContext).)

I will check it, thanks. Do you think it would be better than possibility to resize?

================
Comment at: llvm/include/llvm/ADT/LockFreeDataPool.h:58
+    TableSize = NextPowerOf2(InitialSize);
+    TableSize = std::max(TableSize, (size_t)4096);
+    HashMask = TableSize - 1;
----------------
MaskRay wrote:
> Why so large?
to avoid re-sizings for small sizes. It does not look to be too big value, but will do working with small sizes more smooth.

================
Comment at: llvm/include/llvm/ADT/LockFreeDataPool.h:113
+  Slab *getNextSlab(Slab *CurSlab, AllocatorTy &Allocator) {
+    if (CurSlab->NextSlab == nullptr) {
+      const std::lock_guard<std::mutex> lock(SlabsMutex);
----------------
MaskRay wrote:
> relaxed load on NextSlab and save it into a local variable.
something like this ?:

```
Slab *Slab = CurSlab->NextSlab.load(std::memory_order_relaxed);

if (Slab == nullptr) {
      const std::lock_guard<std::mutex> lock(SlabsMutex);

      if (Slab  == nullptr)   <<<<<<<<<<<
        CurSlab->NextSlab = allocateNextSlab(Allocator);
    }
}

```
It looks like we could not save NextSlab into the local variable because CurSlab->NextSlab can change the value when we arrive at pointed line.. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125979/new/

https://reviews.llvm.org/D125979