[PATCH] D125979: [ADT] add ConcurrentHashTable class.

Tue May 31 10:02:27 PDT 2022

avl added inline comments.

================
Comment at: llvm/include/llvm/ADT/ConcurrentHashTable.h:117-118
+    else if (CurSlabNum > MaxNumberOfSlabs)
+      reportWarning("Too many slabs. Consider increasing initial size or "
+                    "changing hash function.");
+
----------------
dblaikie wrote:
> avl wrote:
> > dblaikie wrote:
> > > It's pretty weird to have warnings coming out of data structures. @MaskRay would it be reasonable to remove these/replace them with some other tool if we're promoting this to a more general purpose data structure rather than something more special-case in lld?
> > Probably, instead of warning we can introduce method:
> > 
> > ```
> > bool IsInGoodState () const {
> >   return CurSlabNum <= MaxNumberOfSlabs && !hasFullSlab;
> > }
> > 
> > 
> > ```
> > 
> > the consumer could call this method somewhere and inform the user:
> > 
> > ```
> > if (!HashTable.IsInGoodState()) {
> >     outs() << "Hash table is in suboptimal state";
> > }
> > ```
> > 
> > ?
> Maybe? I'd be curious to hear from @MaskRay here, since I guess he wrote the original/had the original motivation for the warning.
that is me, who introduced the warning. Original lld implementation does not have it.

My motivation is that it should be known somehow that hashtable is not in good state and it is necessary to revise initial size.

================
Comment at: llvm/include/llvm/ADT/ConcurrentHashTable.h:315-316
+///
+/// It is recommended to use ConcurrentHashTableExtAlloc if sizeof(KeyDataTy) >
+/// sizeof(pointer).
+///
----------------
dblaikie wrote:
> avl wrote:
> > dblaikie wrote:
> > > This should probably be provided via specialization, rather than as a separately named type?
> > > 
> > > I guess it has different requirements for the type, by the looks of it? That's curious/maybe enough to warrant a separate name, I guess, though I'm not 100% sure.
> > They have different interface:
> > 
> > 
> > ```
> > std::pair<KeyDataTy, bool> insert(KeyDataTy NewData)
> > ```
> > and
> > ```
> > std::pair<KeyDataTy *, bool> insert(AllocatorTy &Allocator, KeyTy Key)
> > ```
> > 
> > Implementing this through the specialization looks to be more complex and it seems does not have benefits comparing to the variant with separately named types?
> Fair enough. I wonder if it'd be acceptable to have the hash table take the allocator as a ctor parameter (& then keep the same API otherwise) and store it, or if there are too many of these that the extra pointer member would be too costly.
> 
> Though coming back to the original comment - why is external allocation usage/recommendation related to the size of the keys/size of a pointer?
>Fair enough. I wonder if it'd be acceptable to have the hash table take the allocator as a ctor parameter (& then keep the same API otherwise) and store it, or if there are too many of these that the extra pointer member would be too costly.

The reason is necessity to synchronize allocator.

The allocator may be passed as the ctor parameter if it would be able to work in multy-thread mode. This implementation assumes that allocator is not able to work in multi-thread mode. To properly working it requires that the same allocator is used from the same thread. In that case no need to synchronize allocator.

1. std::pair<KeyDataTy*, bool> insert(KeyTy Key)

The insert method might be called from different threads. if It would call to allocator stored in the constructor then we need to use mutex or to use special multi-thread implementation for the allocator.

2. std::pair<KeyDataTy *, bool> insert(AllocatorTy &Allocator, KeyTy Key)

This solution assumes that the same thread uses the same allocator. It allows not to synchronize allocators.

>Though coming back to the original comment - why is external allocation usage/recommendation related to the size of the keys/size of a pointer?

entry size for the solution with external allocation could not be less than pointer(since each entry is a pointer to the externally allocated data). 

entry size for the solution with allocating inside hash table can be less and then the table may have smaller size. It also does not need special allocator, since data are kept inside hash table.

We could not always keep data inside hash table: If data size is greater then pointer then each entry(even null) would occupy bigger size. That would result in a bigger hash table. It would also result in copying extra data, since data entries are passed by value in that case.

So if entry size is less than pointer then it would require less size and no additional allocator required when ConcurrentHashTable is used. If entry size is greater than pointer then less space is used(no need to allocate full data for null entries) and data copying is faster when ConcurrentHashTableExtAlloc is used.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125979/new/

https://reviews.llvm.org/D125979