[PATCH] D22681: [esan] Add generic resizing hashtable

Tue Aug 2 12:15:21 PDT 2016

bruening added inline comments.

================
Comment at: lib/esan/esan_hashtable.h:24
@@ +23,3 @@
+// KeyTy must have an operator== and an operator size_t() (for hashing).
+// Pointer types will be automatically de-referenced prior to either operator==
+// or a size_t hash cast.
----------------
aizatsky wrote:
> bruening wrote:
> > aizatsky wrote:
> > > aizatsky wrote:
> > > > Key pointer types...
> > > I actually wonder if this is a functionality you want to build in rather then provide through type traits.
> > > 
> > > Like this:
> > > 
> > > template <typename KeyTy, typename DataTy, typename KeyTraits = DefaultTraits<KeyTy>>
> > > 
> > > And later something like:
> > > 
> > > KeyTraits::hash(key)
> > > KeyTraits::eq(key1, key2).
> > > 
> > Could you elaborate on what the pros and cons of your proposal are?  I assume you mean that we define DefaultTraits to invoke operator== and operator size_t() and the point here is to let the user separate hashing from casting and key equality from some other equality -- though the existing constructor takes in an optional hash function for that purpose, and I'm not sure equality would need to be separated.
> Pros:
> 
> - makes HashTable simpler
> - moves type-based decision into a much-much simpler class 
> - type-based decision can be made explicit. Consider:
> 
>   HashTable<int*, int> vs HashTable<int*, int, DerefPointers>
> 
> I would also suggest to move HashFunc to the trait as well. This would improve performance of the hashing.
> 
> The only const that I see is that you need to define a default trait struct. But that looks more like a pro to me :)
I agree that the table would be improved to move the hash function and comparison function into template parameters, but I'd prefer to not combine them, which forces a user who wants to set just one to define both.

I still like a default that de-references pointers, but after removing the payload-freeing function parameter and relying on the user using a smart pointer to free, and considering that in C++ the payload is easily copied into the hashtable entry structure via templates (unlike C where indirection via a pointer into the heap is the typical solution), which means that the payload is less likely to be a pointer, and references means that the key is also less likely to be a pointer (yes still more of a C programmer than a C++ programmer), I'm fine removing it.

================
Comment at: lib/esan/esan_hashtable.h:28
@@ +27,3 @@
+// synchronization for payloads once hashtable functions return.  If
+// ExternalLock is set to true, the caller should call the lock() and unlock()
+// routines around all hashtable operations and subsequent manipulation of
----------------
aizatsky wrote:
> Again, I don't like that there's a conditional branch throughout the class based on the field.
> 
> The most logical way to me would be to define HashTable without any synchronization (i.e. ExternalLock = true), and create a SynchronizedHashTable : public HashTable with corresponding overrides.
The problem is that inheritance means we pay virtual dispatch costs, while a container (i.e., SynchronizedHashTable contains a HashTable inside it) just means a lot of duplicated boilerplate to wrap everything.  I instead made the ExternalLock a template parameter and thus constant at compile time, which eliminates the conditional branches.

https://reviews.llvm.org/D22681