[LLVMdev] FoldingSet #collisions comparison

Wed Feb 10 16:54:36 PST 2010

On Feb 10, 2010, at 4:49 PM, Gregory Petrosyan wrote:

>> 
>> These numbers are so noisy, that they aren't particularly useful.
>> Could you try instrumenting foldingset to keep track track of the #
>> collisions and # hash table resizes and compare those?  They should
>> be much more stable and still correlate directly to performance.
> 
> OK, now with real numbers :-)
> 
> First, the main thing: SuperFastHash appears to be the hash with best
> distribution. Use of MurmurHash instead generates 1.28% more collisions while
> doing nightly test in MultiSource/Applications.
> 
> Second: I've also tested lookup3 hash, and its use generated 0.1% more
> collisions, compared to SFH.
> 
> These results were a bit surprising for me!
> 
> Number of hash table resizes is independent of used hashing algorithm, because
> hash table grows when 'nentries > nbuckets * 2'.

Thanks for doing the evaluation!  It sounds like we should stay with SFH.  In addition to fewer collisions, it is cheaper to compute.

-Chris