[LLVMdev] FoldingSet #collisions comparison
Gregory Petrosyan
gregory.petrosyan at gmail.com
Wed Feb 10 18:22:22 PST 2010
On Thu, Feb 11, 2010 at 03:49:57AM +0300, Gregory Petrosyan wrote:
> First, the main thing: SuperFastHash appears to be the hash with best
> distribution. Use of MurmurHash instead generates 1.28% more collisions while
> doing nightly test in MultiSource/Applications.
>
> Second: I've also tested lookup3 hash, and its use generated 0.1% more
> collisions, compared to SFH.
>
> These results were a bit surprising for me!
And now something really surprising, Bernstein hash
unsigned bernstein_hash(const unsigned char *str, size_t len)
{
unsigned hash = 5381;
unsigned c;
while (len--) {
c = *str++;
hash = (hash * 33) ^ c;
}
return hash;
}
gives 1.48% less (sic!) collisions than SFH! That means less than lookup3, MMH
and FNV, too. I have no idea why this simplest hash is so good for LLVM. Maybe
because LLVM hashes lots of pointers?
But BH is ~2 times slower than SFH.
Overall, it looks like
1) Number of collisions for different hashes is within 5% interval.
2) MMH is the fastest by a big margin, FNV / BH are the slowest by a big margin.
3) Any hash will do for FoldingSet.
Gregory
More information about the llvm-dev
mailing list