[LLVMdev] FoldingSet #collisions comparison

Gregory Petrosyan gregory.petrosyan at gmail.com
Wed Feb 10 18:22:22 PST 2010


On Thu, Feb 11, 2010 at 03:49:57AM +0300, Gregory Petrosyan wrote:
> First, the main thing: SuperFastHash appears to be the hash with best
> distribution. Use of MurmurHash instead generates 1.28% more collisions while
> doing nightly test in MultiSource/Applications.
> 
> Second: I've also tested lookup3 hash, and its use generated 0.1% more
> collisions, compared to SFH.
> 
> These results were a bit surprising for me!

And now something really surprising, Bernstein hash

	unsigned bernstein_hash(const unsigned char *str, size_t len)
	{
		unsigned hash = 5381;
		unsigned c;

		while (len--) {
			c = *str++;
			hash = (hash * 33) ^ c;
		}

		return hash;
	}

gives 1.48% less (sic!) collisions than SFH! That means less than lookup3, MMH
and FNV, too. I have no idea why this simplest hash is so good for LLVM. Maybe
because LLVM hashes lots of pointers?

But BH is ~2 times slower than SFH.

Overall, it looks like
1) Number of collisions for different hashes is within 5% interval.
2) MMH is the fastest by a big margin, FNV / BH are the slowest by a big margin.
3) Any hash will do for FoldingSet.

	Gregory



More information about the llvm-dev mailing list