[LLVMbugs] [Bug 10971] Due to bad hash functions, unordered_set can easily give horrible performance
bugzilla-daemon at llvm.org
bugzilla-daemon at llvm.org
Sun Dec 4 16:08:52 PST 2011
http://llvm.org/bugs/show_bug.cgi?id=10971
Howard Hinnant <hhinnant at apple.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |WONTFIX
--- Comment #2 from Howard Hinnant <hhinnant at apple.com> 2011-12-04 18:08:52 CST ---
I've decided to close this as "behaves correctly". It wasn't without a lot of
thought and testing though.
A colleague I highly respect (Dave Zarzycki) summed it up this way:
As for std::hash<int>, the problem that pr10971 is asking
you to try and solve is essentially this: "magically figure
out what the high entropy bits are in a random int and
somehow magically redistribute that entropy to the low
entropy bits in the int"
Note that this decision only impacts the hash<scalar> where sizeof(scalar) <=
sizeof(size_t).
I ran lots of tests measuring collisions with many different distributions of
size_t. The only one I came across that had bad performance was if the key was
a linear with a slope of a prime number: And not just any prime, but the prime
corresponding to the number of buckets. And even then, the prime had to be
fairly large for there to be a noticeable impact. And as soon as the table
automatically rehashes, the problem not only disappears, the identity function
reverts to a perfect hash.
The other hash functions I tested, did not suffer the "multiple of a prime"
problem. But they were consistently slower and resulted in more collisions
much more often. I'm not saying that identity is the perfect hashing function.
But I am saying that it is a good default for containers which restrict the
number of buckets to a prime. Had I been dealing with containers where the
number of buckets is a power of 2, I'm sure I would've come to a different
conclusion. Indeed, this was a big influence in pushing me away from such hash
container implementations years ago.
--
Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
More information about the llvm-bugs
mailing list