[LLVMbugs] [Bug 10971] Due to bad hash functions, unordered_set can easily give horrible performance

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Sun Dec 4 16:08:52 PST 2011


http://llvm.org/bugs/show_bug.cgi?id=10971

Howard Hinnant <hhinnant at apple.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #2 from Howard Hinnant <hhinnant at apple.com> 2011-12-04 18:08:52 CST ---
I've decided to close this as "behaves correctly".  It wasn't without a lot of
thought and testing though.

A colleague I highly respect (Dave Zarzycki) summed it up this way:

As for std::hash<int>, the problem that pr10971 is asking
you to try and solve is essentially this: "magically figure
out what the high entropy bits are in a random int and
somehow magically redistribute that entropy to the low
entropy bits in the int"

Note that this decision only impacts the hash<scalar> where sizeof(scalar) <=
sizeof(size_t).

I ran lots of tests measuring collisions with many different distributions of
size_t.  The only one I came across that had bad performance was if the key was
a linear with a slope of a prime number:  And not just any prime, but the prime
corresponding to the number of buckets.  And even then, the prime had to be
fairly large for there to be a noticeable impact.  And as soon as the table
automatically rehashes, the problem not only disappears, the identity function
reverts to a perfect hash.

The other hash functions I tested, did not suffer the "multiple of a prime"
problem.  But they were consistently slower and resulted in more collisions
much more often.  I'm not saying that identity is the perfect hashing function.
 But I am saying that it is a good default for containers which restrict the
number of buckets to a prime.  Had I been dealing with containers where the
number of buckets is a power of 2, I'm sure I would've come to a different
conclusion.  Indeed, this was a big influence in pushing me away from such hash
container implementations years ago.

-- 
Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.



More information about the llvm-bugs mailing list