[cfe-dev] RFC: change string hash function in libc++

Chandler Carruth chandlerc at google.com
Sun Dec 4 18:45:43 PST 2011


On Sun, Dec 4, 2011 at 6:43 PM, Howard Hinnant <hhinnant at apple.com> wrote:

> On Dec 4, 2011, at 9:17 PM, Chandler Carruth wrote:
>
> > On Sun, Dec 4, 2011 at 4:12 PM, Howard Hinnant <hhinnant at apple.com>
> wrote:
> >
> > I've been convinced to use murmur2 when I need to combine 2, 3, or 4
> size_t's into a single one, except for the case of long double on x86.  My
> rationale is that long double patterns are not likely to accidentally occur
> often in a long double.  So I'm sticking with xor for long double for now.
>  If I see cases that could easily occur accidentally, this can always be
> changed in the future.
> >
> > I settled on murmur2 instead of murmur3 because of the easy availability
> of the 32 bit and 64 bit versions.
> >
> > I'm confused. What do you mean by 32-bit and 64-bit versions? If you
> could clarify what versions you would like, the author of the murmur hashes
> and some other folks here at google are offering to help build variants
> that target exactly what libc++ needs.
> >
> >  I did not see a 64 bit murmur3, and figured murmur2 is good enough.
>  murmur2 won't actually ever be used when size_t is 64 bits.  It will only
> be activated for (unsigned) long long and double on 32 bit platforms.
> >
> > Oh, except that I activated murmur2 for basic_string as well.
> >
> > Why not murmur3 or city hash? murmur3 is both higher quality and
> significantly faster. city hash is still faster. For reference, the
> original thread was *primarily* focused on basic_string hashing.
> >
> > The tests I ran weren't that convincing.  However I also admit that
> there is a lot I don't know here, and people are asking for murmur.  And
> what I had was operating a character at a time and murmur operates a word
> at a time.   So done.
> >
> > At Google, we have found CityHash64 to provide very significant benefits
> when used to hash strings for unordered containers. If you look at the
> benchmark we posted (SMHasher) it was written specifically to showcase the
> problems hit by real-world hashing of string contents.
>
> The libc++ code is publicly available at
> http://llvm.org/svn/llvm-project/libcxx/trunk/ .  Patches to this code
> are reviewed regularly by myself and others, and welcomed.  Please include
> a patch for http://llvm.org/svn/llvm-project/libcxx/trunk/CREDITS.TXT if
> you contribute.


Yep, I think Craig was just trying to test the waters and make sure you
(and others) would be interested in such a patch before working on it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111204/83f186a2/attachment.html>


More information about the cfe-dev mailing list