<div class="gmail_quote">On Tue, Feb 14, 2012 at 2:44 AM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com">clattner@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I'm contradicting my stance above about not caring about the implementation :), but is MurmurHash a good hash for string data? The Bernstein hash function works really well and is much cheaper to compute than Murmur. It is used by HashString (and thus by StringMap).</blockquote>
</div><br><div>If you want a good string hashing function, CityHash is by a fair margin the best one out there. Look at the comparison done by Craig, Howard, and several others when discussing what hashing function to use for libc++.</div>
<div><br></div><div>The only downside to CityHash is code size. It is heavily tuned, and that results in several special case routines to get maximal efficiency and hash quality for short strings (yep, not just huge ones). That said, the code size increase was measured carefully for libc++ and it's really quite small.</div>
<div><br></div><div>That said, I have no benchmarks showing this matters for our uses of StringMap. It reduced collisions, it didn't show up as a hot function, but the collisions and the hashing simply didn't dominate any profiles I looked at....</div>