[LLVMdev] We need better hashing
johnso87 at crhc.illinois.edu
Fri Feb 17 12:11:32 PST 2012
On Fri, Feb 17, 2012 at 3:59 AM, Chandler Carruth<chandlerc at google.com> wrote:
> On Tue, Feb 14, 2012 at 2:44 AM, Chris Lattner<clattner at apple.com <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>> wrote:
> >/ I'm contradicting my stance above about not caring about the
> />/ implementation :), but is MurmurHash a good hash for string data?
> />/ The Bernstein hash function works really well and is much cheaper to
> />/ compute than Murmur. It is used by HashString (and thus by StringMap).
> If you want a good string hashing function, CityHash is by a fair margin
> the best one out there. Look at the comparison done by Craig, Howard, and
> several others when discussing what hashing function to use for libc++.
> The only downside to CityHash is code size. It is heavily tuned, and that
> results in several special case routines to get maximal efficiency and hash
> quality for short strings (yep, not just huge ones). That said, the code
> size increase was measured carefully for libc++ and it's really quite small.
What about machines that are big endian, 32-bit, or less optimized for unaligned
accesses than x86? From
> 1) The current version of CityHash is intended for little-endian 64-bit CPUs.
> Functions that don't use the CRC32 instruction should also work, slowly,
> in little-endian 32-bit code. CityHash should work on big-endian CPUs as well;
I ported CityHash about a year ago to a niche research architecture that is all
three of the above, and found that the byteswapping and unaligned loads
killed performance (the target only supports naturally aligned loads).
It's quite possible that x86 is the only target we care about in terms of
compile time performance, just wanted to see if CityHash had since been generalized
to work well on less general-purpose-focused targets, or if there was
something else I was overlooking when I did my initial experiments.
> That said, I have no benchmarks showing this matters for our uses of
> StringMap. It reduced collisions, it didn't show up as a hot function, but
> the collisions and the hashing simply didn't dominate any profiles I looked
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev