[llvm-dev] Default hashing function for integers (DenseMapInfo.h)

Tue Jan 10 00:36:02 PST 2017

Both are not very sophisticated.

You should also look at the different MurmurHash versions, and descendants
such as CityHash.

They are only very slightly more complex (e.g. two multiplies and a rotate)
but of extremely good quality, approaching MD5 or SHA for randomness and
independence of resulting bits (but not secure against malicious attacks
like they are).

I've successfully used simply k (i.e. c =1) for both integers and pointers
in hash tables with prime number sizes. It works fine, and they usually
have non-overlapping ranges in practice. Hashing structures and strings is
the more interesting challenge.

But if you're using power of two (or at least non-prime) table sizes then
you should munch up the bits a lot more with multipliers with a lot more
bits, not just something like 37.

It may be that these hash tables don't have many entries and don't have
many collisions and 37 is fine, and a more expensive hash function would
slow it down. Only testing can tell you.

On Tue, Jan 10, 2017 at 11:19 AM, Francois Fayard via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> > On Jan 10, 2017, at 8:56 AM, Mehdi Amini <mehdi.amini at apple.com> wrote
> >
> > Some tests I can suggest is to replace the hash function with your
> favorite and:
>
> Thanks. I’ll give it a try. It will take some time as I need to rewrite
> DenseMap if I want to use the Knuth multiplicative hash.
>
> > ultimately I believe real-world impact is the best way to get a change
> in.
>
> That’s exactly what I think. I was just trying to get some hints on what
> parts of the code could benefit from a better hash function for integers,
> in order to find a benchmark which is relevant.
>
> > If this is used for having pointers, then it is likely gonna be used
> *everywhere*.
> > Even simple test like “time to deserialize a large bitcode file” would
> be a good start.
>
>
> It is not used for pointers, as this function would be truly awful for
> them: because of alignments, pointers are of the form 2^a * k0. The hash
> function used for pointers is (k >> 4) ^ (k >> 9). My guess is that the k
> >> 4 is here because pointers returned by malloc are 16 bytes aligned. I
> don’t have any guess for the 9 though.
> So the hashing function for pointers is not flawed like the one for
> integers.
>
> François Fayard
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170110/3b98b529/attachment.html>