[llvm-dev] Default hashing function for integers (DenseMapInfo.h)

Joerg Sonnenberger via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 10 02:56:23 PST 2017


On Tue, Jan 10, 2017 at 08:46:54AM +0100, Francois Fayard via llvm-dev wrote:
> I am sorry to insist on that, but I don’t see any advantage of using
> hash(k) = c * k with c being a primer number. What is important is
> choosing c such that gcd(c, m) = 1 for any size m of the hash table.
> This is the case for c = 1. As a consequence, I don’t see any benefit
> in using c = 37 over c = 1. Moreover, all those hash functions are
> badly considered for a reason : they behave very poorly when the keys
> are of the form d * k0 + k1.

The reason for avoiding c=1 is the trivial aliasing set for power-of-two
differences. Those tend to be fairly common in any data set. Beside
that, 37 works somewhat decent for many small differences, it can be
computed easily without multiplication where necessary and noone has
bothered enough to compare different hash functions across different
architectures. E.g. the current hash is simple, avoids some of the
obvious bad cases and is itself decently fast. Anything else needs
careful analysis of the data.

Joerg


More information about the llvm-dev mailing list