[llvm-dev] Default hashing function for integers (DenseMapInfo.h)

Francois Fayard via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 9 06:34:05 PST 2017


> On Jan 9, 2017, at 1:37 PM, Joerg Sonnenberger <joerg at bec.de> wrote:
> 
> 37 is a prime. Multiples of 8 that are otherwise evenly distributed will
> map to different slots in the output space.

As the hash number is reduced modulo 2^p (the number of slots), if your hash function is of the form hash(k) = c * k, I understand that you want to be able to fill all te slots. One can show that we get this property  if c is not a multiple of 2. So it gives a lot of choices: 1, 3, 5, 7, 9, etc. I don’t see any reason not to use c = 1 if you want to use that kind of hash function.

> 64bit multiplications are very expensive on many 32bit platforms. It is
> certainly an order of magnitude slower than the existing hash.

There is also a 32-bit version of the Knuth hash:

std::uint32_t hash(std::uint32_t val, int p) {
  const std::uint32_t knuth = 2654435769;
  const std::uint32_t y = val;
  
  return (y * knuth) >> (32 - p);
}

Those numbers come from the following fact. Let alpha = (-1 + sqrt(5)) / 2 which is about 0.618034. Multiply this number by 2^32 (or 2^64 for 64 bit hash), and round this number to the nearest integer. You’ll get 2 654 435 769. alpha has been chosen so that it is an irrational number in between 0 and 1 and that it is very difficult to approach it by rational numbers. One can show that all irrational numbers being a root of a degree 2 polynom with integer coefficients with leading coefficient being 1 have this property (X^2 + X - 1 = 0 for alpha).

I  wrote my own hash table using open addressing and quadratic probing. I have tried to insert n = 2^26 keys given as follow: generate a random number in between 0 and 2^(64-3) and multiply it by 8. Then search all those keys in the order they were inserted. Here are the timings I get:
- hash(x) = x : insertion 7.42s, searching 4.09s
- hash(x) = 37 * x : insertion 7.49s, searching 5.50s
- Knuth hashing: insertion 6.04, searching 2.54s

So using Knuth hashing is really worth it if the keys happen to be in the form 2^k * u + v. And multiply by 37 is not a good idea here. And I am still very puzzled and don’t really understand the situations where it could be useful.

François Fayard


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170109/ed931d51/attachment.html>


More information about the llvm-dev mailing list