[llvm-dev] Default hashing function for integers (DenseMapInfo.h)

Francois Fayard via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 10 05:20:59 PST 2017


> On Jan 10, 2017, at 9:36 AM, Bruce Hoult <bruce at hoult.org> wrote:
> 
> Both are not very sophisticated.
> You should also look at the different MurmurHash versions, and descendants such as CityHash.

I did a few benchmark this morning, trying to tweak the hashing for pointers (as many people seem to use pointers as keys). The hash function in LLVM is quite simple, but it gives very good results. The benchmark I did was compiling a C++ file with clang. I did 10 tests for every version of clang/llvm compiled with the hash function, and here are the best timings:

- Original LLVM: hash(k) = (k >> 4) ^ (k >> 9)
  3.40s
- Stupid hash: hash(k) = k
  3.55s
- A bit less stupid: hash(k) = k >> unused_bits (defined by sizeof(T) = 2^unused_bits)
  3.47s
- Murmurhash3
  3.47s

So different hashing functions make a difference. For pointers, it seems that the one used by LLVM is quite good. It gives the best performance here. It is quite surprising that Murmurhash3 does not behave better than the “A bit less stupid” hash. My guess is that the LLVM hash function is the best because it is tailored to the distribution of pointers returned by malloc. As I said, the (k << 4) seems to be related to the fact that pointers are 16-bytes aligned on most OS. I still don’t understand the 9.

I’ll do some benchmarks for hashing integers tomorrow. This is the hashing that really looks suspicious to me but it does not seem to be used that much in LLVM.

François Fayard


More information about the llvm-dev mailing list