[llvm] [ADT] Update hash function of uint64_t for DenseMap (PR #95734)

Chandler Carruth via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 17 23:11:31 PDT 2024


chandlerc wrote:

> > It would be nice to get someone expert in state-of-the-art hash functions and hash tables to review this.
> 
> @chandlerc -- any chance you could weigh in here?

Happy to; I've been studying this area afresh for the past six months.

Generally, neither the old multiplication nor the shift is going to work well, though either may get lucky with the current inputs and appear to.
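
For concreteness, here is why a small-constant multiply truncated to 32 bits mixes poorly. This assumes the old code was something along the lines of the classic `(unsigned)(Val * 37ULL)` style of hash (an assumption on my part, not a quote of the patch): the high 32 bits of the key never reach the result, so keys that differ only in their high bits collide completely.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the old multiplicative hash being discussed.
static unsigned WeakHash(uint64_t Val) { return (unsigned)(Val * 37ULL); }

int main() {
  // Keys differing only above bit 32 produce identical hashes, because
  // (Val * 37) mod 2^32 depends only on the low 32 bits of Val.
  for (uint64_t Hi = 0; Hi < 4; ++Hi) {
    uint64_t Key = (Hi << 32) | 0x1234;
    std::printf("key=%016llx  hash=%08x\n", (unsigned long long)Key,
                WeakHash(Key));
  }
  return 0;
}
```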

Not sure why the concern over ADT/Hashing.h -- that code has held up quite well and remains a good balance of strong hashing at reasonable cost.

In particular, for a 64-bit integer, I wouldn't expect it to be much slower than the `combineHashValue` being called in the current iteration. It might actually be faster.
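
As a rough sketch of what routing the 64-bit case through ADT/Hashing.h could look like (not the actual patch; the key-info shape follows the usual DenseMapInfo convention and the empty/tombstone keys are illustrative):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"
#include <cstdint>

// Sketch only: a standalone key-info struct whose getHashValue defers to
// ADT/Hashing.h instead of a hand-rolled multiply/shift mix.
struct HashedU64Info {
  static inline uint64_t getEmptyKey() { return ~0ULL; }
  static inline uint64_t getTombstoneKey() { return ~0ULL - 1ULL; }
  static unsigned getHashValue(uint64_t Val) {
    // hash_value returns a hash_code, which converts to size_t; DenseMap
    // only wants an unsigned, so truncate.
    return static_cast<unsigned>(llvm::hash_value(Val));
  }
  static bool isEqual(uint64_t LHS, uint64_t RHS) { return LHS == RHS; }
};

// Usage: pass the key-info struct as DenseMap's third template parameter.
llvm::DenseMap<uint64_t, unsigned, HashedU64Info> ExampleMap;
```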

I have recently developed a hashing function that is only slightly weaker in quality than ADT/Hashing.h and is very, very competitive in speed (I suspect faster, but proving that will take time) with the very best. It is open source in Carbon and under the LLVM license:
https://github.com/carbon-language/carbon-lang/blob/trunk/common/hashing.h

This routine is dramatically faster than LLVM's Hashing.h, and as fast as or faster than essentially everything else I've been able to evaluate for small objects (integers, pointers, tuples of those). For long strings there are a few faster approaches that use specialized hardware (AES building blocks), but not by enough to matter for a compiler, I strongly suspect.

The lower quality hashing should largely be fine as long as the hash table's load factor is low enough. I've done a good amount of DenseMap benchmarking with that hash function; you can see code that benchmarks it directly here:
https://github.com/carbon-language/carbon-lang/blob/trunk/common/map_benchmark.cpp#L184
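
For reference, a minimal Google Benchmark-style harness for DenseMap lookups with 64-bit keys might look like the sketch below (this is not the linked Carbon benchmark; the multiplier and size range are arbitrary choices for illustration):

```cpp
#include "benchmark/benchmark.h"
#include "llvm/ADT/DenseMap.h"
#include <cstdint>

// Sketch: measure hit-lookup throughput in a DenseMap<uint64_t, uint64_t>.
// The multiplier only spreads the keys out; it is not part of the hash
// function under test.
static void BM_DenseMapLookupHit(benchmark::State &State) {
  const uint64_t Size = static_cast<uint64_t>(State.range(0));
  llvm::DenseMap<uint64_t, uint64_t> Map;
  for (uint64_t I = 0; I < Size; ++I)
    Map[I * 2654435761ULL] = I;

  uint64_t I = 0;
  for (auto _ : State) {
    benchmark::DoNotOptimize(Map.find(I * 2654435761ULL));
    I = (I + 1) % Size;
  }
}
BENCHMARK(BM_DenseMapLookupHit)->Range(8, 1 << 20);

BENCHMARK_MAIN();
```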

Carbon also just got a new hashtable implementation that tries to be as close to DenseMap as I could possibly make it for small tables and very sparse tables (low load factor), while operating at a very high load factor (7/8) and with superb performance on large tables. It's based on SwissTable, with comparable or better performance. If there are hashtables that are struggling with DenseMap's design, that is what I would suggest. But it is very, very hard to compete with DenseMap -- its performance is fantastic and almost unbeatable for very small tables and low load factors.

https://github.com/llvm/llvm-project/pull/95734
