[llvm] [ADT] Update hash function of uint64_t for DenseMap (PR #95734)

Wed Jun 19 18:57:49 PDT 2024

ChuanqiXu9 wrote:

> Thanks for these pointers!
> 
> `DenseMap` extracts the low bits of a 32-bit `getHashValue` result, which limits the effectiveness of pure multiplicative hashing: the low bits of a product depend only on the low bits of the key. (It would probably be nice to switch to a 64-bit hash, perhaps via a new member function.) We need one xorshift step, which is done in #95970.
> 
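A minimal sketch of that effect, assuming a hypothetical `multiply_xorshift` mixer. The multiplier 0xbf58476d1ce4e5b9 is borrowed from SplitMix64 and is an assumption, not necessarily what #95970 uses. Two keys that differ only above bit 40 produce identical low 32 bits under pure multiplication, while one xorshift step folds high product bits down so the truncated hashes differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical multiplier constant (from SplitMix64's finalizer).
constexpr uint64_t kMul = 0xbf58476d1ce4e5b9ULL;

// Pure multiplicative hash, truncated to 32 bits like a getHashValue result.
// The low k bits of x * kMul depend only on the low k bits of x.
inline uint32_t multiply_only(uint64_t x) {
  return static_cast<uint32_t>(x * kMul);
}

// Multiply followed by one xorshift step: bits 31..62 of the product are
// XORed into bits 0..31 before truncation.
inline uint32_t multiply_xorshift(uint64_t x) {
  uint64_t h = x * kMul;
  h ^= h >> 31;
  return static_cast<uint32_t>(h);
}
```

With keys `1` and `1 + (1ULL << 40)`, `multiply_only` returns the same value for both, whereas `multiply_xorshift` does not. One multiply plus one shift keeps latency low compared with heavier mixers.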
> A lot of work can be done on both `Hashing.h` and `DenseMap`. For example, we could still do a better job of discouraging reliance on the iteration order of DenseMap. While LLVM_ENABLE_REVERSE_ITERATION helps, I had to fix 3 use cases in llvm/ and clang/ when changing `getHashValue` for std::pair.
> 
> Incorporating `Hashing.h` into `DenseMap` and switching to `hash_value(42)` or `hash_combine(42, 43)` would mix bits better, but would increase code size and cause some slowdown without clear benefits.
> 
> I have read some of the code in carbon-lang/common/hashing.h and absl/hash for integer types and std::pair. For integer types of at most 8 bytes:
> 
> * `llvm/include/llvm/ADT/Hashing.h` uses an mxmxm variant, `hash_16_bytes` (Murmur-inspired), that has higher latency and probably better avalanche behavior (though likely unnecessarily "strong").
> * absl uses a multiply-xorshift `Mix` and uint128 on machines with 64-bit pointers.
> * Carbon uses a multiply-bswap `WeakMix` using `unsigned _BitInt(128)`.
> 
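To make the two lighter-weight styles concrete, here is a rough sketch of each. The names `mul_fold` and `mul_bswap` and the golden-ratio multiplier are hypothetical; neither function is copied from absl or Carbon, and whether Carbon uses the full 128-bit product in this step is not reproduced here. Both rely on the GCC/Clang extensions `unsigned __int128` and `__builtin_bswap64`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical odd multiplier (the 64-bit golden-ratio constant).
constexpr uint64_t kMul = 0x9e3779b97f4a7c15ULL;

// absl-style sketch: form the full 128-bit product and XOR its high and low
// halves, so high input bits can influence the low output bits.
inline uint64_t mul_fold(uint64_t x) {
  unsigned __int128 m = static_cast<unsigned __int128>(x) * kMul;
  return static_cast<uint64_t>(m >> 64) ^ static_cast<uint64_t>(m);
}

// Carbon-style sketch: multiply, then byte-swap so the better-mixed high
// bits of the product land in the low bits a hash table masks off.
inline uint64_t mul_bswap(uint64_t x) {
  return __builtin_bswap64(x * kMul);
}
```

Both are a single multiply plus one cheap operation, which is why they are much lighter than the mxmxm-style `hash_16_bytes`.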
> Waiting for Hashing.h and DenseMap improvements would take too long. To address the immediate needs, **this patch might leverage `densemap::detail::mix` for the DenseMapInfo `unsigned long` and `unsigned long long` specializations.** @ChuanqiXu9
> 
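A sketch of what such a specialization could look like. It mirrors the member shape of `llvm::DenseMapInfo` (empty key, tombstone key, `getHashValue`, `isEqual`) but is written as a standalone struct so it compiles without LLVM. `U64KeyInfo` and the local `mix` are hypothetical stand-ins; the real `densemap::detail::mix` from #95970 may differ.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// Hypothetical stand-in for densemap::detail::mix: one multiply plus one
// xorshift step (SplitMix64 constant, an assumption).
inline uint64_t mix(uint64_t x) {
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 31;
  return x;
}

// Hypothetical traits struct shaped like DenseMapInfo<unsigned long long>.
struct U64KeyInfo {
  static inline uint64_t getEmptyKey() { return ~0ULL; }
  static inline uint64_t getTombstoneKey() { return ~0ULL - 1ULL; }
  // Truncate the mixed 64-bit value to the 32-bit hash DenseMap expects.
  static unsigned getHashValue(const uint64_t &Val) {
    return static_cast<unsigned>(mix(Val));
  }
  static bool isEqual(const uint64_t &LHS, const uint64_t &RHS) {
    return LHS == RHS;
  }
};
} // namespace sketch
```

Because the mix runs before truncation, keys that differ only in their upper 32 bits still yield different 32-bit hashes in this sketch.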
> When we are ready to switch more stuff to Carbon-style hashing, we can probably use the following multiplication fallback for compilers other than GCC and Clang.
> 
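> ```c++
> #include <cstdint>
> #include <utility>
>
> // Portable 64x64 -> 128-bit multiply; returns {low 64 bits, high 64 bits}.
> std::pair<uint64_t, uint64_t> mul64(uint64_t a, uint64_t b) {
>   // Split each operand into 32-bit halves.
>   uint64_t a0 = a & 0xffffffff, a1 = a >> 32;
>   uint64_t b0 = b & 0xffffffff, b1 = b >> 32;
>   uint64_t t = a0 * b0;
>   uint64_t u = t & 0xffffffff;  // bits 0..31 of the product
>   t = a1 * b0 + (t >> 32);      // cannot overflow 64 bits
>   uint64_t v = t >> 32;
>   t = (a0 * b1) + (t & 0xffffffff);
>   return {(t << 32) + u, a1 * b1 + v + (t >> 32)};
> }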
> ```
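
On GCC and Clang, the fallback quoted above can be cross-checked against the compiler's `unsigned __int128`. The sketch below repeats it under the hypothetical name `mul64_portable` and compares both halves of the product; `mul64_int128` and `mul64_agrees` are likewise illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// The portable fallback, returning {low 64 bits, high 64 bits}.
std::pair<uint64_t, uint64_t> mul64_portable(uint64_t a, uint64_t b) {
  uint64_t a0 = a & 0xffffffff, a1 = a >> 32;
  uint64_t b0 = b & 0xffffffff, b1 = b >> 32;
  uint64_t t = a0 * b0;
  uint64_t u = t & 0xffffffff;
  t = a1 * b0 + (t >> 32);
  uint64_t v = t >> 32;
  t = (a0 * b1) + (t & 0xffffffff);
  return {(t << 32) + u, a1 * b1 + v + (t >> 32)};
}

// Reference result using the GCC/Clang 128-bit integer extension.
std::pair<uint64_t, uint64_t> mul64_int128(uint64_t a, uint64_t b) {
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  return {static_cast<uint64_t>(p), static_cast<uint64_t>(p >> 64)};
}

// True if both implementations agree on (a, b).
bool mul64_agrees(uint64_t a, uint64_t b) {
  return mul64_portable(a, b) == mul64_int128(a, b);
}
```

The all-ones case is worth including in any such check because it maximizes the intermediate cross terms: (2^64 - 1)^2 has low half 1 and high half 0xfffffffffffffffe.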

Thanks for the high-quality summary. Looking forward to further improvements!

https://github.com/llvm/llvm-project/pull/95734