[llvm] [ADT] Update hash function of uint64_t for DenseMap (PR #95734)

Fangrui Song via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 19 12:40:26 PDT 2024


MaskRay wrote:

Thanks for these pointers!

`DenseMap` extracts low bits from a 32-bit `getHashValue`.
(It would probably be nice to switch to a 64-bit hash, perhaps with a new member function.)
This limits the effectiveness of pure multiplicative hashing. We need one xorshift step, which is done in #95970.
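
To illustrate the low-bits point, here is a minimal sketch. It is not the code from #95970: the function names `mulOnly` and `mulXorShift`, the multiplier (borrowed from splitmix64), and the shift amount of 32 are all assumptions chosen for illustration. With a pure multiplicative hash truncated to 32 bits, keys that differ only in their upper 32 bits produce the same value. One xorshift folds the high half of the product back into the low bits.

```cpp
#include <cstdint>

// Illustrative odd multiplier (the first splitmix64 constant); any odd
// 64-bit constant shows the same effect. This is an assumption, not the
// constant mandated by DenseMap.
constexpr uint64_t kMul = 0xbf58476d1ce4e5b9ULL;

// Pure multiplicative hash truncated to 32 bits: the low 32 bits of the
// product depend only on the low 32 bits of x.
inline uint32_t mulOnly(uint64_t x) {
  return static_cast<uint32_t>(x * kMul);
}

// Multiply, then one xorshift: the high half of the product is xor-ed
// into the low half before truncation.
inline uint32_t mulXorShift(uint64_t x) {
  x *= kMul;
  x ^= x >> 32;
  return static_cast<uint32_t>(x);
}
```

For instance, `mulOnly(1)` and `mulOnly(1 + (1ULL << 32))` are equal, while the two `mulXorShift` results differ.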

A lot of work can be done to both `Hashing.h` and `DenseMap`.
For example, we could still do a better job at discouraging reliance on the iteration order of DenseMap.
While `LLVM_ENABLE_REVERSE_ITERATION` helps, I still had to fix 3 use cases in llvm/ and clang/ before I could change `getHashValue` for `std::pair`.

Incorporating `Hashing.h` into `DenseMap` and switching to `hash_value(42)` or `hash_combine(42, 43)` would mix bits in a better way, but increase the code size and cause some slowdown without clear benefits.

---

I have read some code of carbon-lang/common/hashing.h and absl/hash for integer types and std::pair.
For integer types <= 8 bytes,

* `llvm/include/llvm/ADT/Hashing.h` uses an mxmxm variant `hash_16_bytes` (Murmur-inspired) that has higher latency and probably better avalanche behavior (though likely unnecessarily "strong").
* absl uses a multiply-xorshift `Mix` and uint128 on 64-bit pointer machines.
* Carbon uses a multiply-bswap `WeakMix` using `unsigned _BitInt(128)`.

---

Waiting for `Hashing.h` and `DenseMap` improvements would take too long.
To address the immediate needs, **this patch might leverage `densemap::detail::mix` for the `DenseMapInfo` `unsigned long` and `unsigned long long` specializations**. @ChuanqiXu9

---

When we are ready to switch more stuff to Carbon style hashing, we can probably use the following multiplication fallback for non-GCC-non-Clang compilers.

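```cpp
#include <cstdint>
#include <utility>

// Portable 64x64 -> 128-bit multiply. Returns {low 64 bits, high 64 bits}.
std::pair<uint64_t, uint64_t> mul64(uint64_t a, uint64_t b) {
  // Split both operands into 32-bit halves.
  uint64_t a0 = a & 0xffffffff, a1 = a >> 32;
  uint64_t b0 = b & 0xffffffff, b1 = b >> 32;
  uint64_t t = a0 * b0;
  uint64_t u = t & 0xffffffff;  // bits 0..31 of the product
  t = a1 * b0 + (t >> 32);
  uint64_t v = t >> 32;  // carry from the first cross term into the high word
  t = (a0 * b1) + (t & 0xffffffff);
  return {(t << 32) + u, a1 * b1 + v + (t >> 32)};
}
```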
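
A sketch of how this fallback could sit next to a native 128-bit multiply. The names `mul64_portable` and `mul64_wide`, and the use of `__SIZEOF_INT128__` as the feature test, are assumptions for illustration and not existing LLVM code. The portable path repeats the function above so that the block compiles on its own.

```cpp
#include <cstdint>
#include <utility>

// Portable path: same algorithm as mul64 above. Returns {low, high}.
inline std::pair<uint64_t, uint64_t> mul64_portable(uint64_t a, uint64_t b) {
  uint64_t a0 = a & 0xffffffff, a1 = a >> 32;
  uint64_t b0 = b & 0xffffffff, b1 = b >> 32;
  uint64_t t = a0 * b0;
  uint64_t u = t & 0xffffffff;
  t = a1 * b0 + (t >> 32);
  uint64_t v = t >> 32;
  t = (a0 * b1) + (t & 0xffffffff);
  return {(t << 32) + u, a1 * b1 + v + (t >> 32)};
}

// Hypothetical dispatcher: use the compiler's 128-bit integer when it is
// available (GCC and Clang define __SIZEOF_INT128__ on 64-bit targets),
// otherwise fall back to the portable path.
inline std::pair<uint64_t, uint64_t> mul64_wide(uint64_t a, uint64_t b) {
#if defined(__SIZEOF_INT128__)
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  return {static_cast<uint64_t>(p), static_cast<uint64_t>(p >> 64)};
#else
  return mul64_portable(a, b);
#endif
}
```

Both paths should agree on every input. A quick sanity check is `mul64_portable(~0ULL, ~0ULL)`, whose low word is `1` and whose high word is `0xfffffffffffffffe`.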


https://github.com/llvm/llvm-project/pull/95734

