[llvm] [LLVM][ADT] Convert llvm::hash_code to unsigned explicitly in DenseMapInfo (PR #77743)

Andrei Golubev via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 12 02:50:32 PST 2024


andrey-golubev wrote:

So from what I understand, there's:
a) a bunch of functionality around `hash_code`. we're mainly interested in `hash_value` that seems to produce actual 64-bit values (e.g. it actually utilizes both low and high 32 bits in integer hashing) - does "proper" hashing to begin with.
b) `DenseMapInfo<>` that seems to deal with just keys into DenseMap. For simpler cases (e.g. `unsigned long long` and friends) it, in a nutshell, performs a 1-to-1 mapping by multiplying the given value by some constant (this might potentially overflow but we don't care for unsigned) so that we get nice hashing that is also cheap to compute? (But then we trim half of the bytes before returning the hash?)
c) Then, for `DenseMapInfo<hash_code>` the current behavior just takes "first" `sizeof(unsigned)` bytes.

Now,

> But I think the right solution would be to do the same thing as is done for `unsigned long long` - maybe even just have the size_t specialization delegate to the unsigned long long implementation (maybe only if sizeo(size_t) > sizeof(unsigned)?)?

this would essentially mean smth like:
```
  // constexpr auto Constant = 37ULL;
  static unsigned getHashValue(hash_code val) { return (unsigned)(size_t(val) * Constant); }
```
my gut feeling is that we don't really need constant multiplication: given that the hash_code was computed using some proper hashing mechanism, multiplying by a constant does produce equivalent value distribution? (it's the original but just "shifted" by that multiplication?). BUT: this is valid if we don't "trim" the bytes afterwards, which we do by taking only `unsigned` part.

So the bulk of the discussion is how `* 37` affects the end result and whether we care.

https://github.com/llvm/llvm-project/pull/77743


More information about the llvm-commits mailing list