[PATCH] D122974: prevent ConstString from calling djbHash() more than once

Pavel Labath via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 29 00:01:03 PDT 2022


labath added a comment.

In D122974#3481269 <https://reviews.llvm.org/D122974#3481269>, @llunak wrote:

> In D122974#3480686 <https://reviews.llvm.org/D122974#3480686>, @dblaikie wrote:
>
>> In D122974#3424654 <https://reviews.llvm.org/D122974#3424654>, @JDevlieghere wrote:
>>
>>> 
>
> ...
>
>>>   struct HashedStringRef {
>>>     const llvm::StringRef str;
>>>     const unsigned full_hash;
>>>     const uint8_t hash; 
>>>     HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
>>>   }
>
> ...
>
>> The external code shouldn't be able to create their own (ctor private/protected, etc) - the only way to get one is to call `StringMap` to produce one.
>
> But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?

I think that David's point was that you would use a (probably static) StringMap method to produce the hash object, and then you use the hash object to select the correct map, and also to insert the key in the map. Something like:

  auto hash_obj = MyStringMap::Hash(key);
  MyStringMap &map = maps[hash_obj.one_char_hash];
  map.insert(has_obj, key, value);

That should only produce one hash computation, right?

> (*) Well, not exactly, the seed is different. Doesn't seem an impossible problem to solve though.

I think that's because it's trying to produce a value that is not correlated to the value used inside the individual objects (so that you don't overload some buckets of the maps while keeping others empty). However, there are other ways to achieve that. Since StringMap is always using a power of two for the number of buckets, I think it would be sufficient to use an odd number of StringMap instances (say 255). Using the modulo operation to select the StringMap should select spread out the complete hashes evenly among to individual maps.

And it would mean that the StringMap does not have to concern itself with the one-char hashes (which definitely seems like a layering violation).

>> (these functions could/should probably still then assert that the hash is correct in case of that underlying mutation - though someone could argue that's overkill and I'd be open to having that discussion).
>
> I'd prefer to have that discussion. If your argument is perfection, then let's do the full monty.

I guess we could put those checks under `#ifdef EXPENSIVE_CHECKS` (?)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122974/new/

https://reviews.llvm.org/D122974



More information about the llvm-commits mailing list