[PATCH] D122974: prevent ConstString from calling djbHash() more than once

Fri Apr 29 07:35:54 PDT 2022

labath added a comment.

In D122974#3482540 <https://reviews.llvm.org/D122974#3482540>, @llunak wrote:

> In D122974#3481852 <https://reviews.llvm.org/D122974#3481852>, @labath wrote:
>
>> In D122974#3481269 <https://reviews.llvm.org/D122974#3481269>, @llunak wrote:
>>
>>> But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?
>>
>> I think that David's point was that you would use a (probably static) StringMap method to produce the hash object, and then you use the hash object to select the correct map, and also to insert the key in the map. Something like:
>
> ...
>
>> That should only produce one hash computation, right?
>
> Yes, it would, but that's not my point. If StringMap insists on computing the hash itself and doesn't allow outside code to provide it, then that prevents the possible LLDB optimization saving the remaing 10-15% startup CPU cost by not computing the hash at all. If StringMap allows that, then this HashedStringRef idea can be easily circumvented by passing the hash directly, so it seems rather pointless. And the relevant LLDB code from this change is so small that creating HashedStringRef just for that seems like an overkill.

Interesting. I don't know if I missed this somewhere, but could explain what kind of a map operation can lldb perform without computing the hash at least once?

>>> (*) Well, not exactly, the seed is different. Doesn't seem an impossible problem to solve though.
>>
>> I think that's because it's trying to produce a value that is not correlated to the value used inside the individual objects (so that you don't overload some buckets of the maps while keeping others empty). However, there are other ways to achieve that. Since StringMap is always using a power of two for the number of buckets, I think it would be sufficient to use an odd number of StringMap instances (say 255). Using the modulo operation to select the StringMap should select spread out the complete hashes evenly among to individual maps.
>
> By different seed I meant that DWARF and StringMap use different seed for DJB hash. And that's only because LLVM code historically used the worse 0 seed and some tests rely on that (7b319df29bf4ebe690ca0c41761e46d8b0081293 <https://reviews.llvm.org/rG7b319df29bf4ebe690ca0c41761e46d8b0081293>). But it seems that LLDB doesn't even read .debug_names entries into memory, so this having anything to do with DWARF debuginfo is actually moot, it'd only matter for LLDB index cache, which could use any hash it wants.
>
> But you're possibly talking about something unrelated here.

Yeah, I thought that the we used a different seed (or hash?) when computing the StringMap index -- I believe that was the case at some point, but I guess it was changed since then...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122974/new/

https://reviews.llvm.org/D122974