[PATCH] D122974: prevent ConstString from calling djbHash() more than once
David Blaikie via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 28 11:28:18 PDT 2022
dblaikie added a comment.
In D122974#3424654 <https://reviews.llvm.org/D122974#3424654>, @JDevlieghere wrote:
> The ConstString/StringPool is pretty hot so I'm very excited about making it faster.
>
> I'm slightly concerned about the two hash values (the "full" hash vs the single byte one). That's not something that was introduced in this patch, but passing it around adds an opportunity to get it wrong.
>
> I'm wondering if we could wrap this in a struct and pass that around:
>
> struct HashedStringRef {
> const llvm::StringRef str;
> const unsigned full_hash;
> const uint8_t hash;
> HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
> }
>
> It looks like we always need both except in the StringMap implementation, but if I'm reading the code correctly we'll have constructed it in ConstString already anyway?
I'm sort of with you here - I don't think this API feature needs to affect lldb much - the scope of these externally-computed hashes is small, but the StringMap API could be more robust/less error prone by having a struct like this. Specifically `StringMap::hash` could/should return an immutable struct that contains the `StringRef` and the hash (not that that's bulletproof - the data beneath the `StringRef` could still be modified externally - but the same is true of any hashing data structure - the keys might be shallow or have mutable state, etc) - and that immutable object must be passed back to the hash-providing insertion functions. (these functions could/should probably still then assert that the hash is correct in case of that underlying mutation - though someone could argue that's overkill and I'd be open to having that discussion).
By immutable I mean that the caller can't go and modify either the `StringRef` or hash to make these out of sync. These handles are probably still copy assignable. The external code shouldn't be able to create their own (ctor private/protected, etc) - the only way to get one is to call `StringMap` to produce one.
================
Comment at: lldb/source/Utility/ConstString.cpp:107-109
llvm::sys::SmartScopedReader<false> rlock(m_string_pools[h].m_mutex);
- auto it = m_string_pools[h].m_string_map.find(string_ref);
+ auto it = m_string_pools[h].m_string_map.find(string_ref, string_hash);
if (it != m_string_pools[h].m_string_map.end())
----------------
total aside, but it might be nice to refactor `m_string_pools[h]` out into a named variable here - accessing it 3 times and in a context where it's super important that it's the same thing in all 3 places, seems like it'd be easier to read with a named variable - might make it clear which things need to happen under the lock and which ones don't, etc.
(oh, and the 4th and 5th use a few lines down - and I think that'd be the only uses of `h`, so `h` could go away and `hash` could be called inline in the array index instead, maybe?)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D122974/new/
https://reviews.llvm.org/D122974
More information about the llvm-commits
mailing list