[PATCH] D133715: [ADT] Add HashMappedTrie
David Blaikie via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 13 14:43:47 PST 2022
dblaikie added a comment.
In D133715#3993270 <https://reviews.llvm.org/D133715#3993270>, @dexonsmith wrote:
> In D133715#3993178 <https://reviews.llvm.org/D133715#3993178>, @dblaikie wrote:
>
>> Is it a set or a map?
>
> The interface is a map, just that the key type is restricted to look like a hash, and the client is expected to ensure the keys are well-distributed.
Hmm, right.
>> Not sure how to capture all the other fairly esoteric requirements (no removal, needs perfect hashing, lock free/thread safe, - I guess that's most of/all the user-visible major requirements (as you say, non-iterable is just a minor artifact of the current implementation, not a major defining feature of the interface)). `PerfectHashMap`? (if we were going verbose, `ThreadSafePerfectHashMap`... )
>
> Both names make sense to me. "This is a map from a perfect hash to the data."
>
> Issues I see with those names:
>
> - I worry it's not obvious from either name *when* you'd want to use this. (The answer is, I think, you want this if you are building a set shared between multiple threads, you expect lots of concurrent lookup/insertion, and you want fast insertion/lookup nevertheless.)
I think given the number of things we're trying to capture in the name, and the nuance here - not sure we can capture that in the name (even if that was the only thing we wanted to capture in the name I can't think of a punchy name - `InsertOnlyFastConcurrentHashMap`? :/)
> - It might sound like the data structure does hashing. (It doesn't. The client is expected to provide the hash.)
Yeah, fair - I guess maybe that tends towards `ConcurrentPerfectHashMap` (open to `Concurrent` or `ThreadSafe`)
> - It steals the turf for more general map built on top of this. More later.
>
> Regardless of the name, maybe the programmer's guide could use an explainer.
Might be this is esoteric enough not to be worth Programmer's Manual space, but certainly some good doc comments - but either/both I'm OK with.
> Note that there's an example data structure called ThreadSafeHashMappedTrieSet buried in the unit tests, which has an interface that hides the hash. Could be useful for some clients.
>
> - Maybe it would make sense to lift up? Name could be PerfectHashSet
> - You might want to build a map that also handles the hashing; name could potentially be PerfectHashMap
>
> Maybe this could be PerfectHashSetImpl?
>
> - Has the "guts" for implementing a client-friendly PerfectHashSet or PerfectHashMap?
> - ... but can be used directly if you want to manage the hashing yourself?
I take it the current use cases you have in mind/lined up in the CAS use this directly? Maybe a `Raw` prefix? `RawPerfectHashSet`, `RawConcurrentHashSet`, etc... some combination/choice of those sort of things?
================
Comment at: llvm/unittests/ADT/HashMappedTrieTest.cpp:241-245
+using HasherT = SHA1;
+using HashType = decltype(HasherT::hash(std::declval<ArrayRef<uint8_t> &>()));
+template <class T>
+class ThreadSafeHashMappedTrieSet
+ : ThreadSafeHashMappedTrie<T, sizeof(HashType)> {
----------------
dexonsmith wrote:
> FYI, here, buried in the unit tests (demoted due to lack of immediate use cases), this is a set data structure built on top of the trie.
>
> If this were lifted up again for actual use, as opposed to just a test, you'd want to template it on `HasherT` and then use HashBuilder (from llvm/include/llvm/Support/HashBuilder.h) to hash the value.
Hmm, yeah, this looks like overkill for the test - given the data structure as-is, we don't need any hash algorithm here - the test, I would expect, would use unique array values/"hash" values hardcoded/directly.
If you've got a use for this data structure, it could come along in a subsequent patch?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D133715/new/
https://reviews.llvm.org/D133715
More information about the llvm-commits
mailing list