[PATCH] D133715: [ADT] Add HashMappedTrie

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 13 13:56:49 PST 2022


dblaikie added a comment.

In D133715#3980166 <https://reviews.llvm.org/D133715#3980166>, @dexonsmith wrote:

> [Resigning as reviewer, since I was the original author, but feel free ping me explicitly if I can be helpful for something and I'm not volunteering anything...]
>
> In D133715#3980058 <https://reviews.llvm.org/D133715#3980058>, @dblaikie wrote:
>
>> Not sure I'll be the most detailed reviewer here, if anyone else wants to chime in.
>>
>> I've a generally understanding of what a trie is, but at my current level of detail looking at this code it's not entirely clear to me what this particular data structure is - maybe some more comments both in the source and in the patch description? simple example usage (in a test and/or in those comments/patch description) migth be helpful
>
> Yeah, probably more docs here would be good. Some notes that Steven might be able to use as a starting point (although feel free to ignore this an do your own thing!):
>
> At a high-level, this is a set/map data structure that supports fast thread-safe insertion/lookup and does not support erase. (IIRC, the interface similar to a `std::map`, where the key is a hash?)
>
> A hash-mapped trie is a tree of arrays where the prefix of the hash is consumed as an "index" into the array at each level of the tree. IIRC, this data structure as implemented:
>
> - is lock-free and thread-safe
> - supports "insert" and "lookup"
> - does NOT detect a hash collision; first insertion wins; you want a strong-enough hash for your data set that there are no collisions, or handle collisions in the `value_type` somehow
> - does NOT support "erase"; IIRC, that'd be hard to do in a thread-safe way, but I didn't think hard about it
> - does NOT support "iteration"; IIRC, it'd be easy to implement iteration that exposed the internal layout
> - array sizes are configurable (root array can be a different size from sub-trie arrays)
> - each slot in an array is either empty, a sub-trie, or a value
>
> Insertion (and lookup) works basically like the following (skipping over details that make the lock-free-thead-safety work):
>
> 1. start with the root trie's array
> 2. convert a prefix of the hash into an index into the current array
> 3. if the slot is a value with the same hash, return the existing value
> 4. if the slot is a value with a different hash, create a new sub-trie that contains the existing value, put the sub-trie in the slot, and continue with step (5)
> 5. if the slot is a sub-trie, descend and go back to step (2) with the unused suffix of the hash on the sub-trie's array
> 6. if the slot is empty, insert and return the new value

Oh, this is all really useful info I didn't understand from the code - I'd assumed it was a user-visible trie (eg: something you could insert entries into, then walk the trie) - if the trie-ness is purely an implementation detail, maybe it shouldn't be tested/dumped? (like we don't test the layout of hash buckets in DenseMap tests) if it does need to be tested, maybe having a test friend in some way.

As for naming, given the functionality. I'd consider putting `Trie` earlier in the name rather than later, same as `DenseMap` - it's not a `trie` data structure, it's a `Set` data structure implemented using a trie. So TrieSet. Though adding all the features might be a bit of a mouthful - `ThreadSafeHashedTrieSet`?... Not sure what the right answer is here (two hard problems in computer science, amirite?).

Is it a set or a map?

Not sure how to capture all the other fairly esoteric requirements (no removal, needs perfect hashing, lock free/thread safe, - I guess that's most of/all the user-visible major requirements (as you say, non-iterable is just a minor artifact of the current implementation, not a major defining feature of the interface)). `PerfectHashMap`? (if we were going verbose, `ThreadSafePerfectHashMap`... )


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133715/new/

https://reviews.llvm.org/D133715



More information about the llvm-commits mailing list