[PATCH] D133715: [ADT] Add HashMappedTrie

Steven Wu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 5 14:01:04 PST 2023


steven_wu added a comment.

> The ability to dump the data structure I'm all for - like DenseMap, etc, but I think that dumping should be the user-visible state (it's a map, the trie part is an implementation detail that ideally shouldn't be exposed in the dump developers use regularly - presumably it's not necessary except when debugging the data structure itself, which should be rare (we don't usually need to look at DenseMap's internal bucketing, for instance)).

Correct. The current dump method is not designed to dump user-visible state. The goal is to dump how hash/key is stored in trie, and it doesn't even dump any user stored value. To dump user visible state from DenseMap, it needs to support iteration first but HashMappedTrie currently doesn't support iteration (and the CAS/ObjectStore it implements cannot have iteration for security reason). The only useful dump method can be implemented without iteration is what it has now. I am ok to remove dump method from this patch since I already rewrite the tests without using dump. The string dump for the trie structure can be useful to debug full CAS in the future. Since CAS cannot have iteration, a separate tool that can dump the entire trie structure can be helpful to look inside CAS. In that case, `dump` is a public API that needs to be used by this CAS inspection tool, not a private method that expose to test only.

> Can the unit test be a friend to the data structure and use the data structure's internal APIs?

It is already the case now. I find a way to write unit-test with no string comparison and extract data from the trie without supporting real iteration. Supporting full iteration is not trivial since it will either involve a very complicated iterator or adding back edge in the trie or both. Currently, it has a very non-trivial way to iterate the subtrie in an order that is not really interesting to users, and the way it finds the hash prefix is quite expensive too. Thus, this is only useful for unit-tests and they are using private APIs that are friends to test only. A real iteration needs to do a tree walk from the root and store information it acquires along the way (like `dump` method).

> I'd really like to avoid "trie" in the name, though I guess it's as much an implementation detail as "Dense" is in DenseMap, so might be deserving to be in the name.
>
> I think this is a map, so the name should probably end in 'map' not in 'trie'. I'd guess TrieHashMap - it's a type of hash map that uses a trie? Maybe make a shortlist of names to consider, etc?

I don't have candidates. `TrieHashMap` sounds good to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133715/new/

https://reviews.llvm.org/D133715



More information about the llvm-commits mailing list