[PATCH] D133715: [ADT] Add HashMappedTrie

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 5 23:10:20 PST 2023


dblaikie added a comment.

In D133715#4029870 <https://reviews.llvm.org/D133715#4029870>, @steven_wu wrote:

>> The ability to dump the data structure I'm all for - like DenseMap, etc, but I think that dumping should be the user-visible state (it's a map, the trie part is an implementation detail that ideally shouldn't be exposed in the dump developers use regularly - presumably it's not necessary except when debugging the data structure itself, which should be rare (we don't usually need to look at DenseMap's internal bucketing, for instance)).
>
> Correct. The current dump method is not designed to dump user-visible state. The goal is to dump how hash/key is stored in trie, and it doesn't even dump any user stored value. To dump user visible state from DenseMap, it needs to support iteration first but HashMappedTrie currently doesn't support iteration (and the CAS/ObjectStore it implements cannot have iteration for security reason).

Not sure I follow why it'd need to support iteration to support dumping of user-centric state. Supporting iteration (using an iterator abstraction) is a fair bit more complicated than walking the data structure directly to dump out its contents, I'd think.

> The only useful dump method can be implemented without iteration is what it has now. I am ok to remove dump method from this patch since I already rewrite the tests without using dump. The string dump for the trie structure can be useful to debug full CAS in the future. Since CAS cannot have iteration, a separate tool that can dump the entire trie structure can be helpful to look inside CAS. In that case, `dump` is a public API that needs to be used by this CAS inspection tool, not a private method that expose to test only.

Yeah, I'm generally OK with a public dump method that gives a user-centric result, like DenseMap's dump... oh, my mistake, DenseMap doesn't have a dump method. I guess we rely on debugger pretty printers for that (& as the person who wrote the gdb one at least, it's not great - because it's hard to walk the user-visible state correctly/skipping internal implementation details - so it does end up printing the buckets, etc).

But, sure, doesn't need to be in this patch - I guess if we don't have it for other data structures, seems fair/reasonable/OK that we don't have it for this one.

>> Can the unit test be a friend to the data structure and use the data structure's internal APIs?
>
> It is already the case now. I find a way to write unit-test with no string comparison and extract data from the trie without supporting real iteration. Supporting full iteration is not trivial since it will either involve a very complicated iterator or adding back edge in the trie or both. Currently, it has a very non-trivial way to iterate the subtrie in an order that is not really interesting to users, and the way it finds the hash prefix is quite expensive too. Thus, this is only useful for unit-tests and they are using private APIs that are friends to test only. A real iteration needs to do a tree walk from the root and store information it acquires along the way (like `dump` method).

Ah, OK, great.

>> I'd really like to avoid "trie" in the name, though I guess it's as much an implementation detail as "Dense" is in DenseMap, so might be deserving to be in the name.
>>
>> I think this is a map, so the name should probably end in 'map' not in 'trie'. I'd guess TrieHashMap - it's a type of hash map that uses a trie? Maybe make a shortlist of names to consider, etc?
>
> I don't have candidates. `TrieHashMap` sounds good to me.

Sounds good to me, then.



================
Comment at: llvm/unittests/ADT/HashMappedTrieTest.cpp:18
+namespace llvm {
+class HashMappedTrieTestHelper {
+public:
----------------
Rather than having to indirect everything through this class helper - I think it's possible to name the class of the test fixture itself, and then you could friend that, so the test fixture would have direct access? Might be simpler that way.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133715/new/

https://reviews.llvm.org/D133715



More information about the llvm-commits mailing list