[PATCH] D133715: [ADT] Add HashMappedTrie

Wed Dec 14 12:42:10 PST 2022

steven_wu added a comment.

In D133715#3995988 <https://reviews.llvm.org/D133715#3995988>, @dexonsmith wrote:

> In D133715#3995750 <https://reviews.llvm.org/D133715#3995750>, @dblaikie wrote:
>
>> In D133715#3993521 <https://reviews.llvm.org/D133715#3993521>, @dexonsmith wrote:
>>
>>> In D133715#3993489 <https://reviews.llvm.org/D133715#3993489>, @dblaikie wrote:
>>>
>>>>> - a test support friend class that can inspect/test the trie layout without relying on stringification;
>>>>
>>>> This bit I'm not so sure about - we don't test the bucketing behavior of `DenseHashMap` so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?
>>>
>>> IMO the layout is more complex than a flat array with quadratic probing. The logic for equal and/or nearly-equal hashes is a bit subtle, and there were hard-to-reason about bugs originally. The layout-stringification was necessary to understand what was going wrong, and the tests that use it help ensure future refactorings don't get it wrong.
>>>
>>> I don't remember the bugs, but two examples of subtleties:
>>>
>>> - On an exact match you don't want to "sink" the existing entry down a level to a new sub-trie (you need to detect "exact match" before sinking). Getting this wrong will affect performance but not otherwise be user-visible.
>>> - The deepest sub-trie might be a different/smaller size than the others because there are only a few bits left-over, and the handling needs to be right. It's simpler to check for correct layout directly than to guess about what user-visible effects there might be for errors.
>>>
>>> I'd be a bit uneasy with the layout tests being dropped altogether.
>>>
>>> Maybe an alternative to testing the layout directly would be to add a verification member function that iterated through the data structure and ensured everything was self-consistent (else crash? else return `false`?). Then the tests could call the member function after a series of insertions that might trigger a "bad" layout.
>>
>> Fair enough - if it's sufficient to have a verify operation (maybe "assertValid" - so, yeah, crash when not valid) I'd go with that, but given the argument you've made, if you think verifying the specific structure is significantly more valuable than that, I'd be OK with some private/test-friended introspection API.
>
> IMO it's worth it; @steven_wu, if you disagree, I could live with `assertValid()`. (BTW, I remembered another justification. The test case inputs/setups are a bit subtle; checking the layout ensures you've covered the corner case you think you've covered.)

I prefer to keep those tests as it still provides value and also insights into underlying implementation. If someone got around to do a new and better implementation, we can drop them by then.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133715/new/

https://reviews.llvm.org/D133715