[PATCH] D76977: [lld-macho] Implement basic export trie

Wed Apr 22 05:22:30 PDT 2020

int3 marked an inline comment as done.
int3 added inline comments.

================
Comment at: lld/MachO/ExportTrie.cpp:75
+
+void TrieNode::addSymbol(const Symbol &entry,
+                         std::vector<TrieNode *> &allNodes) {
----------------
int3 wrote:
> int3 wrote:
> > ruiu wrote:
> > > smeenai wrote:
> > > > ruiu wrote:
> > > > > ruiu wrote:
> > > > > > int3 wrote:
> > > > > > > ruiu wrote:
> > > > > > > > `addSymbol` seem to keep the internal trie consistent all the time, but I don't think we need to split the work between `addSymbol` and `build`. We can make `addSymbol` just to store an argument to an array, and move the code for trie construction to `build`.
> > > > > > > Not sure I understand what you're going for. Do you mean to have TrieBuilder::addSymbol collect the symbols to export in a vector, and then have `build()` call `TrieNode::addSymbol()` for all the symbols right before serializing the trie?
> > > > > > Yes, that's what I meant.
> > > > > What I wanted to say is that it looks like the trie implemented in this file is designed with the generic Map use cases in mind. You can add a new string one at a time to an existing trie, and you can also look it up as if it were a map (it is actually a map). In order to that, we create a trie in memory. But I think we can avoid that. We can just accumulate added strings to a vector, and in build() we sort the strings and directly emit a serialized trie from the sorted strings.
> > > > +1 – we should only need the trie for output purposes here.
> > > It is probably better to rewrite the code instead of borrowing it from the existing code. I believe it shouldn't take too much time, and it might even be a bit of fun to write from scratch.
> > > We can just accumulate added strings to a vector, and in build() we sort the strings and directly emit a serialized trie from the sorted strings.
> > 
> > Ah, I see what you mean. That would definitely be more memory/allocation-efficient. I guess we should do a radix sort instead of a simple std::sort so we don't compare prefixes needlessly.
> oh I see llvm-mc has a radix sort implementation in `StringTableBuilder.cpp`. Might steal that
Btw, the current implementation still builds a trie explicitly in memory -- I think we need that in order to iteratively calculate the uleb offsets. But now the trie is built via sorting rather than repeated insertion. Compared to repeatedly inserting symbols, this approach is better because we don't have to repeatedly update / split nodes.

Let me know if this was roughly what you had in mind, or if I'm missing some way to avoid having the trie in memory altogether...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76977/new/

https://reviews.llvm.org/D76977