[PATCH] D76977: [lld-macho] Implement basic export trie

Tue Apr 28 17:18:10 PDT 2020

smeenai added a comment.

I haven't though about this too hard, but if `-exported_symbols_order` wasn't a thing, we wouldn't have to do the loop to convergence, right? We could just do a post-order traversal, so that the the offset of each child of a node is fixed before you reach that node. (The wrinkle is that the root node needs to be at the start of the trie, since all the offsets are positive, but you could just traverse everything except the root node and calculate their offsets, then calculate the size of the root node, then add that size to all the other offsets.)

I haven't thought very hard about how `-exported_symbols_order` will work in this scheme either, but your idea for that makes sense at a high level. It doesn't seem ld64 handles this super optimally either. For example, if you have symbols `_a` and `_ab`, and you want `_ab` to be ordered in the trie before `_a` (or just only have `_ab` in the exported symbols order file), ld64 produces:

  <root>
    |
    _a
    |
    |-----
    |    |
    *b  *''

(terminal nodes prefixed with a `*`)

whereas without any order file, you would have gotten

  <root>
    |
   *_a
    |
    *b

Both tries take the same number of node traversals to get to `_ab`, but the first one needlessly pessimizes `_a` in the process.

I'd be curious about exactly how dyld processes the export trie, to understand how much of a difference `-exported_symbols_order` might make.

Assuming that my post-order traversal scheme actually works, and assuming that we don't plan to support `-exported_symbols_order` any time soon, would it make sense to just use that scheme for now and figure out `-exported_symbols_order` when we get there? I suspect we'll run into a bunch more considerations when we actually get around to implementing that, so I don't think it's super important to be coding with that in mind right now. (As in, the post-order definitely won't work when we have to respect `-exported_symbols_order`, but we can think about that later.)

================
Comment at: lld/MachO/ExportTrie.cpp:90
+        StringRef bNodeStr = edge.child->cumulativeString;
+        bNodeStr = bNodeStr.drop_back(edgeStr.size() - n).copy(bAlloc);
+        auto *bNode = make<TrieNode>(bNodeStr);
----------------
int3 wrote:
> The original implementation used a unique BumpPtrAllocator instance for the export trie construction, which gets deallocated once trie serialization is complete. Here I am using the global `bAlloc` that doesn't deallocate till the end of the program. Not if it's worth using a local allocator instance to reduce memory consumption
I think LLD generally just allocates everything using the global allocator and deallocates it in one fell swoop at the end (or just lets the OS clean it up), so this should be fine. I'd be curious if anything in LLD COFF or ELF uses local allocators though.

================
Comment at: lld/MachO/ExportTrie.cpp:192-195
+  std::vector<TrieNode *> orderedNodes;
+  orderedNodes.reserve(allNodes.size());
+  for (const Symbol *sym : exported)
+    rootNode->orderNodes(*sym, orderedNodes);
----------------
int3 wrote:
> int3 wrote:
> > smeenai wrote:
> > > int3 wrote:
> > > > This is basically a topological sort. It's unclear to me though why this approach (of finding every element starting from the root) was chosen vs doing a single preorder traversal of the trie...
> > > Not having looked at this in detail yet, would they give comparable results?
> > I think the only difference would be the order (and therefore the offsets) of the serialized nodes. Might make things more or less compact, it's unclear to me...
> Aha, I see that `ld64` has an `-exported_symbols_order` flag, so the export trie "can be optimized to make lookups of popular symbols faster". The implementation also secondarily sorts the trie by address. I guess that means dyld loads this trie lazily. I'll add a TODO to support this...
> 
> I think we can still implement it more efficiently than looking up every element from the root: While building the trie, we should keep track of the lowest user-ordered leaf that it contains. Then we can sort the nodes based on that.
Yeah, `-exported_symbols_order` flag is a wrinkle :/ What you said makes sense for when we have to support that though.

================
Comment at: lld/MachO/ExportTrie.cpp:15
+//
+//             +--+--+
+//             | _ba |
----------------
>From what I understand of the trie structure, the strings for children are actually stored in the parent. This is interesting by itself, and it also implies that the root node is always the empty string. I think it's worth making a note of that and reflecting that in the diagram as well.

================
Comment at: lld/MachO/ExportTrie.cpp:55
+
+struct ExportInfo {
+  uint64_t address;
----------------
Nit: can this be part of the anonymous namespace above?

================
Comment at: lld/MachO/ExportTrie.cpp:56
+struct ExportInfo {
+  uint64_t address;
+};
----------------
Should add a TODO about adding proper support for the reexport and stub-and-resolver flags.

================
Comment at: lld/MachO/ExportTrie.cpp:71
+
+bool TrieNode::updateOffset(size_t &nextOffset) {
+  size_t nodeSize = 1; // Length when no Symbol info
----------------
I'd add a comment about what the return value signifies.

================
Comment at: lld/MachO/ExportTrie.cpp:72
+bool TrieNode::updateOffset(size_t &nextOffset) {
+  size_t nodeSize = 1; // Length when no Symbol info
+  if (info) {
----------------
This is the 1 byte for the TerminalSize member, right? Might be clearer to explain that in the comment.

================
Comment at: lld/MachO/ExportTrie.cpp:76
+    nodeSize = getULEB128Size(flags) + getULEB128Size(info->address);
+    // Overall node size so far is uleb128 of Symbol info + actual Symbol info.
+    nodeSize += getULEB128Size(nodeSize);
----------------
uleb128 of length of symbol info?

================
Comment at: lld/MachO/ExportTrie.cpp:99
+    uint32_t nodeSize = getULEB128Size(flags) + getULEB128Size(info->address);
+    assert(nodeSize < 256);
+    *buf++ = nodeSize;
----------------
Why this assert? The nodeSize (or TerminalSize as macho2yaml terms it) is a ULEB128, right?

================
Comment at: lld/MachO/ExportTrie.cpp:107
+  }
+  // Add number of children. TODO: Handle case where we have more than 256.
+  assert(edges.size() < 256);
----------------
You'd just have to split up the node in that case, right?

================
Comment at: lld/MachO/ExportTrie.cpp:137
+//
+// node:    The most recently created node along this path in the trie (i.e.
+//          the furthest from the root.)
----------------
Super nit: have the comments in the order of the parameters (or vice versa).

================
Comment at: lld/MachO/ExportTrie.cpp:152
+  // the pivot.
+  const Symbol *pivotSymbol = vec[0];
+  int pivot = charAt(pivotSymbol, pos);
----------------
Would it be preferable to use either the median or a random pivot? (I'm assuming the standard quicksort considerations for pivots apply, namely that always using the first element would trigger the quadratic worst case for an already sorted input.)

================
Comment at: lld/MachO/ExportTrie.cpp:184
+    // This is the tail-call-optimized version of the following:
+    // multikeySort(vec.slice(i, j - i), lastPos, pos + 1, node);
+    vec = vec.slice(i, j - i);
----------------
Did you mean sortAndBuild?

Is there an easy way to either express this as a loop or trust that the compiler will do the tail call optimization? I don't mind the goto very much, but I'm wondering how it compares to the alternatives.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76977/new/

https://reviews.llvm.org/D76977