[PATCH] D76977: [lld-macho] Implement basic export trie

Tue Apr 28 21:03:17 PDT 2020

int3 marked 9 inline comments as done.
int3 added inline comments.

================
Comment at: lld/MachO/ExportTrie.cpp:15
+//
+//             +--+--+
+//             | _ba |
----------------
int3 wrote:
> smeenai wrote:
> > From what I understand of the trie structure, the strings for children are actually stored in the parent. This is interesting by itself, and it also implies that the root node is always the empty string. I think it's worth making a note of that and reflecting that in the diagram as well.
> good catch, will fix
I feel like the root node corresponding to the entry string is pretty standard for tries though. Also I think it makes more sense to say the strings are stored on the edges rather than the parent. I'll update the comment + diagram to reflect that; hopefully it's clearer from the new diagram that the root node corresponds to the empty string

================
Comment at: lld/MachO/ExportTrie.cpp:99
+    uint32_t nodeSize = getULEB128Size(flags) + getULEB128Size(info->address);
+    assert(nodeSize < 256);
+    *buf++ = nodeSize;
----------------
int3 wrote:
> int3 wrote:
> > smeenai wrote:
> > > Why this assert? The nodeSize (or TerminalSize as macho2yaml terms it) is a ULEB128, right?
> > nope, it's a byte. Only the offsets are ulebs
> nvm, I'm wrong, oops. It's a uleb indeed
I realized that the usage of `nodeSize` is inconsistent and confusing. In `updateOffset()` it includes the terminal size and the edges, but in `writeTo()` it doesn't. I'll rename `nodeSize` to `terminalSize` in `writeTo()`.

================
Comment at: lld/MachO/ExportTrie.cpp:184
+    // This is the tail-call-optimized version of the following:
+    // multikeySort(vec.slice(i, j - i), lastPos, pos + 1, node);
+    vec = vec.slice(i, j - i);
----------------
int3 wrote:
> smeenai wrote:
> > Did you mean sortAndBuild?
> > 
> > Is there an easy way to either express this as a loop or trust that the compiler will do the tail call optimization? I don't mind the goto very much, but I'm wondering how it compares to the alternatives.
> Oops, this was copypasta from llvm-mc's original implementation (they called it multikeySort there)
> 
> The TCO was also from that implementation, but I think it's a "standard" quicksort optimization too, at least for the normal two-way quicksort. Wikipedia says that it's "suggested by Sedgwick and widely used in practice": https://en.wikipedia.org/wiki/Quicksort#Optimizations
It seems like there's no way to force the compiler to do TCO. There are only attributes to tell clang *not* to do TCO.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76977/new/

https://reviews.llvm.org/D76977