[PATCH] D83103: [lld-macho] Support binding dysyms to any section

Thu Jul 2 17:18:11 PDT 2020

smeenai added a comment.

The approach looks good to me. Once we're able to e.g. self-host and link clang, we can get more accurate numbers and confirm that the overhead of storing all these bindings won't be too high.

================
Comment at: lld/MachO/SyntheticSections.cpp:105
+struct Binding {
+  OutputSegment *segment{nullptr};
+  uint64_t offset{0};
----------------
Super nit: can we use `=` initialization instead, for consistency with the rest of the code? (At least, I think `=` initialization is standard in LLD.)

================
Comment at: lld/MachO/SyntheticSections.cpp:112
+
+// Encode a sequence of opcodes that dyld to write the address of dysym +
+// addend at osec->addr + outSecOff.
----------------
"that dyld to write" -> "that **tell** dyld to write"?

================
Comment at: lld/MachO/SyntheticSections.cpp:131-132
+  } else if (lastBinding.offset != offset) {
+    if (lastBinding.offset > offset)
+      llvm_unreachable("relocations should be sorted");
+    os << static_cast<uint8_t>(BIND_OPCODE_ADD_ADDR_ULEB);
----------------
Nit: an assert seems more conventional

================
Comment at: lld/MachO/SyntheticSections.cpp:177
   raw_svector_ostream os{contents};
-  os << static_cast<uint8_t>(MachO::BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |
-                             in.got->parent->index);
-  encodeULEB128(in.got->getSegmentOffset(), os);
-  uint32_t entries_to_skip = 0;
-  for (const Symbol *sym : in.got->getEntries()) {
-    if (const auto *dysym = dyn_cast<DylibSymbol>(sym)) {
-      if (entries_to_skip != 0) {
-        os << static_cast<uint8_t>(MachO::BIND_OPCODE_ADD_ADDR_ULEB);
-        encodeULEB128(WordSize * entries_to_skip, os);
-        entries_to_skip = 0;
-      }
-
-      // TODO: Implement compact encoding -- we only need to encode the
-      // differences between consecutive symbol entries.
-      if (dysym->file->ordinal <= MachO::BIND_IMMEDIATE_MASK) {
-        os << static_cast<uint8_t>(MachO::BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |
-                                   dysym->file->ordinal);
-      } else {
-        error("TODO: Support larger dylib symbol ordinals");
-        continue;
-      }
-      os << static_cast<uint8_t>(
-                MachO::BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM)
-         << dysym->getName() << '\0'
-         << static_cast<uint8_t>(MachO::BIND_OPCODE_SET_TYPE_IMM |
-                                 MachO::BIND_TYPE_POINTER)
-         << static_cast<uint8_t>(MachO::BIND_OPCODE_DO_BIND);
-    } else {
-      // We have a defined symbol with a pre-populated address; skip over it.
-      ++entries_to_skip;
+  Binding lastBinding;
+  bool did_encode{false};
----------------
We'll initialize e.g. addend to 0 here, and for the GOT entries, the addend is always 0, so the `lastBinding.addend != addend` condition will never be taken and we won't emit an initial `BIND_OPCODE_SET_ADDEND_SLEB`. Will dyld assume 0 in that case, or should we initialize it explicitly?

Seems like we were never setting the addend at all before, so maybe assuming dyld will default to 0 is okay? Should probably double-check though.

================
Comment at: lld/MachO/SyntheticSections.cpp:178
+  Binding lastBinding;
+  bool did_encode{false};
+  SetVector<const Symbol *> entries = in.got->getEntries();
----------------
Same nit about `=` initialization.

================
Comment at: lld/MachO/SyntheticSections.cpp:179
+  bool did_encode{false};
+  SetVector<const Symbol *> entries = in.got->getEntries();
+  for (size_t i = 0, n = entries.size(); i < n; ++i) {
----------------
Are we creating a copy of all the entries here?

================
Comment at: lld/MachO/SyntheticSections.cpp:180
+  SetVector<const Symbol *> entries = in.got->getEntries();
+  for (size_t i = 0, n = entries.size(); i < n; ++i) {
+    if (const auto *dysym = dyn_cast<DylibSymbol>(entries[i])) {
----------------
Is the switch from a range-based for loop only so we have `i`? If so, I think I'd prefer to just keep a separate `offset` variable and increment it in a range-based for loop.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83103/new/

https://reviews.llvm.org/D83103