[PATCH] D98837: [lld][MachO] Refactor handling of subsections

Mon Mar 22 12:21:23 PDT 2021

int3 added a comment.

> It looks like the new version is faster than the current master (release builds) (measured on chromium)

Neat!

================
Comment at: lld/MachO/InputFiles.cpp:417-419
   // resize(), not reserve(), because we are going to create N_ALT_ENTRY symbols
   // out-of-sequence.
   symbols.resize(nList.size());
----------------
I guess this can be `reserve` now, and the loop on line 452 turned into a for-range loop

================
Comment at: lld/MachO/InputFiles.cpp:421
+
+  for (structs::nlist_64 sym : nList) {
+    if (((sym.n_type & N_TYPE) == N_SECT) && !(sym.n_desc & N_ALT_ENTRY) &&
----------------

================
Comment at: lld/MachO/InputFiles.cpp:421-427
+  for (structs::nlist_64 sym : nList) {
+    if (((sym.n_type & N_TYPE) == N_SECT) && !(sym.n_desc & N_ALT_ENTRY) &&
+        !subsections[sym.n_sect - 1].empty())
+      subsections[sym.n_sect - 1].push_back(
+          {sym.n_value - sectionHeaders[sym.n_sect - 1].addr,
+           subsections[sym.n_sect - 1].front().second});
+  }
----------------
int3 wrote:
> 
would be nice to have a comment about what this loop is doing, I have to stare at it for a while

================
Comment at: lld/MachO/InputFiles.cpp:422
+  for (structs::nlist_64 sym : nList) {
+    if (((sym.n_type & N_TYPE) == N_SECT) && !(sym.n_desc & N_ALT_ENTRY) &&
+        !subsections[sym.n_sect - 1].empty())
----------------

================
Comment at: lld/MachO/InputFiles.cpp:423-426
+        !subsections[sym.n_sect - 1].empty())
+      subsections[sym.n_sect - 1].push_back(
+          {sym.n_value - sectionHeaders[sym.n_sect - 1].addr,
+           subsections[sym.n_sect - 1].front().second});
----------------
factor out `subsections[sym.n_sect - 1]`?

================
Comment at: lld/MachO/InputFiles.cpp:434-435
+    subsectionMap.erase(
+        std::unique(std::begin(subsectionMap), std::end(subsectionMap)),
+        std::end(subsectionMap));
+    if (!subsectionsViaSymbols)
----------------

================
Comment at: lld/MachO/InputFiles.cpp:440
+      uint32_t offset = subsectionMap[i].first;
+      InputSection *&sec = subsectionMap[i].second;
+      uint32_t end = i + 1 < subsectionMap.size() ? subsectionMap[i + 1].first
----------------
convention is to name all generic InputSections `isec`

================
Comment at: lld/MachO/InputFiles.cpp:466-467
-
-    // We saw a symbol definition at a new offset. Split the section into two
-    // subsections. The new symbol uses the second subsection.
-    auto *secondIsec = make<InputSection>(*firstIsec);
----------------
comments like these seem worth keeping

================
Comment at: lld/MachO/InputFiles.cpp:468

-    uint64_t offset = sym.n_value - sec.addr;
+    uint32_t offset = sym.n_value - sec.addr;

----------------
I think this should still be uint64_t?

================
Comment at: lld/MachO/InputFiles.cpp:478

-    // nList entries aren't necessarily arranged in address order. Therefore,
-    // we can't create alt-entry symbols at this point because a later symbol
-    // may split its section, which may affect which subsection the alt-entry
-    // symbol is assigned to. So we need to handle them in a second pass below.
-    if (sym.n_desc & N_ALT_ENTRY) {
-      altEntrySymIdxs.push_back(i);
-      continue;
-    }
-
-    // Find the subsection corresponding to the greatest section offset that is
-    // <= that of the current symbol. The subsection that we find either needs
-    // to be used directly or split in two.
-    uint32_t firstSize = offset;
-    InputSection *firstIsec = findContainingSubsection(subsecMap, &firstSize);
-
-    if (firstSize == 0) {
-      // Alias of an existing symbol, or the first symbol in the section. These
-      // are handled by reusing the existing section.
-      symbols[i] = createDefined(sym, name, firstIsec, 0);
-      continue;
-    }
-
-    // We saw a symbol definition at a new offset. Split the section into two
-    // subsections. The new symbol uses the second subsection.
-    auto *secondIsec = make<InputSection>(*firstIsec);
-    secondIsec->data = firstIsec->data.slice(firstSize);
-    firstIsec->data = firstIsec->data.slice(0, firstSize);
-    // TODO: ld64 appears to preserve the original alignment as well as each
-    // subsection's offset from the last aligned address. We should consider
-    // emulating that behavior.
-    secondIsec->align = MinAlign(firstIsec->align, offset);
-
-    subsecMap[offset] = secondIsec;
-    // By construction, the symbol will be at offset zero in the new section.
-    symbols[i] = createDefined(sym, name, secondIsec, 0);
+    InputSection *subsec = findContainingSubsection(subsecMap, &offset);
+    symbols[i] = createDefined(sym, name, subsec, offset);
----------------
just an idea: If the loop on line 421 could store the symbols associated with each subsection (instead of just the address), then we wouldn't have to do this binary search here

(but given that the current implementation is already faster than the old one, this is fine for now)

================
Comment at: lld/MachO/InputFiles.cpp:482-485
+  if (!subsectionsViaSymbols)
+    for (SubsectionMap &subsectionMap : subsections)
+      if (!subsectionMap.empty())
+        subsectionMap = {subsectionMap.front()};
----------------
instead of truncating the vector here, can we just skip the loop on line 421 if `subsectionsViaSymbols` is false? I think the loop on line 429 could be skipped too actually

================
Comment at: lld/MachO/InputFiles.h:52
 // each subsection from the start of the original pre-split InputSection.
-using SubsectionMap = std::map<uint32_t, InputSection *>;
+using SubsectionMap = std::vector<std::pair<uint32_t, InputSection *>>;

----------------
nit: can we use a struct instead of a pair? I think `first` and `second` aren't very readable

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98837/new/

https://reviews.llvm.org/D98837