[lld] [LLD] Implement --enable-non-contiguous-regions (PR #90007)

Tue Apr 30 14:09:14 PDT 2024

================
@@ -1364,6 +1432,107 @@ const Defined *LinkerScript::assignAddresses() {
   return getChangedSymbolAssignment(oldValues);
 }
 
+static bool isRegionOverflowed(MemoryRegion *mr) {
+  if (!mr)
+    return false;
+  return mr->curPos - mr->getOrigin() > mr->getLength();
+}
+
+// Spill input sections in reverse order of address assignment to (potentially)
+// bring memory regions out of overflow. The size savings of a spill can only be
+// estimated, since general linker script arithmetic may occur afterwards.
+// Under-estimates may cause unnecessary spills, but over-estimates can always
+// be corrected on the next pass.
+bool LinkerScript::spillSections() {
+  if (!config->enableNonContiguousRegions)
+    return false;
+
+  bool spilled = false;
+  for (SectionCommand *cmd : reverse(sectionCommands)) {
+    auto *od = dyn_cast<OutputDesc>(cmd);
+    if (!od)
+      continue;
+    OutputSection *osec = &od->osec;
+    if (!osec->size || !osec->memRegion)
+      continue;
+
+    DenseSet<InputSection *> spills;
+    for (SectionCommand *cmd : reverse(osec->commands)) {
+      if (!isRegionOverflowed(osec->memRegion) &&
+          !isRegionOverflowed(osec->lmaRegion))
+        break;
+
+      auto *is = dyn_cast<InputSectionDescription>(cmd);
+      if (!is)
+        continue;
+      for (InputSection *isec : reverse(is->sections)) {
+        // Potential spill locations cannot be spilled.
+        if (isa<SpillInputSection>(isec))
+          continue;
+
+        // Find the next spill location.
+        auto it = spillLists.find(isec);
+        if (it == spillLists.end())
+          continue;
+
+        spilled = true;
+        SpillList &list = it->second;
+
+        SpillInputSection *spill = list.head;
+        if (!spill->next)
+          spillLists.erase(isec);
+        else
+          list.head = spill->next;
+
+        spills.insert(isec);
+
+        // Replace the next spill location with the spilled section and adjust
+        // its properties to match the new location.
----------------
mysterymath wrote:

This was one of the semantic caveats accepted to keep the model of how this works simple. Throughout the whole link, the spill input sections behave as if there were additional copies of the referenced input section in those output sections. This includes the calculation of output section flags, program header generation, etc. The one caveat is that they're not included in the list of all input sections, so they're not in scope for ICF, GC, etc.
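A minimal sketch of the "as if copies" model described above. It uses hypothetical types (`InputSec`, `Entry`, `OutputSec`, `allInputSections`) that are not lld's. A spill location contributes its referenced section's flags to the output section that contains it, but only real placements appear in the global input-section list that passes like GC and ICF would walk.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model; none of these types are lld's.
struct InputSec {
  std::string name;
  uint64_t flags;  // ELF SHF_* bits, e.g. SHF_ALLOC (0x2) | SHF_EXECINSTR (0x4)
  uint64_t size;
};

// One slot in an output section: either the real placement of `original`
// or a spill location that stands in for a potential copy of it.
struct Entry {
  const InputSec *original;
  bool isSpill;
};

struct OutputSec {
  std::string name;
  std::vector<Entry> entries;

  // Flags are the union over all entries, spill locations included, so a
  // potential spill affects the output section "as if" it were a real copy.
  uint64_t flags() const {
    uint64_t f = 0;
    for (const Entry &e : entries)
      f |= e.original->flags;
    return f;
  }
};

// The global input-section list (what GC, ICF, etc. would walk) contains only
// real placements; spill locations never appear in it.
std::vector<const InputSec *>
allInputSections(const std::vector<OutputSec> &osecs) {
  std::vector<const InputSec *> out;
  for (const OutputSec &os : osecs)
    for (const Entry &e : os.entries)
      if (!e.isSpill)
        out.push_back(e.original);
  return out;
}
```

In this model, a spill location for an executable section inside a writable output section adds SHF_EXECINSTR to that output section's flags, even though the spill location itself adds nothing to `allInputSections()`.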
 
The copies don't drop out until the final address assignment of the fixed-point loop. Afterwards, the input sections are seen in their final positions and the spills have been removed, albeit leaving some of their effects behind.
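A heavily simplified sketch of that fixed-point behavior, loosely following the reverse-order spilling in `spillSections()` above. All names (`Section`, `Layout`, `regionUsage`, `spillPass`, `finalize`) are hypothetical, and the model makes simplifying assumptions that lld doesn't: each output section maps to exactly one memory region, region usage is just a sum of sizes, and remaining spill locations occupy no space. Each pass estimates the savings and moves sections out of overflowing regions to their next location. The loop stops once a pass spills nothing, and only then are the leftover spill locations dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model; not lld's data structures.
struct Section {
  std::string name;
  uint64_t size;
  // Candidate output sections in script order: front() is the current home,
  // the rest are the remaining spill locations (the "copies").
  std::deque<size_t> homes;
};

struct Layout {
  std::vector<uint64_t> regionCapacity;  // indexed by region
  std::vector<size_t> osecRegion;        // output section -> region
  std::vector<Section> sections;         // in address-assignment order
};

// Stand-in for address assignment: only each section's current home counts.
std::vector<uint64_t> regionUsage(const Layout &l) {
  std::vector<uint64_t> used(l.regionCapacity.size(), 0);
  for (const Section &s : l.sections)
    used[l.osecRegion[s.homes.front()]] += s.size;
  return used;
}

// One pass: walk sections in reverse assignment order and spill sections out
// of overflowing regions. Savings are estimates; the next pass re-checks.
bool spillPass(Layout &l) {
  std::vector<uint64_t> used = regionUsage(l);
  bool spilled = false;
  for (auto it = l.sections.rbegin(); it != l.sections.rend(); ++it) {
    Section &s = *it;
    size_t region = l.osecRegion[s.homes.front()];
    if (used[region] <= l.regionCapacity[region] || s.homes.size() < 2)
      continue;                // not overflowing, or no spill location left
    used[region] -= s.size;    // estimated savings in the old region
    s.homes.pop_front();       // the next spill location becomes the home
    spilled = true;
  }
  return spilled;
}

// Iterate to a fixed point, then drop the leftover spill locations so each
// section lives in exactly one output section. Returns whether it all fits.
bool finalize(Layout &l) {
  while (spillPass(l)) {
  }
  for (Section &s : l.sections) {
    assert(!s.homes.empty());
    s.homes.resize(1);
  }
  std::vector<uint64_t> used = regionUsage(l);
  for (size_t r = 0; r < used.size(); ++r)
    if (used[r] > l.regionCapacity[r])
      return false;
  return true;
}
```

With a 16-byte first region and three 8-byte sections that may all spill to a larger second region, the first pass moves only the last section, and the second pass spills nothing, which ends the loop. Every pass that spills consumes at least one spill location, so the loop always terminates.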

This isn't the simplest possible model; the simplest semantics would be for the whole link to behave "as if" each section had always been matched to its final position. However, due to circular dependencies in the link, that would require either running large swaths of the linker in a loop until it reaches a fixed point, or being able to undo and redo chains of decisions made between output section matching and address assignment. The former seems like a non-starter, but we may be able to do the latter piecemeal if the semantics above turn out to be insufficient.

https://github.com/llvm/llvm-project/pull/90007