[lld] 0557b1b - [ELF] Resolve defined symbols before undefined symbols

Fangrui Song via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 11 09:41:52 PST 2021


Author: Fangrui Song
Date: 2021-02-11T09:41:46-08:00
New Revision: 0557b1bdec6e5e7228292f390a49829efcacd4da

URL: https://github.com/llvm/llvm-project/commit/0557b1bdec6e5e7228292f390a49829efcacd4da
DIFF: https://github.com/llvm/llvm-project/commit/0557b1bdec6e5e7228292f390a49829efcacd4da.diff

LOG: [ELF] Resolve defined symbols before undefined symbols

When parsing an object file, LLD interleaves undefined symbol resolution (which
may recursively fetch other lazy objects) with defined symbol resolution.

This may lead to surprising results, e.g. if an object file defines currently
undefined symbols and references another lazy symbol, we may interleave defined
symbols with the lazy fetch, potentially leading to the defined symbols
resolving to different files.

As an example, if both `a.a(a.o)` and `a.a(b.o)` define `foo` (not in COMDAT
group, or in different COMDAT groups) and `__profd_foo` (in COMDAT group
`__profd_foo`).  LLD may resolve `foo` to `a.a(a.o)` and `__profd_foo` to
`b.a(b.o)`, i.e. different files.

```
parse ArchiveFile a.a
  entry fetches a.a(a.o)
  parse ObjectFile a.o
    define entry
    define foo
    reference b
    b fetches a.a(b.o)
    parse ObjectFile b.o
      define prevailing __profd_foo
    define (ignored) non-prevailing __profd_foo
```

Assuming a set of interconnected symbols are defined all or none in several lazy
objects. Arguably making them resolve to the same file is preferable than making
them resolve to different files (some are lazy objects).

The main argument favoring the new behavior is the stability. The relative order
between a defined symbol and an undefined symbol does not change the symbol
resolution behavior.  Only the relative order between two undefined symbols can
affect fetching behaviors.

---

The real world case is reduced from a Fuchsia PGO usage: `a.a(a.o)` has a
constructor within COMDAT group C5 while `a.a(b.o)` has a constructor within
COMDAT group C2. Because they use different group signatures, they are not
de-duplicated. It is not entirely whether Clang behavior is entirely conforming.

LLD selects the PGO counter section (`__profd_*`) from `a.a(b.o)` and the
constructor section from `a.a(a.o)`. The `__profd_*` is a SHF_LINK_ORDER section
linking to its own non-prevailing constructor section, so LLD errors
`sh_link points to discarded section`. This patch fixes the error.

Differential Revision: https://reviews.llvm.org/D95985

Added: 
    lld/test/ELF/interconnected-lazy.s

Modified: 
    lld/ELF/InputFiles.cpp

Removed: 
    


################################################################################
diff  --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index caca4bc37f4e..aac7c5c42d4a 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -1138,6 +1138,7 @@ template <class ELFT> void ObjFile<ELFT>::initializeSymbols() {
   }
 
   // Symbol resolution of non-local symbols.
+  SmallVector<unsigned, 32> unds;
   for (size_t i = firstGlobal, end = eSyms.size(); i != end; ++i) {
     const Elf_Sym &eSym = eSyms[i];
     uint8_t binding = eSym.getBinding();
@@ -1154,8 +1155,7 @@ template <class ELFT> void ObjFile<ELFT>::initializeSymbols() {
 
     // Handle global undefined symbols.
     if (eSym.st_shndx == SHN_UNDEF) {
-      this->symbols[i]->resolve(Undefined{this, name, binding, stOther, type});
-      this->symbols[i]->referenced = true;
+      unds.push_back(i);
       continue;
     }
 
@@ -1202,6 +1202,20 @@ template <class ELFT> void ObjFile<ELFT>::initializeSymbols() {
 
     fatal(toString(this) + ": unexpected binding: " + Twine((int)binding));
   }
+
+  // Undefined symbols (excluding those defined relative to non-prevailing
+  // sections) can trigger recursive fetch. Process defined symbols first so
+  // that the relative order between a defined symbol and an undefined symbol
+  // does not change the symbol resolution behavior. In addition, a set of
+  // interconnected symbols will all be resolved to the same file, instead of
+  // being resolved to 
diff erent files.
+  for (unsigned i : unds) {
+    const Elf_Sym &eSym = eSyms[i];
+    StringRefZ name = this->stringTable.data() + eSym.st_name;
+    this->symbols[i]->resolve(Undefined{this, name, eSym.getBinding(),
+                                        eSym.st_other, eSym.getType()});
+    this->symbols[i]->referenced = true;
+  }
 }
 
 ArchiveFile::ArchiveFile(std::unique_ptr<Archive> &&file)

diff  --git a/lld/test/ELF/interconnected-lazy.s b/lld/test/ELF/interconnected-lazy.s
new file mode 100644
index 000000000000..1a8d35a6bec6
--- /dev/null
+++ b/lld/test/ELF/interconnected-lazy.s
@@ -0,0 +1,42 @@
+# REQUIRES: x86
+
+# RUN: split-file %s %t
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %t/main.s -o %t/main.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %t/a.s -o %t/a.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64 %t/b.s -o %t/b.o
+
+## foo and __foo are interconnected and defined in two lazy object files.
+## Test we resolve both to the same file.
+# RUN: ld.lld -y a -y foo -y __foo %t/main.o --start-lib %t/a.o %t/b.o --end-lib -o /dev/null | FileCheck %s
+
+# CHECK:      a.o: lazy definition of __foo
+# CHECK-NEXT: a.o: lazy definition of a
+# CHECK-NEXT: a.o: lazy definition of foo
+# CHECK-NEXT: b.o: definition of __foo
+# CHECK-NEXT: b.o: definition of foo
+# CHECK-NEXT: b.o: reference to a
+# CHECK-NEXT: a.o: definition of a
+
+#--- main.s
+.globl _start
+_start:
+  call b
+
+#--- a.s
+.globl a
+.weak foo
+a:
+foo:
+
+.weak __foo
+__foo:
+
+#--- b.s
+.globl b
+.weak foo
+b:
+  call a
+foo:
+
+.weak __foo
+__foo:


        


More information about the llvm-commits mailing list