[PATCH] D78342: [lld] Add archive file support to Mach-O backend

Shoaib Meenai via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 6 00:30:11 PDT 2020


smeenai added a subscriber: pcc.
smeenai added a comment.

It's worth noting in the commit message that we're restoring some of this functionality from @pcc and @ruiu's initial prototype.



================
Comment at: lld/MachO/InputFiles.cpp:248-255
+std::vector<InputSection *> const ArchiveFile::getInputSections() {
+  std::vector<InputSection *> sections;
+  for (InputFile *input : resolvedInputs) {
+    auto inputSections = input->getInputSections();
+    sections.insert(sections.end(), inputSections.begin(), inputSections.end());
+  }
+  return sections;
----------------
Ktwu wrote:
> int3 wrote:
> > Kind of unfortunate that we have to return by value here, but I see that lld-ELF's `getInputSections` does something similar.
> > 
> > Also, did you mean to type this method as `std::vector<InputSection *> ArchiveFile::getInputSections() const {`? I *think* the current placement of the `const` is saying that the return value is `const` instead of saying the method is const...
> re: const, yup, thanks!
The ELF equivalent is `InputFiles::getSections`, and it returns an `ArrayRef`.

Returning by value wouldn't be an issue here – we're returning a local object, so return value optimization ([NRVO](https://en.cppreference.com/w/cpp/language/copy_elision)) kicks in. It's not ideal for `ObjFile::getInputSections` though, as I commented above.


================
Comment at: lld/MachO/InputFiles.h:59
   static bool classof(const InputFile *f) { return f->kind() == ObjKind; }
+  std::vector<InputSection *> getInputSections() const { return sections; }
+
----------------
Nit: add `override`

I'm *really* wary of this returning by value ... that seem like a bunch of unnecessary copying. The ELF equivalent to this function is `InputFile::getSections` (which isn't overridden); it returns an `ArrayRef`, and it asserts that it's only called on object or binary files (and we don't have to worry about binary files). ELF makes that work by having a global `objectFiles` vector (instead of just an `inputFiles` vector), so when it's doing the loop over all input sections, it only loops over object files.

Whether we wanna have a specific `objectFiles` global vector as well or just stick with `inputFiles` is a broader design question, but we should at least figure out a good way to make this not return by value.


================
Comment at: lld/MachO/InputFiles.h:70
   static bool classof(const InputFile *f) { return f->kind() == DylibKind; }
+  std::vector<InputSection *> getInputSections() const { return sections; }
 
----------------
`sections` should always be empty for a DylibFile, so this is confusing. Perhaps just construct and return an empty vector here? (Although that wouldn't work with the ArrayRef comment above, but in that case a comment would be good.)


================
Comment at: lld/MachO/SymbolTable.cpp:59
     replaceSymbol<Undefined>(s, name);
   return s;
 }
----------------
We need to add a check here where if the existing symbol `isa<LazySymbol>`, we fetch that archive member. A consequence is that we need to store a pointer to the `ArchiveFile` in the `LazySymbol`.

We should add a test for this case, where the archive file appears on the link line before the object file, and its members should still be fetched.


================
Comment at: lld/MachO/SymbolTable.cpp:77
+  std::tie(s, wasInserted) = insert(name);
+
+  if (wasInserted)
----------------
int3 wrote:
> Ktwu wrote:
> > MaskRay wrote:
> > > How does an archive symbol interact with a DylibSymbol?
> > By "interact" do you mean if there's a conflict -- a dylib and an archive both expose a symbol with the same name? I assume there shouldn't be issues if symbol names are unique...
> > 
> > @int3 you know more about dylibs; do you see potential issues?
> We definitely need to account for symbol name collisions. I just commented on {D79114}, but basically I've just been feeding various inputs to ld64 and emulating its output, though we should probably figure out the details of its symbol table implementation sometime...
ld64 uses the same archive file semantics as LLD ELF and COFF, where an archive member can be used to resolve undefined symbol references regardless of its position on the command line.

A defined symbol in an object file always wins out over a dylib symbol. An archive member that got included in the link is treated as a regular object file for this purpose. If the dylib symbol is seen first, an archive member defining the same symbol won't be pulled out (unless some other symbol from that member is referenced).

I believe that's the behavior we're implementing here. A LazySymbol won't replace a DylibSymbol, but if the member gets fetched, all its symbols will be added as Defined, which will override DylibSymbols.

Here's an example to clarify what I mean:

```
$ cat f1.c
int f() { return 1; }

$ cat f2.c
int f() { return 2; }

$ cat g.c
int f(void);
int g() { return f(); }

$ cat f2g.c
int f() { return 2; }
int g() { return f(); }

$ cat main.c
int g(void);
int main() { return g(); }

$ clang -c f1.c f2.c g.c f2g.c main.c
$ ld -dylib -o libf1.dylib f1.o
$ ar crs libf2_g.a f2.o g.o
$ ar crs libf2g.a f2g.o

$ ld libf1.dylib libf2_g.a main.o -lSystem
$ ./a.out; echo $?
1 # because libf1.dylib was first in the link line

$ ld libf2_g.a libf1.dylib main.o -lSystem
$ ./a.out; echo $?
2 # because libf2_g.a was first on the link line

$ ld libf1.dylib libf2g.a main.o -lSystem
$ ./a.out; eho $?
2 # libf2g.a(f2g.o) was included in the link cos of the reference to g, so the f from there was used
```


================
Comment at: lld/MachO/SymbolTable.h:34
+  Symbol *addLazy(StringRef name, ArchiveFile *file,
+                  const llvm::object::Archive::Symbol &sym);
+
----------------
Nit: explicitly include `Archive.h`


================
Comment at: lld/test/MachO/Inputs/archive4.s:1
+.global _lonely
+_alone:
----------------
Directive doesn't match symbol name below


================
Comment at: lld/test/MachO/archive-no-index.s:2
+# REQUIRES: x86
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %S/Inputs/archive2.s -o %t2
----------------
Super nit: use `%t.o`, etc. for object file names, to make it clear they're object files. Extension-less names are usually executables, so it's easier to understand tests if we have the explicit extension.


================
Comment at: lld/test/MachO/archive.s:15
+# CHECK-NEXT: T _boo
+# CHECK-NEXT: T _main
+
----------------
We'll wanna add a `CHECK-NOT` for `_lonely`/`_alone` (depending on what you decide to call that symbol), to make sure that archive member wasn't pulled in.

Unfortunately, that depends on the order symbols are printed out by nm, so it's not completely reliable. As in, if we did have `_alone` in the symbol table (because we incorrectly pulled in that archive member), and nm printed it out before `_bar`, having a `CHECK-NOT: T _alone` after this line wouldn't do us any good. Having a second `CHECK-NOT` before the `CHECK` for `_bar` should be enough to make this work, I think.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78342/new/

https://reviews.llvm.org/D78342





More information about the llvm-commits mailing list