[PATCH] D79668: [lld-macho] Support -order_file

Shoaib Meenai via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 15 14:10:05 PDT 2020


smeenai requested changes to this revision.
smeenai added a comment.
This revision now requires changes to proceed.

Looks great; it's a really clean implementation :) Just had a few comments and questions; requesting changes to put it back on your queue.



================
Comment at: lld/MachO/Config.h:31
   std::vector<llvm::StringRef> searchPaths;
+  llvm::StringMap<SymbolPriorityEntry> priorities;
+};
----------------
>From what I understand of https://llvm.org/docs/ProgrammersManual.html#dss-stringmap, StringMap copies key strings over into its own storage. That's great when you want it to take ownership of the key strings, but all our strings here should already be part of the global allocator pool. I think `DenseMap<StringRef, SymbolPriorityEntry>` would work better for us here.

In the process, I went down a bit of a rabbit hole on LLVM's hash tables and CachedHashStringRef, and now I'm confused about LLD's usage of the latter :D I'll probably post to llvm-dev.


================
Comment at: lld/MachO/Config.h:38
+// that the default-constructed zero priority -- for symbols/sections without a
+// user-defined order -- naturally end up putting them at the end of the output.
+struct SymbolPriorityEntry {
----------------
Grammar nit: "naturally end up" -> "naturally ends up"


================
Comment at: lld/MachO/Config.h:40
+struct SymbolPriorityEntry {
+  size_t anyObjectFile = 0;
+  llvm::StringMap<size_t> objectFiles;
----------------
It'd be helpful to have a comment explaining what these represent.


================
Comment at: lld/MachO/Driver.cpp:132
+static bool isArchString(StringRef s) {
+  static StringSet<> archNames{"ppc",    "ppc64", "i386",
+                               "x86_64", "arm",   "arm64"};
----------------
I'm assuming the names are the same ld64 uses? ("arm" in particular.)

Also, considering these are string constants and there's therefore no ownership concerns, a `DenseSet<StringRef>` might be better, for the same reasons as discussed above with the DenseMap? (Tbh, for this number of entries, just a linear search might be faster still, but that's definitely getting into unnecessary premature optimization territory :)).


================
Comment at: lld/MachO/Driver.cpp:168
+
+      if (isArchString(tok) && tok != "x86_64")
+        continue;
----------------
Add a TODO or other comment to serve as a reminder to change this when we add support for other architectures?


================
Comment at: lld/MachO/Driver.cpp:171
+
+      if (tok.endswith(".o"))
+        objectFile = tok;
----------------
It's valid (albeit unlikely) for a symbol name to end with `.o`. It's also valid (albeit unlikely) for a symbol to have the same name as an architecture.

Can we keep track of fields explicitly instead? For each line, build up its colon delimited fields. If you have more than three fields, it's an error. If you have three fields, the first one should be an architecture (which we should verify), the second one should be an object file (which we should verify), and the third one should be the symbol name. If you have only one field, it's the symbol name. And so on.


================
Comment at: lld/MachO/InputFiles.cpp:326
   auto file = make<ObjFile>(mb);
+  symbols.insert(symbols.end(), file->symbols.begin(), file->symbols.end());
   sections.insert(sections.end(), file->sections.begin(), file->sections.end());
----------------
Ah, I didn't realize we'd need to copy over the symbols too. Given that, making a global objFiles vector (or other similar solution to not have to do this copying) would be a good follow-up.


================
Comment at: lld/MachO/Writer.cpp:323
+
+  for (InputFile *file : inputFiles)
+    if (isa<ObjFile>(file) || isa<ArchiveFile>(file))
----------------
ELF iterates over the global symbols from the symbol table and only looks at the input files for local symbols. I'm not entirely sure why it's done that way ... perhaps it's so that for weak symbols, the ordering reflects the final symbol resolution?

As in, let's say foo.o defines `importantFunction` weakly and `bar.o` defines it strongly. With the input file search, we'd order both `foo.o` and `bar.o` for `importantFunction`. With the symbol table search, we'd only order `bar.o`.

We don't have weak symbols right now, so we can just add a TODO and punt on it for now. It'd be interesting to look at how ld64 handles such weak symbol ordering scenarios though.


================
Comment at: lld/test/MachO/order-file.s:4
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t/test.o
+# RUN: echo ".globl _foo, _bar; .text:; _foo: _bar: ret" | \
+# RUN:   llvm-mc -filetype=obj -triple=x86_64-apple-darwin -o %t/foo.o
----------------
You have a `:` after `.text`, so that's gonna create a label called `.text` instead of being the `.text` directive.


================
Comment at: lld/test/MachO/order-file.s:17
+# RUN: echo "_main # another comment" >> %t/ord1
+# RUN: lld -flavor darwinnew -o %t/test-1 %t/test.o %t/foo.o -order_file %t/ord1
+# RUN: llvm-objdump -d %t/test-1 | FileCheck %s --check-prefix=FOO-FIRST
----------------
For all these tests, can you pass the object files in both orders, to confirm that the order file does the right thing regardless of the order of object files on the command line?


================
Comment at: lld/test/MachO/order-file.s:31
+
+# RUN: lld -flavor darwinnew -o %t/test-archive %t/test.o %t/foo.a -order_file %t/ord3
+# RUN: llvm-objdump -d %t/test-archive | FileCheck %s --check-prefix=FOO-SECOND
----------------
Can we also test the FOO-FIRST case for archives?


================
Comment at: lld/test/MachO/order-file.s:33
+# RUN: llvm-objdump -d %t/test-archive | FileCheck %s --check-prefix=FOO-SECOND
+
+## The following tests check that if an address is matched by multiple order
----------------
Can you add a test case for the order file being something like

```
foo.o:_foo
_main
```

to make sure the object file matching works in the positive case as well? (We're only checking the negative case above, with bar.o.) It's indirectly tested by the ordering tests below, but it'd be nice to have an explicit test too.


================
Comment at: lld/test/MachO/order-file.s:61
+
+# RUN: echo "_bar" > %t/ord8
+# RUN: echo "_main" >> %t/ord8
----------------
Probably worth adding a comment about what's going on here ... it took me a bit to understand it.

With multiple symbols pointing to the same location, I think it's technically arbitrary which one `llvm-objdump` chooses to display, but it seems like it's just picking the first name, so that works for us.

Is the intention here that `_foo` and `_bar` are different symbols in the same section (and they'd be split into their own sections once we have subsections via symbols), or that they're aliases to the same symbol (and therefore ordering one should order the other)?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79668/new/

https://reviews.llvm.org/D79668





More information about the llvm-commits mailing list