[PATCH] D89257: [lld-macho] Emit STABS symbols for debugging, and drop debug sections

Greg Clayton via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 9 23:07:26 PST 2020


clayborg requested changes to this revision.
clayborg added inline comments.
This revision now requires changes to proceed.


================
Comment at: lld/MachO/SyntheticSections.cpp:610
+  stab.strx = stringTableSection.addString(saver.save(path.str()));
+  stab.desc = 1;
+  stabs.emplace_back(std::move(stab));
----------------
stab.value should be set to the modification date of the .o file here. If this is a path to a .o file, then the mod time of the .o file, and if this is a "foo.a(bar.o)", it needs to be the modification time of the .o file as mentioned in the BSD archive file data structures. This is important because sometimes .a file have multiple .o files with the same name and the only way to tell them apart is the mod time.


================
Comment at: lld/MachO/SyntheticSections.cpp:617
+    StabsEntry stab(MachO::N_FUN);
+    stab.sect = 1;
+    stab.strx = stringTableSection.addString(defined->getName());
----------------
"stab.sect" needs to be set to be the 1 based mach-o section index that contains the "stab.value" address within the mach-o file. Any symbol that you emit that has an address in "stab.sect" has its "stab.sect" set to the 1 based section index. So you take all of the sections from all LC_SEGMENT or LC_SEGMENT_64 load commands and make a list of them and the first one will have index 1. From my python tool I have something that dumps the sections and shows the index:

```
$ mach_o.py a.out --sections
FILE OFF    INDEX ADDRESS            SIZE               OFFSET     ALIGN      RELOFF     NRELOC     FLAGS      RESERVED1  RESERVED2  RESERVED3  NAME
=========== ===== ------------------ ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------------------
0x000000b0: [  1] 0x0000000100000ec0 0x00000000000000c1 0x00000ec0 0x00000004 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__text
0x00000100: [  2] 0x0000000100000f82 0x0000000000000006 0x00000f82 0x00000001 0x00000000 0x00000000 0x80000408 0x00000000 0x00000006 0x00000000 __TEXT.__stubs
0x00000150: [  3] 0x0000000100000f88 0x000000000000001a 0x00000f88 0x00000002 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__stub_helper
0x000001a0: [  4] 0x0000000100000fa2 0x000000000000000d 0x00000fa2 0x00000000 0x00000000 0x00000000 0x00000002 0x00000000 0x00000000 0x00000000 __TEXT.__cstring
0x000001f0: [  5] 0x0000000100000fb0 0x0000000000000048 0x00000fb0 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__unwind_info
0x00000288: [  6] 0x0000000100001000 0x0000000000000008 0x00001000 0x00000003 0x00000000 0x00000000 0x00000006 0x00000001 0x00000000 0x00000000 __DATA_CONST.__got
0x00000320: [  7] 0x0000000100002000 0x0000000000000008 0x00002000 0x00000003 0x00000000 0x00000000 0x00000007 0x00000002 0x00000000 0x00000000 __DATA.__la_symbol_ptr
0x00000370: [  8] 0x0000000100002008 0x0000000000000008 0x00002008 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__data
```
LLDB uses this section index to grab the right section when it parses the symbol.


================
Comment at: lld/MachO/SyntheticSections.cpp:618
+    stab.sect = 1;
+    stab.strx = stringTableSection.addString(defined->getName());
+    stab.value = defined->getVA();
----------------
smeenai wrote:
> I guess we can optimize this later to not duplicate strings in the string table? (Since the symbol table entry will also add the symbol name.)
Yes, this should be uniqued if it isn't happening already. Large C++ compiles will really trigger this and cause huge symbol table bloat.


================
Comment at: lld/MachO/SyntheticSections.cpp:635
   for (Symbol *sym : symtab->getSymbols()) {
+    // TODO support other symbol types
     if (isa<Defined>(sym) || sym->isInGot() || sym->isInStubs()) {
----------------
Yes, we need to emit N_GSYM symbols for globals, and N_STSYM symbols for static variables. Those need to be linked in the debug info.

N_STSYM always has the address set to a valid address. The stab entry should have the "value" set to the address, and "sect" set to the right section index just like the N_FUN entry.

N_GSYM may or may not have the address filed in. If the address of the global can be calculated, the "value" should be set to the address, and "sect" set to the right section index just like the N_FUN entry. If the address can't be calculated, N_GSYM can be zero and we will need to find the global by matching up with the non STAB symbol whose name must match exactly (and that symbol will have "value" and "sect" set correctly).




================
Comment at: lld/MachO/SyntheticSections.cpp:649-650
+      InputSection *isec = defined->isec;
+      // XXX is it right to assume that all symbols in __text are function
+      // symbols?
+      if (isec->name == "__text") {
----------------
smeenai wrote:
> I think this is valid. How does ld64 make this determination though?
Not necessarily. Read only globals can be in __text IIRC on ARM and possibly ARM64 or even other architectures.


================
Comment at: lld/MachO/SyntheticSections.cpp:653
+        ObjFile *file = dyn_cast<ObjFile>(isec->file);
+        assert(file);
+        if (!file->compileUnit)
----------------
Is this truly unreachable code? 


================
Comment at: lld/MachO/SyntheticSections.cpp:657-659
+        if (lastFile == nullptr || lastFile != file) {
+          if (lastFile != nullptr)
+            emitEndSourceStab();
----------------
Is this list sorted by file? We don't want to be emitting multiple N_SO + N_OSO entries for the same source file. dsymutil will be a lot less efficient if we have a few entries for "/tmp/foo.cpp" and "/tmp/foo.o" and then a few function, then some other source + object file, and then "/tmp/foo.cpp" and "/tmp/foo.o" again.


================
Comment at: lld/MachO/SyntheticSections.cpp:697-698
+
+  // Emit the stabs entries after the "real" symbols. We cannot emit them
+  // before as that would render Symbol::symtabIndex inaccurate.
+  for (const StabsEntry &entry : stabs) {
----------------
So in mach-o files, and preparing for lld emitting a LC_DYSYMTAB load command, all local symbols must be in a contiguous block of symbols. In fact all locals, external, and undefined symbols must be in a contiguous block. That is required because if "ilocalsym" represents the first local symbol table index of all of the local symbols and "nlocalsym" is the out. Same for external with "iextdefsym" and "nextdefsym", and for undefined symbols with "iundefsym" and "nundefsym". 

```
$ otool -lv a.out
...
Load command 7
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 17
     iextdefsym 17
     nextdefsym 4
      iundefsym 21
      nundefsym 2
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89257/new/

https://reviews.llvm.org/D89257



More information about the llvm-commits mailing list