[PATCH] D89257: [lld-macho] Emit STABS symbols for debugging, and drop debug sections

Jez Ng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 10 16:49:20 PST 2020


int3 marked 3 inline comments as done.
int3 added a comment.

@smeenai Dwarf parsing appears to be on-demand, though I don't know if it's parsing just the subset of things we end up using. Specifically, `compile_units()` calls `parseNormalUnits`, which parses all units of type `DW_SECT_INFO` and `DW_SECT_EXT_TYPES`. I'll add a TODO to investigate more later.

@clayborg thanks for all the insightful comments! I'll work the modtime and n_sect follow-up diffs tomorrow so that they can be reviewed as a coherent whole.



================
Comment at: lld/MachO/SyntheticSections.cpp:610
+  stab.strx = stringTableSection.addString(saver.save(path.str()));
+  stab.desc = 1;
+  stabs.emplace_back(std::move(stab));
----------------
clayborg wrote:
> stab.value should be set to the modification date of the .o file here. If this is a path to a .o file, then the mod time of the .o file, and if this is a "foo.a(bar.o)", it needs to be the modification time of the .o file as mentioned in the BSD archive file data structures. This is important because sometimes .a file have multiple .o files with the same name and the only way to tell them apart is the mod time.
yeah I was going to do it in a follow-up. Will add a TODO here. Didn't think about the archive case though, thanks for the tip!


================
Comment at: lld/MachO/SyntheticSections.cpp:617
+    StabsEntry stab(MachO::N_FUN);
+    stab.sect = 1;
+    stab.strx = stringTableSection.addString(defined->getName());
----------------
clayborg wrote:
> "stab.sect" needs to be set to be the 1 based mach-o section index that contains the "stab.value" address within the mach-o file. Any symbol that you emit that has an address in "stab.sect" has its "stab.sect" set to the 1 based section index. So you take all of the sections from all LC_SEGMENT or LC_SEGMENT_64 load commands and make a list of them and the first one will have index 1. From my python tool I have something that dumps the sections and shows the index:
> 
> ```
> $ mach_o.py a.out --sections
> FILE OFF    INDEX ADDRESS            SIZE               OFFSET     ALIGN      RELOFF     NRELOC     FLAGS      RESERVED1  RESERVED2  RESERVED3  NAME
> =========== ===== ------------------ ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------------------
> 0x000000b0: [  1] 0x0000000100000ec0 0x00000000000000c1 0x00000ec0 0x00000004 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__text
> 0x00000100: [  2] 0x0000000100000f82 0x0000000000000006 0x00000f82 0x00000001 0x00000000 0x00000000 0x80000408 0x00000000 0x00000006 0x00000000 __TEXT.__stubs
> 0x00000150: [  3] 0x0000000100000f88 0x000000000000001a 0x00000f88 0x00000002 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__stub_helper
> 0x000001a0: [  4] 0x0000000100000fa2 0x000000000000000d 0x00000fa2 0x00000000 0x00000000 0x00000000 0x00000002 0x00000000 0x00000000 0x00000000 __TEXT.__cstring
> 0x000001f0: [  5] 0x0000000100000fb0 0x0000000000000048 0x00000fb0 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__unwind_info
> 0x00000288: [  6] 0x0000000100001000 0x0000000000000008 0x00001000 0x00000003 0x00000000 0x00000000 0x00000006 0x00000001 0x00000000 0x00000000 __DATA_CONST.__got
> 0x00000320: [  7] 0x0000000100002000 0x0000000000000008 0x00002000 0x00000003 0x00000000 0x00000000 0x00000007 0x00000002 0x00000000 0x00000000 __DATA.__la_symbol_ptr
> 0x00000370: [  8] 0x0000000100002008 0x0000000000000008 0x00002008 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__data
> ```
> LLDB uses this section index to grab the right section when it parses the symbol.
Ah, I was following ld64's code here which also hard-codes a `sect` value of 1 for `N_FUN` stabs, but I didn't understand the significance of the value. Thanks for the explanation! I think it works out in ld64's case because ld64 puts all functions in `__text` in the final binary. Functions that are in non-text sections in the input object files all get coalesced into `__text`. LLD has yet to implement that behavior, but we plan on doing so.

I think I'll add support for this in the same diff where I'll add support for `N_GSYM`. That way we can write a test for values of `n_sect` other than 1.


================
Comment at: lld/MachO/SyntheticSections.cpp:618
+    stab.sect = 1;
+    stab.strx = stringTableSection.addString(defined->getName());
+    stab.value = defined->getVA();
----------------
clayborg wrote:
> smeenai wrote:
> > I guess we can optimize this later to not duplicate strings in the string table? (Since the symbol table entry will also add the symbol name.)
> Yes, this should be uniqued if it isn't happening already. Large C++ compiles will really trigger this and cause huge symbol table bloat.
ld64 doesn't appear to do it, but it does seem like a worthwhile optimization. I'll add a `TODO` inside `StringTableSection::addString`.


================
Comment at: lld/MachO/SyntheticSections.cpp:649-650
+      InputSection *isec = defined->isec;
+      // XXX is it right to assume that all symbols in __text are function
+      // symbols?
+      if (isec->name == "__text") {
----------------
clayborg wrote:
> smeenai wrote:
> > I think this is valid. How does ld64 make this determination though?
> Not necessarily. Read only globals can be in __text IIRC on ARM and possibly ARM64 or even other architectures.
ld64 checks if their atoms have a type of `typeCode`. That type is determined by their `sectionType()` method, which uses a combination of input section flags and section names to determine it. I *think* any `__text` input section will be of `typeCode`, but I'll have to investigate further. I think we can punt on this.


================
Comment at: lld/MachO/SyntheticSections.cpp:653
+        ObjFile *file = dyn_cast<ObjFile>(isec->file);
+        assert(file);
+        if (!file->compileUnit)
----------------
clayborg wrote:
> Is this truly unreachable code? 
The only file types are object files, bitcode files, archives, and dylibs. We extract ObjFiles from ArchiveFiles & create them from BitcodeFiles before processing their InputSections, so we shouldn't see either of those here. And we shouldn't be emitting debug info for functions that our output binary references from dylibs (I think... correct me if I'm wrong).


================
Comment at: lld/MachO/SyntheticSections.cpp:657-659
+        if (lastFile == nullptr || lastFile != file) {
+          if (lastFile != nullptr)
+            emitEndSourceStab();
----------------
clayborg wrote:
> Is this list sorted by file? We don't want to be emitting multiple N_SO + N_OSO entries for the same source file. dsymutil will be a lot less efficient if we have a few entries for "/tmp/foo.cpp" and "/tmp/foo.o" and then a few function, then some other source + object file, and then "/tmp/foo.cpp" and "/tmp/foo.o" again.
yeah sorting did occur to me but I thought I'd implement it later. It's probably quite straightforward though, so I can do it in this diff


================
Comment at: lld/MachO/SyntheticSections.cpp:697-698
+
+  // Emit the stabs entries after the "real" symbols. We cannot emit them
+  // before as that would render Symbol::symtabIndex inaccurate.
+  for (const StabsEntry &entry : stabs) {
----------------
clayborg wrote:
> So in mach-o files, and preparing for lld emitting a LC_DYSYMTAB load command, all local symbols must be in a contiguous block of symbols. In fact all locals, external, and undefined symbols must be in a contiguous block. That is required because if "ilocalsym" represents the first local symbol table index of all of the local symbols and "nlocalsym" is the out. Same for external with "iextdefsym" and "nextdefsym", and for undefined symbols with "iundefsym" and "nundefsym". 
> 
> ```
> $ otool -lv a.out
> ...
> Load command 7
>             cmd LC_DYSYMTAB
>         cmdsize 80
>       ilocalsym 0
>       nlocalsym 17
>      iextdefsym 17
>      nextdefsym 4
>       iundefsym 21
>       nundefsym 2
> ```
yup, that is done in {D89285}


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89257/new/

https://reviews.llvm.org/D89257



More information about the llvm-commits mailing list