[PATCH] D74169: [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info.

Greg Clayton via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 21 16:57:23 PST 2020


clayborg added a comment.

I created dsymutil when I was at Apple, and it has since been taken over by llvm-dsymutil. So here are a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it has a link map that gives the final address for every address in each .o file. dsymutil goes through all DIEs within a CU and marks the ones that should be kept. It starts with the compile unit DIE; then each DIE referred to by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable) that had a mapping in the linked executable is marked as needing to be kept, along with all of its child DIEs and all DIEs that those child DIEs refer to. You will run into many DIEs over and over, but as soon as you reach one that is already marked as kept you can avoid recursing into it again, so this pass never really gets out of hand.

One thing to be careful of: enumerations. A lot of code defines bit masks with an enum but does not use the enum type for the variables that hold them (they just use a signed or unsigned integer as the variable type), so you can end up with code in which no variable refers to the enum type. It is sometimes still a good idea to keep these types around. We had an issue where we were dead stripping these types and the enums weren't available in the debug info.
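A minimal sketch of such a marking pass, in Python for brevity. The `DIE` class, its fields, and `mark_kept` are hypothetical and are not dsymutil's actual data structures; the sketch also leaves out keeping the parent chain of a kept DIE, which a real implementation would need.

```python
from dataclasses import dataclass, field


@dataclass
class DIE:
    """Hypothetical, simplified DIE; not a real LLVM or dsymutil type."""
    offset: int
    tag: str
    children: list = field(default_factory=list)  # offsets of child DIEs
    refs: list = field(default_factory=list)      # offsets referenced by attributes
    keep: bool = False


def mark_kept(dies, cu_offset, root_offsets):
    """Mark the CU DIE plus everything reachable from the link-map roots."""
    dies[cu_offset].keep = True
    worklist = list(root_offsets)
    while worklist:
        die = dies[worklist.pop()]
        if die.keep:
            continue  # already marked: do not walk it again
        die.keep = True
        worklist.extend(die.children)
        worklist.extend(die.refs)
    return {offset for offset, die in dies.items() if die.keep}


dies = {
    0: DIE(0, "DW_TAG_compile_unit", children=[10, 20, 30, 40, 50]),
    10: DIE(10, "DW_TAG_subprogram", children=[11]),  # has a final address
    11: DIE(11, "DW_TAG_variable", refs=[30]),
    20: DIE(20, "DW_TAG_subprogram", refs=[40]),      # dead stripped by the linker
    30: DIE(30, "DW_TAG_base_type"),
    40: DIE(40, "DW_TAG_structure_type"),             # only used by the dead function
    50: DIE(50, "DW_TAG_enumeration_type"),           # bit-mask enum, never referenced
}

kept = mark_kept(dies, cu_offset=0, root_offsets=[10])
print(sorted(kept))  # [0, 10, 11, 30]
```

Note that the enumeration at offset 50 is dropped because nothing refers to it. A tool that wants to keep such enums could, for example, add every DW_TAG_enumeration_type DIE to `root_offsets`.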

The other thing that dsymutil does to reduce file size, and **the** most valuable part of size reduction, is that it uses the ODR to emit each type only once when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change the references in any later .o file from its local copy to a DW_FORM_ref_addr reference to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. dsymutil figures out the decl file and line for a type and its DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here: compiler-generated constructors and destructors may appear in one copy of a type and not in another, so there is some code to make sure there is a DIE within the emitted type to refer to. Overall this works really well and is pretty cheap to implement.
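The uniquing idea can be sketched as follows. This is an illustration only: the dictionary layout, the `unique_types` function, and the choice to include the type name in the key are assumptions (the name stands in for the "few other checks"), and the member-DIE check for compiler-generated constructors is not modeled.

```python
def unique_types(object_files):
    """Emit each type once, in link order.

    object_files: [(file_name, [type_dict, ...]), ...], where each type_dict has
    the hypothetical keys name, decl_file, decl_line and byte_size.
    Returns (emitted, redirects). redirects maps (file_name, index) to an index
    into emitted, standing in for a DW_FORM_ref_addr reference to the first
    definition.
    """
    first_seen = {}
    emitted = []
    redirects = {}
    for file_name, types in object_files:
        for i, t in enumerate(types):
            key = (t["name"], t["decl_file"], t["decl_line"], t["byte_size"])
            if key not in first_seen:
                first_seen[key] = len(emitted)
                emitted.append(t)
            redirects[(file_name, i)] = first_seen[key]
    return emitted, redirects


# Made-up declaration coordinates, for illustration only.
string_t = {"name": "std::string", "decl_file": "string", "decl_line": 100, "byte_size": 24}
foo_t = {"name": "Foo", "decl_file": "foo.h", "decl_line": 7, "byte_size": 16}

objects = [
    ("a.o", [dict(string_t), dict(foo_t)]),
    ("b.o", [dict(string_t)]),
    ("c.o", [dict(foo_t), dict(string_t)]),
]
emitted, redirects = unique_types(objects)
print(len(emitted))              # 2
print(redirects[("c.o", 1)])     # 0
```

Five type copies across three object files collapse into two emitted definitions; every later copy becomes a reference to the first one.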

Line tables are fully relinked as well, and all line entries for addresses that were stripped are removed. This can be tricky because line tables are often emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function's address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. The new DW_LNE_end_sequence opcode terminates the sequence and allows the following line entries to start at a lower address than the previous DW_LNE_end_sequence line entry.
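A rough sketch of the line table handling, assuming a simplified row model of `(address, line, end_sequence)` tuples rather than real line-number program opcodes. The function names and the `kept_functions` layout are hypothetical.

```python
def relink_line_table(rows, kept_functions):
    """Relocate rows of kept functions and terminate each one with end_sequence.

    rows: [(old_address, line), ...], one unterminated stream sorted by address.
    kept_functions: [(old_start, old_end, new_start), ...] for surviving functions.
    Returns [(new_address, line, end_sequence), ...].
    """
    out = []
    for old_start, old_end, new_start in kept_functions:
        body = [(a, l) for a, l in rows if old_start <= a < old_end]
        if not body:
            continue
        for addr, line in body:
            out.append((new_start + (addr - old_start), line, False))
        # Stand-in for DW_LNE_end_sequence at the end of the function's range.
        out.append((new_start + (old_end - old_start), body[-1][1], True))
    return out


def is_valid(table):
    """Addresses must not decrease within a sequence."""
    prev = None
    for addr, _line, end_sequence in table:
        if prev is not None and addr < prev:
            return False
        prev = None if end_sequence else addr
    return True


rows = [(0x100, 1), (0x104, 2), (0x200, 10), (0x208, 11), (0x300, 20)]
# The function at 0x200 was stripped; the other two moved, and swapped order.
kept_functions = [(0x100, 0x110, 0x2000), (0x300, 0x310, 0x1000)]

table = relink_line_table(rows, kept_functions)
print(is_valid(table))                             # True
print(is_valid([r for r in table if not r[2]]))    # False
```

The second check drops the end_sequence rows to show the failure mode: without them, the entry at 0x1000 follows 0x2004 inside one sequence, which is invalid.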

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are computed by traversing the new DWARF once it is created. This means you don't end up with 100 accelerator tables concatenated together, which is what linkers produce now. I am keenly interested in this feature so we can prove that having accelerator tables improves debugging speed. Proving that will help us get toolchains to integrate accelerator tables into their builds. Having it start off as a separate post-production step is a great way to help people figure out whether they want to adopt them.
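To illustrate why a single table built from the final DWARF is attractive, here is a toy name index. It is only a stand-in: the real DWARF5 and Apple tables are hashed on-disk formats, and `build_name_index` and its input shape are hypothetical.

```python
from collections import defaultdict


def build_name_index(linked_dies):
    """Build one name -> DIE offsets map in a single pass over the linked DWARF.

    linked_dies: iterable of (offset, name) for named DIEs in the final output.
    """
    index = defaultdict(list)
    for offset, name in linked_dies:
        index[name].append(offset)
    return dict(index)


linked_dies = [(0x2A, "main"), (0x50, "Foo::bar"), (0x90, "Foo::bar"), (0xC4, "std::string")]
index = build_name_index(linked_dies)
print([hex(o) for o in index["Foo::bar"]])  # ['0x50', '0x90']
```

A lookup is then one probe into one table, instead of one probe per concatenated per-object table.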

Having all of this re-linking code in a DWARFLinker library in llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that relink DWARF, and they are not doing a very good job of it. If we had a library that allowed people to re-link their DWARF, either by linking .o file DWARF into a main executable or by running an optimization pass on fully linked libraries that might do function outlining or block re-use optimizations, then we could ensure the final DWARF is in great shape.

One last point: it would be good to have a flag that enables DWARF verification. Verification can be run before the GC operation to ensure the incoming DWARF is in good shape, and again after any modifications to ensure the DWARF is still valid.
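As a sketch of the verify-before-and-after idea, the wrapper below shells out to `llvm-dwarfdump --verify`, assuming the tool is on the PATH and that a non-zero exit status signals a verification failure. `gc_with_verification` and the `gc_step` callback are hypothetical names, not part of this patch.

```python
import subprocess


def dwarf_is_valid(path, run=subprocess.run):
    """Return True if `llvm-dwarfdump --verify` exits successfully for path."""
    result = run(["llvm-dwarfdump", "--verify", path], capture_output=True, text=True)
    return result.returncode == 0


def gc_with_verification(src, dst, gc_step, verify=dwarf_is_valid):
    """Verify the input, run the (hypothetical) GC step, then verify the output."""
    if not verify(src):
        raise ValueError(f"input DWARF failed verification: {src}")
    gc_step(src, dst)
    if not verify(dst):
        raise ValueError(f"output DWARF failed verification: {dst}")
```

Checking the input first separates bugs in the GC pass from DWARF that was already broken when it came out of the compiler.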


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74169/new/

https://reviews.llvm.org/D74169




