<HTML><BODY>Debuginfo and linker folks, we (AccessSoftek) would like to propose a way of removing obsolete debug info. If you find it useful, we will be happy to work on improving it. Thank you for any opinions and suggestions.<br><br>Alexey.<br><br>Currently, when the linker does garbage collection, a lot of abandoned debug info is left behind (see Appendix A for documentation). Besides inflating the debug info size, this leaves overlapping address ranges with no way to tell valid ranges from garbage ones. We propose removing debug info along with the removed code. This would reduce debug info size and keep the debug info accurate.<br><br>There are several approaches that could be used to solve this problem:<br><br>1. Require DWARF producers to generate fragmented debug data according to the DWARF5 specification: "E.3.3 Single-function-per-DWARF-compilation-unit", page 388. That approach assumes fragmenting the whole debug info on a per-function basis and gluing the fragmented sections together at link time using section groups.<br><br>2. Use an additional tool that would optimize out unnecessary debug data, similar to dwz (DWARF compressor tool) or dsymutil (links the DWARF debug information). This approach assumes additional post-link processing of binaries.<br><br>3. Teach the linker to parse debug data and let it remove unused debug data.<br><br>In this proposal, we focus on approach #3. We show that this approach is viable and discuss some preliminary results, leaving the particular implementation out of scope. We attach the Proof of Concept (PoC) implementation (https://reviews.llvm.org/D67469) for illustrative purposes. Please keep in mind that it is not final, and there is room for improvement (see Appendix B). 
However, the results achieved look quite promising: they demonstrate up to a 2x size reduction with a performance overhead of 30% of linking time (which is in the same ballpark as the existing section compression (see Table 2, point F)).<br><br>A straightforward implementation would fully parse DWARF, create an in-memory hierarchy of DWARF objects, optimize them, and generate new section contents. It would therefore require too much memory and take noticeable time to process. Instead, the proposed solution is a combination of "fragmented DWARF" (#1 above) and "optimize parsed DWARF at link stage" (#3 above). However, there is no preliminary DWARF data fragmentation step. Instead, the data is parsed at link time, and then the pieces that correspond to live debug data are copied into the resulting sections. Essentially, the patch skips the debug info (subprograms, address ranges, line sequences) that corresponds to the dead sections.<br><br>Two command-line options are added to lld:<br><br>1. --gc-debuginfo removes pieces of debug information related to the discarded sections.<br><br>2. --gc-debuginfo-types does alternative type deduplication while doing --gc-debuginfo.<br><br>For simplicity, some shortcuts were taken in this PoC implementation:<br><br>1. The same types always use the same abbreviations. A full implementation should take different abbreviations into account.<br>2. Split DWARF is not supported.<br>3. Only the .debug_abbrev, .debug_info, .debug_ranges, .debug_rnglists, and .debug_line tables are processed.<br>4. DWARF64 is not supported.<br><br>We also note that the proposed approach is quite universal and could be used for other debug info optimization tasks. For example, there is an alternative solution for data type deduplication (other than using COMDAT sections to keep types (-fdebug-types-section)): parse DWARF, cut out duplicated types, and patch type references to point to the single type definition. 
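<br><br>The deduplication steps just described (cut out duplicated types, then patch the references) can be illustrated with a minimal Python sketch. It is not taken from the PoC patch: the TypeDie record, the string signatures standing in for type hashes, and the dedup_types function are all hypothetical names, and the sketch ignores the offset recalculation a real implementation would need once bytes are removed from .debug_info.<br><br>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDie:
    """Hypothetical stand-in for a type DIE found while parsing .debug_info."""
    offset: int      # assumed offset of the DIE inside the section
    signature: str   # assumed type hash; a plain string here for readability


def dedup_types(types, refs):
    """Keep the first DIE seen for each signature and redirect references.

    types: list of TypeDie in section order
    refs:  list of offsets, each equal to the offset of some entry in types
    Returns (kept_types, patched_refs).
    """
    canonical = {}  # signature to offset of the surviving definition
    remap = {}      # original offset to surviving offset
    kept = []
    for die in types:
        if die.signature not in canonical:
            canonical[die.signature] = die.offset
            kept.append(die)
        remap[die.offset] = canonical[die.signature]
    # Patch every reference so it points at the single kept definition.
    patched = [remap[r] for r in refs]
    return kept, patched


# Two compile units that both describe "int" and "Foo".
types = [TypeDie(0x10, "int"), TypeDie(0x40, "Foo"),
         TypeDie(0x90, "int"), TypeDie(0xC0, "Foo")]
refs = [0x10, 0x90, 0xC0, 0x40]
kept, patched = dedup_types(types, refs)
```
</pre>
With the sample data above, only the first "int" and "Foo" entries are kept, and every reference is redirected to one of them.<br><br>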
This deduplication uses the same approach as deleting unused debug info: it cuts out unneeded debug section content. This alternative implementation need not replace -fdebug-types-section, but it shows that the approach could be used for type deduplication as well. This solution (combined with the global type table, which is not implemented by this patch) does have some advantages, though. It could reduce the number of references inside the .debug_info section. It could also reduce the size of the type information by deduplicating base and DW_FORM_ref_sig8 types.<br><br>Several changes, if adopted by the DWARF standard, would help this implementation work better:<br><br>1. Minimize or entirely avoid references from subprograms into other parts of the .debug_info section. That would simplify splitting out and removing subprograms, in the sense that it would minimize the number of references that have to be parsed and followed. (DW_FORM_ref_subroutine instead of DW_FORM_ref_*, ?)<br><br>2. Create an additional section: a global types table (.debug_types_table). That would significantly reduce the number of references inside the .debug_info section. It would also make it possible to use a 4-byte reference into this section instead of an 8-byte reference into a type unit (DW_FORM_ref_types instead of DW_FORM_ref_sig8). It would also make it possible to place base types into this section and avoid duplicating them per compile unit. Additionally, a size reduction could be achieved by not generating type unit headers. Note that the new section, .debug_types_table, differs from the DWARF4 section .debug_types in that it contains unique type descriptors referenced by offsets instead of a list of type units referenced by DW_FORM_ref_sig8; all table entries share the same abbreviations and do not have type unit headers.<br><br>3. Define a limited scope for line programs that could be removed independently. I.e., 
currently the .debug_line section contains a program in a byte-coded language for a state machine. That program effectively represents a matrix [instruction][line information]. In general, it is hard to cut out part of that program while keeping the whole program correct. Thus it would be good to specify separate scopes (related to address ranges) that could be easily removed from the program body.<br><br>We evaluated the approach on the LLVM and Clang codebases. The results obtained are summarized in the tables below:<br><br>Abbreviations:<br><br>LLVM bin size - size of the llvm build/bin directory.<br>LLVM build time - compilation time for building llvm.<br>Clang size - size of the clang binary.<br>link time - time for linking the clang binary.<br>Errors - number of errors reported by llvm-dwarfdump --verify for the clang binary.<br>gc-dbginfo - linker option added by this patch. Spelled as "--gc-debuginfo".<br>gc-dbgtypes - linker option added by this patch. Spelled as "--gc-debuginfo-types".<br><br>Table 1. LLVM codebase.<br><br><span style="font-family: courier new, courier;">-------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| | CL options | LLVM bin size | LLVM build time | </span><br><span style="font-family: courier new, courier;">-------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| A | (default set of options*) | 100.0%(17.0GB)| 100.0%(47m26s) | </span><br><span style="font-family: courier new, courier;">| | | | |</span><br><span style="font-family: courier new, courier;">| B | A +gc-dbginfo | 82.4%(14.0GB)| 108.3%(51m24s) |</span><br><span style="font-family: courier new, courier;">| | | | |</span><br><span style="font-family: courier new, courier;">| C | A +gc-dbginfo +gc-dbgtypes| 56.5%( 9.6GB)| 117.2%(55m36s) |</span><br><span style="font-family: courier new, courier;">| | | | |</span><br><span style="font-family: courier new, 
courier;">| D | A +fdebug-types-section | 64.7%(11.0GB)| 98.4%(46m41s) |</span><br><span style="font-family: courier new, courier;">| | | | | </span><br><span style="font-family: courier new, courier;">| E | D +gc-dbginfo | 45.9%( 7.8GB)| 98.5%(46m43s) |</span><br><span style="font-family: courier new, courier;">| | | | |</span><br><span style="font-family: courier new, courier;">| F | D +gc-dbginfo +gc-dbgtypes| 45.9%( 7.8GB)| 99.9%(47m25s) |</span><br><span style="font-family: courier new, courier;">--------------------------------------------------------------------</span><br><br>An even larger size reduction can be achieved with --compress-debug-sections=zlib:<br><br><span style="font-family: courier new, courier;">--------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| | CL options | LLVM bin size | LLVM build time | </span><br><span style="font-family: courier new, courier;">--------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| G | A +gc-dbginfo +gc-dbgtypes| 30.6%( 5.2GB)| 118.5%(56m11s) |</span><br><span style="font-family: courier new, courier;">| | | | |</span><br><span style="font-family: courier new, courier;">| H | D +gc-dbginfo +gc-dbgtypes| 24.7%( 4.2GB)| 102.2%(48m28s) |</span><br><span style="font-family: courier new, courier;">--------------------------------------------------------------------</span><br> <br>Table 2. 
Clang binary.<br> <br><span style="font-family: courier new, courier;">---------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| | CL options | Clang size | link time | Errors |</span><br><span style="font-family: courier new, courier;">---------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| A | (default set of options*)|100.0%(1.46GB)| 100%(23s)|2.5mln(**)|</span><br><span style="font-family: courier new, courier;">| | | | | |</span><br><span style="font-family: courier new, courier;">| B | A +gc-dbginfo | 87.0%(1.27GB)| 417%( 96s)| 0.12mln |</span><br><span style="font-family: courier new, courier;">| | | | | |</span><br><span style="font-family: courier new, courier;">| C | A +gc-dbginfo+gc-dbgtypes| 56.2%(0.82GB)| 530%(122s)| 0.12mln |</span><br><span style="font-family: courier new, courier;">| | | | | |</span><br><span style="font-family: courier new, courier;">| D | A +fdebug-types-section | 54.8%(0.80GB)| 74%( 17s)| 3.60mln |</span><br><span style="font-family: courier new, courier;">| | | | | |</span><br><span style="font-family: courier new, courier;">| E | D +gc-dbginfo | 43.2%(0.63GB)| 117%( 27s)| 1.30mln |</span><br><span style="font-family: courier new, courier;">| | | | | |</span><br><span style="font-family: courier new, courier;">| F | D +gc-dbginfo+gc-dbgtypes| 42.5%(0.62GB)| 121%( 28s)| 0.50mln |</span><br><span style="font-family: courier new, courier;">--------------------------------------------------------------------</span><br><br>An even larger size reduction can be achieved with --compress-debug-sections=zlib:<br><br><span style="font-family: courier new, courier;">---------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| | CL options | Clang size | link time | Errors |</span><br><span style="font-family: courier 
new, courier;">---------------------------------------------------------------------</span><br><span style="font-family: courier new, courier;">| G | A +gc-dbginfo+gc-dbgtypes| 31.5%(0.46GB)| 613%(141s)| 0.12mln |</span><br><span style="font-family: courier new, courier;">| | | | | |</span><br><span style="font-family: courier new, courier;">| H | D +gc-dbginfo+gc-dbgtypes| 24.7%(0.36GB)| 173%( 40s)| 0.50mln |</span><br><span style="font-family: courier new, courier;">--------------------------------------------------------------------</span><br><br>(*)<br>LLVM_TARGETS_TO_BUILD=X86;AArch64<br>LLVM_TOOL_CLANG_BUILD=ON<br>LLVM_TOOL_LLD_BUILD=ON<br>LLVM_USE_LINKER=lld<br>CMAKE_CXX_FLAGS=-ffunction-sections<br>CMAKE_C_FLAGS=-ffunction-sections<br>CMAKE_EXE_LINKER_FLAGS=-Wl,--gc-sections<br>CMAKE_MODULE_LINKER_FLAGS=-Wl,--gc-sections<br>CMAKE_SHARED_LINKER_FLAGS=-Wl,--gc-sections<br><br>(**) The large number of errors for non-patched clang is due to the “overlapping ranges” error.<br><br>(***) HW configuration:<br><br>OS Ubuntu 18.04<br>CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz<br>RAM 32018 MiB<br>Storage SSD<br><br>=====================================================================<br>Appendix A. Documentation: previous topics on llvm-dev, links to the DWARF wiki, related reviews. These links relate not only to the -ffunction-sections case but also to other questions about reducing DWARF size.<br><br>1. [llvm-dev] [lldb-dev] [LLD] How to get rid of debug info of sections deleted<br> by garbage collector<br> http://lists.llvm.org/pipermail/llvm-dev/2018-September/126282.html<br><br>2. [lldb-dev] LLDB behaviour for GCed sections<br> http://lists.llvm.org/pipermail/lldb-dev/2017-March/012081.html<br><br>3. [llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).<br> http://lists.llvm.org/pipermail/llvm-dev/2017-December/119470.html<br><br>4. 
[llvm-dev] [DWARF] De-segregating type units and compile units<br> http://lists.llvm.org/pipermail/llvm-dev/2018-July/124819.html<br><br>5. [LLD][ELF][DebugInfo] llvm-symbolizer shows incorrect source line info if<br> --gc-sections used<br> https://reviews.llvm.org/D59553<br><br>6. Discard debuginfo for object files empty after GC<br> https://reviews.llvm.org/D54747<br><br>7. [ELF] Add --strip-debug-non-line option<br> https://reviews.llvm.org/D46628<br><br>8. Monolithic input section handling<br> https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ<br><br>9. Using COMDAT Sections to Reduce the Size of DWARF Debug Information<br> http://wiki.dwarfstd.org/index.php?title=COMDAT_Type_Sections<br><br>10. DWARF Extensions for Unwinding Across Merged Functions<br> http://wiki.dwarfstd.org/index.php?title=ICF<br><br>11. Type Unit Merge<br> http://dwarfstd.org/ShowIssue.php?issue=130526.1<br><br>12. DWARF Extensions for Separate Debug Information Files<br> https://gcc.gnu.org/wiki/DebugFission<br><br>13. dwz dwarf compressor<br> http://sourceware.org/git/dwz.git<br><br>14. DWARF5 standard<br> http://dwarfstd.org/Dwarf5Std.php<br><br>=====================================================================<br>Appendix B. List of improvements which are not done by this patch.<br> <br>1. Type hash calculation should not be done at the link stage. The DWARF producer should do it, as is already done in the -fdebug-types-section implementation and defined in the DWARF standard: "A type signature is computed only by a DWARF producer; a consumer need only compare two type signatures to check for equality."<br><br>2. The alternative type deduplication implementation should use a global type table to store types.<br><br>3. DWARF parsing classes could be improved to parse DIEs faster.<br><br>4. Processing could probably be done per compilation unit (i.e., without loading all units into memory).<br><br>5. 
A greater reduction in the resulting binary size could be achieved by optimizing more debug info sections.<br><br>6. This implementation is not optimized for speed. Run-time performance could be improved with further optimization work.<br><br><br></BODY></HTML>