[llvm-dev] Remove obsolete debug info while garbage collecting

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 23 22:26:28 PDT 2019


Hi Alexey,

Thank you for sharing this proposal. Reducing the size of debug info is
generally a good thing, and I believe you'd see more debug info size
reduction in Rust programs than in C++ programs, because I heard that the
Rust compiler driver passes a lot of object files to the linker, expecting
that the linker would remove most of them, which leaves dead debug info.

On Thu, Sep 12, 2019 at 7:32 AM Alexey Lapshin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Debuginfo and linker folks, we (AccessSoftek) would like to suggest a
> proposal for removing obsolete debug info. If you find it useful we will be
> happy to work on improving it. Thank you for any opinions and suggestions.
>
> Alexey.
>
>     Currently, when the linker does garbage collection, a lot of abandoned
> debug info is left behind (see Appendix A for documentation). Besides
> inflating debug info size, this leaves overlapping address ranges with
> no way to tell valid ranges from garbage ones. We propose removing debug
> info along with the removed code. This would reduce debug info size and
> keep the remaining debug info accurate.
>
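
The overlap problem described above can be illustrated with a small sketch.
It assumes, purely for illustration, that address ranges of discarded
functions end up resolved to address 0, so they pile up at the start of the
address space and collide with each other. The function name find_overlaps
and the sample ranges in the usage note are hypothetical, not taken from the
patch.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [low_pc, high_pc)


def find_overlaps(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Return (earlier, later) pairs where the later range starts inside
    the widest range seen so far. Empty ranges are ignored."""
    ordered = sorted(r for r in ranges if r[0] < r[1])
    overlaps = []
    widest = None  # range with the largest end address seen so far
    for cur in ordered:
        if widest is not None and cur[0] < widest[1]:
            overlaps.append((widest, cur))
        if widest is None or cur[1] > widest[1]:
            widest = cur
    return overlaps
```

With two live ranges (0x1000, 0x1040) and (0x1040, 0x10a0) the function
reports nothing; adding two dead ranges (0x0, 0x30) and (0x0, 0x58) makes
it report one overlapping pair, and a consumer cannot tell from the ranges
alone which of them is garbage.
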
> There are several approaches which could be used to solve that problem:
>
> 1.  Require DWARF producers to generate fragmented debug data according to
> the DWARF5 specification: "E.3.3 Single-function-per-DWARF-compilation-unit",
> page 388. That approach assumes fragmenting the whole debug info on a
> per-function basis and gluing the fragments together at link time using
> section groups.
>
> 2.  Use an additional tool, which would optimize out unnecessary debug
> data, something similar to dwz (dwarf compressor tool), dsymutil (links the
> DWARF debug information). This approach assumes additional post-link
> binaries processing.
>
> 3.  Teach the linker to parse debug data and let it remove unused debug
> data.
>
> In this proposal, we focus on approach #3. We show that this approach is
> viable and discuss some preliminary results, leaving the particular
> implementation out of scope. We attach the Proof of Concept (PoC)
> implementation (https://reviews.llvm.org/D67469) for illustrative
> purposes. Please keep in mind that it is not final, and there is room for
> improvement (see Appendix B). However, the results look quite promising:
> up to a 2x size reduction, with a performance overhead of 30% of link
> time (in the same ballpark as the already existing section compression;
> see Table 2, row F).
>

I believe #1 was added to DWARF5 to make link-time debug info GC possible,
so could you tell me a little bit about why you chose to do #3? Is this
because you want to do this for DWARF4?

> A straightforward implementation would fully parse DWARF, create an
> in-memory hierarchy of DWARF objects, optimize them, and generate new
> sections content. Thus, it would require too much memory and would take
> noticeable time to process. Instead, the proposed solution is a combination
> of "fragmented DWARF" (#1 above) and "Optimise parsed DWARF at link stage"
> (#3 above). However, there is no preliminary DWARF data fragmentation step.
> Instead, the data is parsed at link time, and the pieces that
> correspond to live debug data are copied into the resulting sections.
> Essentially, the patch skips the debug info (subprograms, address ranges,
> line sequences) that corresponds to dead sections.
>
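
The copy-only-live-pieces idea described above could be sketched roughly as
follows. This is not the code from D67469: Subprogram, live_sections and
keep_live are hypothetical names, and a real implementation would work on
raw DWARF sections and also fix up offsets and references, which this sketch
ignores.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Subprogram:
    name: str
    section: str      # input section that holds the function's code
    die_bytes: bytes  # raw bytes of the subprogram's DIE subtree


def keep_live(subprograms: Iterable[Subprogram], live_sections: Set[str]) -> bytes:
    """Concatenate the DIE bytes of subprograms whose code section
    survived garbage collection; everything else is skipped."""
    out = bytearray()
    for sp in subprograms:
        if sp.section in live_sections:
            out += sp.die_bytes
    return bytes(out)
```

The point of the sketch is that nothing is rebuilt: live pieces are copied
as-is and dead pieces are simply never copied, which avoids building a full
in-memory DWARF hierarchy.
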
> Two command-line options are added to lld:
>
> 1. --gc-debuginfo removes pieces of debug information related to the
> discarded sections.
>
> 2. --gc-debuginfo-types does alternative type deduplication while doing
> --gc-debuginfo.
>
> For the purpose of simplicity, some shortcuts were used in this PoC
> implementation:
>
> 1. Same types always use the same abbreviations. Full implementation
> should take different abbreviations into account.
> 2. Split DWARF is not supported.
> 3. Only the .debug_abbrev, .debug_info, .debug_ranges, .debug_rnglists, and
> .debug_line tables are processed.
> 4. DWARF64 is not supported.
>
> We also note that the proposed approach is quite universal and could be
> used for other debug info optimization tasks. For example, there is an
> alternative solution for data type deduplication (other than using COMDAT
> sections to keep types, i.e. -fdebug-types-section): parse DWARF, cut out
> duplicated types, and patch type references to point to the single type
> definition. That is, it uses the same approach as deleting unused debug
> info: cut out unneeded debug section content. This alternative
> implementation need not replace -fdebug-types-section, but it
> shows that this approach could be used for type deduplication as well.
> This solution (combined with the global type table, which is not
> implemented by this patch) has some advantages, though. It could reduce the
> number of references inside the .debug_info section. It could reduce the
> size of the type information by deduplicating base and DW_FORM_ref_sig8 types.
>
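
The "parse, cut out duplicated types, patch references" sequence described
above can be sketched as below. Types are modelled as (offset, encoded body)
pairs and compared by a content hash; how the PoC actually compares types is
not shown here. All names are hypothetical, every reference is assumed to
point at a known type, and the sketch keeps original offsets rather than
recomputing them after the cut.

```python
import hashlib
from typing import Dict, List, Tuple

TypeEntry = Tuple[int, bytes]  # (original offset, encoded type)


def dedup_types(
    types: List[TypeEntry], refs: List[int]
) -> Tuple[List[TypeEntry], List[int]]:
    """Keep the first copy of each distinct type and retarget every
    reference to the offset of that first copy."""
    first_by_hash: Dict[bytes, int] = {}
    remap: Dict[int, int] = {}
    kept: List[TypeEntry] = []
    for offset, body in types:
        digest = hashlib.sha256(body).digest()
        canonical = first_by_hash.setdefault(digest, offset)
        remap[offset] = canonical
        if canonical == offset:
            kept.append((offset, body))
    return kept, [remap[r] for r in refs]
```

Given types at offsets 0x10 ("int"), 0x20 ("float") and 0x30 ("int" again),
the third entry is dropped and any reference to 0x30 is rewritten to 0x10.
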
> There are several things which, if adopted by the DWARF standard, would
> help this implementation work better:
>
> 1. Minimize or entirely avoid references from subprograms into other parts
> of the .debug_info section. That would simplify splitting out and removing
> subprograms, because it would minimize the number of references that
> have to be parsed and followed. (DW_FORM_ref_subroutine instead of
> DW_FORM_ref_*, ?)
>
> 2. Create an additional section: a global types table (.debug_types_table).
> That would significantly reduce the number of references inside the
> .debug_info section. It would also make it possible to use a 4-byte
> reference into this section instead of an 8-byte reference into a type
> unit (DW_FORM_ref_types instead of DW_FORM_ref_sig8), and to place base
> types into this section, avoiding their per-compile-unit duplication.
> Additionally, size could be reduced by not generating type unit headers.
> Note that the new section, .debug_types_table, differs from the DWARF4
> section .debug_types in that it contains unique type descriptors
> referenced by offsets instead of a list of type units referenced by
> DW_FORM_ref_sig8; all table entries share the same abbreviations and do
> not have type unit headers.
>
> 3. Define the limited scope for line programs which could be removed
> independently. I.e. currently .debug_line section contains a program in
> byte-coded language for a state machine. That program actually represents a
> matrix [instruction][line information]. In general, it is hard to cut out
> part of that program and to keep the whole program correct. Thus it would
> be good to specify separate scopes (related to address ranges) which could
> be easily removed from the program body.
>
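
The difficulty in item 3 can be seen in a toy model. Assume the line table
has already been decoded into rows, and that rows are grouped into sequences
that each cover one contiguous piece of code; then removing dead code
amounts to dropping whole sequences. The hard part in practice, which this
sketch does not address, is doing the same on the encoded byte-code program.
Row, split_sequences and drop_dead are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    address: int
    line: int
    end_sequence: bool = False


def split_sequences(rows: List[Row]) -> List[List[Row]]:
    """Group decoded rows into sequences terminated by end_sequence rows."""
    sequences: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        current.append(row)
        if row.end_sequence:
            sequences.append(current)
            current = []
    if current:  # tolerate a trailing unterminated sequence
        sequences.append(current)
    return sequences


def drop_dead(rows: List[Row], live: List[Tuple[int, int]]) -> List[Row]:
    """Keep only sequences whose first row lies in a live [low, high) range."""
    def is_live(seq: List[Row]) -> bool:
        start = seq[0].address
        return any(low <= start < high for low, high in live)

    return [row for seq in split_sequences(rows) if is_live(seq) for row in seq]
```

If the standard defined such independently removable scopes in the encoded
program, a linker could apply the same filter without re-encoding the rest
of the line program.
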
> We evaluated the approach on LLVM and Clang codebases. The results
> obtained are summarized in the tables below:
>
> Abbreviations:
>
> LLVM bin size  - size of llvm  build/bin directory.
> LLVM build time  - compilation time for building llvm.
> Clang size - size of clang binary.
> link time - time for linking clang binary.
> Errors  - number of errors reported by llvm-dwarfdump --verify for clang
> binary.
> gc-dbginfo - linker option added by this patch. Spelled as "--gc-debuginfo".
> gc-dbgtypes - linker option added by this patch. Spelled as
> "--gc-debuginfo-types".
>
> Table 1. LLVM codebase.
>
> --------------------------------------------------------------------
> |   |       CL options          | LLVM bin size  | LLVM build time |
> --------------------------------------------------------------------
> | A | (default set of options*) |  100.0%(17.0GB)|  100.0%(47m26s) |
> |   |                           |                |                 |
> | B | A +gc-dbginfo             |   82.4%(14.0GB)|  108.3%(51m24s) |
> |   |                           |                |                 |
> | C | A +gc-dbginfo +gc-dbgtypes|   56.5%( 9.6GB)|  117.2%(55m36s) |
> |   |                           |                |                 |
> | D | A +fdebug-types-section   |   64.7%(11.0GB)|   98.4%(46m41s) |
> |   |                           |                |                 |
> | E | D +gc-dbginfo             |   45.9%( 7.8GB)|   98.5%(46m43s) |
> |   |                           |                |                 |
> | F | D +gc-dbginfo +gc-dbgtypes|   45.9%( 7.8GB)|   99.9%(47m25s) |
> --------------------------------------------------------------------
>
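
The relative sizes in Table 1 follow directly from the absolute sizes. A
quick check, using only the numbers from the table (the variable and
function names are illustrative):

```python
BASELINE_GB = 17.0  # row A, default set of options

# Absolute sizes of the build/bin directory for rows B..F, in GB.
sizes_gb = {"B": 14.0, "C": 9.6, "D": 11.0, "E": 7.8, "F": 7.8}


def relative_size(size_gb: float, baseline_gb: float = BASELINE_GB) -> float:
    """Size as a percentage of the baseline, rounded to one decimal."""
    return round(size_gb / baseline_gb * 100, 1)


relative = {row: relative_size(gb) for row, gb in sizes_gb.items()}
```

The resulting values match the percentages in the "LLVM bin size" column
for rows B through F.
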
> An even larger size reduction can be achieved with
> --compress-debug-sections=zlib:
>
> --------------------------------------------------------------------
> |   |       CL options          | LLVM bin size  | LLVM build time |
> --------------------------------------------------------------------
> | G | A +gc-dbginfo +gc-dbgtypes|   30.6%( 5.2GB)|  118.5%(56m11s) |
> |   |                           |                |                 |
> | H | D +gc-dbginfo +gc-dbgtypes|   24.7%( 4.2GB)|  102.2%(48m28s) |
> --------------------------------------------------------------------
>
> Table 2. Clang binary.
>
> ------------------------------------------------------------------------
> |   |       CL options         | Clang size   | link time |   Errors   |
> ------------------------------------------------------------------------
> | A | (default set of options*)|100.0%(1.46GB)| 100%( 23s)| 2.5mln(**) |
> |   |                          |              |           |            |
> | B | A +gc-dbginfo            | 87.0%(1.27GB)| 417%( 96s)| 0.12mln    |
> |   |                          |              |           |            |
> | C | A +gc-dbginfo+gc-dbgtypes| 56.2%(0.82GB)| 530%(122s)| 0.12mln    |
> |   |                          |              |           |            |
> | D | A +fdebug-types-section  | 54.8%(0.80GB)|  74%( 17s)| 3.60mln    |
> |   |                          |              |           |            |
> | E | D +gc-dbginfo            | 43.2%(0.63GB)| 117%( 27s)| 1.30mln    |
> |   |                          |              |           |            |
> | F | D +gc-dbginfo+gc-dbgtypes| 42.5%(0.62GB)| 121%( 28s)| 0.50mln    |
> ------------------------------------------------------------------------
>
> An even larger size reduction can be achieved with
> --compress-debug-sections=zlib:
>
> ------------------------------------------------------------------------
> |   |       CL options         | Clang size   | link time |   Errors   |
> ------------------------------------------------------------------------
> | G | A +gc-dbginfo+gc-dbgtypes| 31.5%(0.46GB)| 613%(141s)| 0.12mln    |
> |   |                          |              |           |            |
> | H | D +gc-dbginfo+gc-dbgtypes| 24.7%(0.36GB)| 173%( 40s)| 0.50mln    |
> ------------------------------------------------------------------------
>
> (*)
> LLVM_TARGETS_TO_BUILD=X86;AArch64
> LLVM_TOOL_CLANG_BUILD=ON
> LLVM_TOOL_LLD_BUILD=ON
> LLVM_USE_LINKER=lld
> CMAKE_CXX_FLAGS=-ffunction-sections
> CMAKE_C_FLAGS=-ffunction-sections
> CMAKE_EXE_LINKER_FLAGS=-Wl,--gc-sections
> CMAKE_MODULE_LINKER_FLAGS=-Wl,--gc-sections
> CMAKE_SHARED_LINKER_FLAGS=-Wl,--gc-sections
>
> (**) The very large number of errors for the non-patched clang is due to
> the “overlapping ranges” error.
>
> (***) HW configuration:
>
> OS      Ubuntu 18.04
> CPU     Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
> RAM     32018 MiB
> Storage SSD
>
> =====================================================================
> Appendix A. Documentation: previous topics on llvm-dev, links to the DWARF
> wiki, related reviews. These links relate not only to the
> -ffunction-sections case but also to other questions about reducing DWARF size.
>
> 1. [llvm-dev] [lldb-dev] [LLD] How to get rid of debug info of sections
>    deleted by garbage collector
>     http://lists.llvm.org/pipermail/llvm-dev/2018-September/126282.html
>
> 2. [lldb-dev] LLDB behaviour for GCed sections
>     http://lists.llvm.org/pipermail/lldb-dev/2017-March/012081.html
>
> 3. [llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
>     http://lists.llvm.org/pipermail/llvm-dev/2017-December/119470.html
>
> 4. [llvm-dev] [DWARF] De-segregating type units and compile units
>     http://lists.llvm.org/pipermail/llvm-dev/2018-July/124819.html
>
> 5. [LLD][ELF][DebugInfo] llvm-symbolizer shows incorrect source line info
>    if --gc-sections used
>     https://reviews.llvm.org/D59553
>
> 6. Discard debuginfo for object files empty after GC
>     https://reviews.llvm.org/D54747
>
> 7. [ELF] Add --strip-debug-non-line option
>     https://reviews.llvm.org/D46628
>
> 8. Monolithic input section handling
>     https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
>
> 9. Using COMDAT Sections to Reduce the Size of DWARF Debug Information
>     http://wiki.dwarfstd.org/index.php?title=COMDAT_Type_Sections
>
> 10. DWARF Extensions for Unwinding Across Merged Functions
>     http://wiki.dwarfstd.org/index.php?title=ICF
>
> 11. Type Unit Merge
>     http://dwarfstd.org/ShowIssue.php?issue=130526.1
>
> 12. DWARF Extensions for Separate Debug Information Files
>     https://gcc.gnu.org/wiki/DebugFission
>
> 13. dwz dwarf compressor
>     http://sourceware.org/git/dwz.git
>
> 14. DWARF5 standard
>     http://dwarfstd.org/Dwarf5Std.php
>
> =====================================================================
> Appendix B. List of improvements which are not done by this patch.
>
> 1. Type hash calculation should not be done at the link stage; the DWARF
> producer should do it, as is already done in the -fdebug-types-section
> implementation and defined in the DWARF standard: "A type signature is
> computed only by a DWARF producer; a consumer need only compare two type
> signatures to check for equality."
>
> 2. Alternative types deduplication implementation should use global type
> table to store types.
>
> 3. DWARF parsing classes could be improved to parse DIEs faster.
>
> 4. Processing could probably be done per compilation unit(i.e., not
> loading all units in memory).
>
> 5. Better impact on the resulting size of binary could be achieved by
> optimizing more debug info sections.
>
> 6. This implementation is not optimized for speed. Run-time performance
> could be improved with further optimization work.
>
>