[llvm-dev] Remove obsolete debug info while garbage collecting

Wed Sep 18 07:25:02 PDT 2019

On 17.09.2019 3:12, David Blaikie wrote:
>
>
> On Wed, Sep 11, 2019 at 3:32 PM Alexey Lapshin via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>     Debuginfo and linker folks, we (AccessSoftek) would like to
>     suggest a proposal for removing obsolete debug info. If you find
>     it useful we will be happy to work on improving it. Thank you for
>     any opinions and suggestions.
>
>     Alexey.
>
>         Currently, when the linker performs garbage collection, a lot
>     of abandoned debug info is left behind (see Appendix A for
>     documentation). Besides inflating the debug info size, this leaves
>     overlapping address ranges with no way to tell valid ranges from
>     garbage ones. We propose removing debug info along with the
>     removed code. This would reduce debug info size and keep the
>     debug info accurate.
>
>     There are several approaches which could be used to solve that
>     problem:
>
>     1.  Require DWARF producers to generate fragmented debug data
>     according to the DWARF5 specification: "E.3.3
>     Single-function-per-DWARF-compilation-unit", page 388. That
>     approach assumes fragmenting the whole debug info on a
>     per-function basis and gluing the fragments together at link
>     time using section groups.
>
>     2.  Use an additional tool that would optimize out unnecessary
>     debug data, something similar to dwz (DWARF compressor tool) or
>     dsymutil (links the DWARF debug information). This approach
>     assumes additional post-link processing of binaries.
>
>     3.  Teach the linker to parse debug data and let it remove unused
>     debug data.
>
>     In this proposal, we focus on approach #3. We show that this
>     approach is viable and discuss some preliminary results, leaving
>     the particular implementation out of scope. We attach the Proof of
>     Concept (PoC) implementation (https://reviews.llvm.org/D67469) for
>     illustrative purposes. Please keep in mind that it is not final,
>     and there is room for improvement (see Appendix B). However, the
>     results achieved so far look quite promising: up to a 2x size
>     reduction, with a performance overhead of 30% of linking time
>     (which is in the same ballpark as the existing section
>     compression (see table 2 point F)).
>
>
> Have you considered/tried reusing the DWARF 
> minimization/deduplication/linking logic that's already in llvm's 
> dsymutil implementation? If we're going to do that having a singular 
> implementation would be desirable.
>
> (bonus points if we could do something like the dsymutil approach when 
> using Split DWARF and building a DWP - taking some address table 
> output from the linker, and using that to help trim things (or, even 
> when having no input from the linker - at least doing more aggressive 
> deduplication during DWP construction than can be currently done with 
> only type units (& potentially removing/avoiding type unit overhead too))
Generally speaking, dsymutil does a very similar thing. It parses DWARF 
DIEs, analyzes relocations, scans through references and throws out 
unused DIEs. But its current interface does not allow it to be used at 
the link stage.
  I think it would be ideal to have a single implementation.
  I have not analyzed how easy, or even whether it is possible, to 
reuse its code at the link stage, but it looked like that would need 
significant rework.

  The implementation in this proposal removes obsolete debug info at 
the link stage, so it benefits from the object files already being 
loaded and the liveness information already being computed, and it 
generates an optimized binary from scratch.

If dsymutil could be refactored so that it can be used at the link 
stage, then its implementation could be reused. I will look into the 
possibility of such a refactoring.
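
To make the "scan references and throw out unused DIEs" step concrete, 
here is a minimal sketch of it as a reachability walk. It is not how 
dsymutil or the PoC is actually written; the function name, the 
dictionary layout and the DIE offsets are all made up for illustration, 
and it assumes the DIEs and their references have already been parsed.

```python
from collections import deque


def live_dies(dies, roots):
    """Return the set of DIE offsets reachable from the live roots.

    dies:  dict mapping a DIE offset to the list of DIE offsets it
           refers to (its parent plus any type-style references);
           assumed to be already parsed from .debug_info.
    roots: offsets of DIEs whose code survived garbage collection.
    """
    live = set()
    work = deque(roots)
    while work:
        off = work.popleft()
        if off in live or off not in dies:
            continue
        live.add(off)
        work.extend(dies[off])  # follow every outgoing reference
    return live


# Hypothetical toy unit; the offsets are invented for this example.
dies = {
    0x0B: [],            # compile unit
    0x2A: [0x0B, 0x50],  # subprogram "used": parent CU, type int
    0x3C: [0x0B, 0x58],  # subprogram "unused": parent CU, type float
    0x50: [0x0B],        # base type int
    0x58: [0x0B],        # base type float
}

kept = live_dies(dies, roots=[0x2A])
dropped = sorted(set(dies) - kept)  # DIEs that could be thrown out
```

In this toy input only the subprogram at 0x2A is live, so the unused 
subprogram and the type that only it refers to end up in `dropped`. 
The cost of the walk grows with the number of references followed.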

>     1. Minimize or entirely avoid references from subprograms into
>     other parts of the .debug_info section. That would simplify
>     splitting out and removing subprograms, in the sense that it
>     would minimize the number of references that have to be parsed
>     and followed. (DW_FORM_ref_subroutine instead of DW_FORM_ref_*, ?)
>
>
> Not sure I follow - by "other parts of the .debug_info section" do you 
> mean in the same CU, or cross CU references? Any particular references 
> you have in mind? Or encountered in practice?
I mean all kinds of references into the .debug_info section. Following 
references is a time-consuming task, so the fewer references that have 
to be followed, the faster it works.

Cross-CU references require loading the referenced CU. I do not know 
of use cases where cross-CU references are used. If they are a special 
case and are not usually used inside subprograms, then it is probably 
possible to avoid them.

For references within the same CU, there are probably cases where they 
could be ignored: https://reviews.llvm.org/P8165
>
>     2. Create an additional section - a global types table
>     (.debug_types_table). That would significantly reduce the number
>     of references inside the .debug_info section. It also makes it
>     possible to use a 4-byte reference to this section instead of
>     an 8-byte reference to a type unit (DW_FORM_ref_types instead of
>     DW_FORM_ref_sig8). It also makes it possible to place base types
>     into this section and avoid duplicating them per compile unit.
>     Additionally, size could be reduced by not generating type unit
>     headers. Note that the new section - .debug_types_table - differs
>     from the DWARF4 section .debug_types in that it contains unique
>     type descriptors referenced by offset instead of a list of type
>     units referenced by DW_FORM_ref_sig8; all table entries share the
>     same abbreviations and do not have type unit headers.
>
>
> What do you mean when you say "global types table"? The phrasing in 
> the above paragraph is present-tense, as though this thing exists, 
> but it doesn't seem to describe what it actually is and how it 
> achieves the things the text says it achieves. Perhaps I've missed 
> some context here.

The "global types table" does not exist yet. It could be created if the 
discussed approach is considered useful.
Please see the comparison of a possible "global types table" with the 
existing type units: https://reviews.llvm.org/P8164

The benefit of a "global types table" is that it needs less space to 
store types than the type units solution does.
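
As a rough back-of-the-envelope illustration of where the saving would 
come from, the sketch below counts only two effects: dropping the type 
unit headers and shrinking each type reference from 8 to 4 bytes. The 
header sizes are those of 32-bit DWARF; the 4-byte form 
(DW_FORM_ref_types) is the hypothetical one from the proposal, and the 
type and reference counts in the example are invented, not measured.

```python
# Type unit header sizes in bytes for 32-bit DWARF:
# DWARF4: unit_length + version + debug_abbrev_offset + address_size
#         + type_signature + type_offset
TU_HEADER_V4 = 4 + 2 + 4 + 1 + 8 + 4
# DWARF5 adds a one-byte unit_type field.
TU_HEADER_V5 = 4 + 2 + 1 + 1 + 4 + 8 + 4

SIG8_REF = 8   # DW_FORM_ref_sig8: 8-byte type signature
TABLE_REF = 4  # hypothetical DW_FORM_ref_types: 4-byte table offset


def estimated_savings(num_types, num_type_refs, header_size=TU_HEADER_V5):
    """Estimate bytes saved by a types table versus type units.

    Counts only removed type unit headers and shorter references;
    shared abbreviations and base type deduplication are ignored.
    """
    header_bytes = num_types * header_size
    ref_bytes = num_type_refs * (SIG8_REF - TABLE_REF)
    return header_bytes + ref_bytes


# Invented example: 1000 type units referenced 20000 times in total.
example = estimated_savings(num_types=1000, num_type_refs=20000)
```

With these made-up counts the estimate is 104000 bytes, most of it from 
the shorter references. Real numbers would have to come from measuring 
actual binaries.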

>     3. Define the limited scope for line programs which could be
>     removed independently. I.e. currently .debug_line section contains
>     a program in byte-coded language for a state machine. That program
>     actually represents a matrix [instruction][line information]. In
>     general, it is hard to cut out part of that program and to keep
>     the whole program correct. Thus it would be good to specify
>     separate scopes (related to address ranges) which could be easily
>     removed from the program body.
>
>
> In my experience line tables are /tiny/ - have you prototyped any 
> change in this space to have a sense of whether it would have 
> significant savings? (it'd potentially help address the address 
> ambiguity issues when the linker discards code, though - so might be a 
> correctness issue rather than a size performance issue)

I did not measure the size reduction for the line table, though I 
expect it to be small.
The more important issue is correctness: the line table could contain 
information for overlapping address ranges.

There is another attempt to fix that issue - 
https://reviews.llvm.org/D59553.
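
To illustrate the overlap problem, here is a small sketch that looks 
for overlapping address ranges among line-table sequences. It assumes 
that the linker resolves relocations against discarded code to address 
0 and that the image itself is linked at a low address (as in a shared 
object); both are assumptions for the example, and the function names 
and addresses are invented.

```python
def overlapping_ranges(ranges):
    """Return name pairs whose [low, high) address ranges overlap.

    ranges: list of (low, high, name) tuples, one per sequence.
    """
    ordered = sorted(ranges)
    overlaps = []
    for i, (low_a, high_a, name_a) in enumerate(ordered):
        for low_b, high_b, name_b in ordered[i + 1:]:
            if low_b >= high_a:
                break  # sorted by low address: later ranges cannot overlap
            overlaps.append((name_a, name_b))
    return overlaps


# Invented layout: "dead_f" was garbage collected, so its range was
# relocated to 0 (assumption), where it collides with live code.
ranges = [
    (0x0000, 0x0040, "dead_f"),
    (0x0010, 0x0090, "main"),
    (0x0090, 0x00C0, "helper"),
]

collisions = overlapping_ranges(ranges)
```

Here the range left behind for the discarded function overlaps the 
range of "main", and nothing in the table says which of the two is the 
valid one.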

>
>     We evaluated the approach on LLVM and Clang codebases. The results
>     obtained are summarized in the tables below:
>
>
> Memory usage statistics (& confidence intervals for the build time) 
> would probably be especially useful for comparing these tradeoffs.
> Doubly so when using compression (since the decompression would need 
> to use more memory, as would the recompression - so, two different 
> tradeoffs (compressed input, compressed output, and then both at the 
> same time))

I will measure the memory impact of the PoC implementation, but I 
expect it to be significant.
Memory usage has not been optimized yet. Several things might be done 
to reduce the memory footprint: not loading all compile units into 
memory, and avoiding adding a Parent field to every DIE.

Alexey.

>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>