[llvm-dev] Remove obsolete debug info while garbage collecting

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 24 15:05:23 PDT 2019


On 24.09.2019 8:26, Rui Ueyama wrote:
> Hi Alexey,
>
> Thank you for sharing this proposal. Reducing the size of debug info 
> is generally a good thing, and I believe you'd see more debug info 
> size reduction in Rust programs than in C++ programs, because I heard 
> that the Rust compiler driver passes a lot of object files to the 
> linker, expecting that the linker would remove most of them, which 
> leaves dead debug info.
>
Hi Rui, Thanks!

> On Thu, Sep 12, 2019 at 7:32 AM Alexey Lapshin via llvm-dev 
> <llvm-dev at lists.llvm.org> wrote:
>
>     Debuginfo and linker folks, we (AccessSoftek) would like to
>     suggest a proposal for removing obsolete debug info. If you find
>     it useful we will be happy to work on improving it. Thank you for
>     any opinions and suggestions.
>
>     Alexey.
>
>         Currently, when the linker does garbage collection, a lot of
>     abandoned debug info is left behind (see Appendix A for
>     documentation). Besides inflated debug info size, we end up with
>     overlapping address ranges and no way to tell valid ranges from
>     garbage ones. We propose removing debug info along with the
>     removed code. This would reduce debug info size and improve debug
>     info accuracy.
>
>     There are several approaches which could be used to solve that
>     problem:
>
>     1.  Require DWARF producers to generate fragmented debug data
>     according to the DWARF5 specification: "E.3.3
>     Single-function-per-DWARF-compilation-unit", page 388. That
>     approach assumes fragmenting all debug info on a per-function
>     basis and gluing the fragments together at link time using
>     section groups.
>
>     2.  Use an additional tool that would optimize out unnecessary
>     debug data, something similar to dwz (a DWARF compressor tool) or
>     dsymutil (which links DWARF debug information). This approach
>     assumes additional post-link processing of binaries.
>
>     3.  Teach the linker to parse debug data and let it remove unused
>     debug data.
>
>     In this proposal, we focus on approach #3. We show that this
>     approach is viable and discuss some preliminary results, leaving
>     the particular implementation out of scope. We attach the Proof
>     of Concept (PoC) implementation (https://reviews.llvm.org/D67469)
>     for illustrative purposes. Please keep in mind that it is not
>     final, and there is room for improvement (see Appendix B).
>     However, the achieved results look quite promising: they
>     demonstrate up to a 2x size reduction, with a performance
>     overhead of 30% of linking time (which is in the same ballpark as
>     the existing section compression (see table 2 point F)).
>
>
> I believe #1 was added to DWARF5 to make link-time debug info GC 
> possible, so could you tell me a little bit about why you chose to do 
> #3? Is this because you want to do this for DWARF4?
>
>
No, the proposal is not DWARF-4 specific; it applies to DWARF-5 as
well. The solution added to DWARF-5 ("E.3.3
Single-function-per-DWARF-compilation-unit", page 388) is not a complete
solution. It is a recommendation that needs additional specification.
There is the -fdebug-types-section implementation, which follows that
recommendation. Cases other than type units do not easily fit into it.
Some tables have a common header, e.g. .debug_line, .debug_rnglists,
.debug_addr. It is not clear how these tables could be split between
section groups.

The more important issue is the fragmentation itself. Dividing debug
tables into pieces would increase debug info size.
It would also significantly complicate code that works with debug info.
E.g. include/llvm/DebugInfo/DWARF/DWARFObject.h declares the interface
for class DWARFObject. It is currently not ready for the case where
there could be multiple tables. A patch introducing support for multiple
tables would be a massive change affecting many places in the llvm
codebase.

Another point is that not only the llvm code base but all other DWARF
consumers would have to be changed to support fragmented debug info.

In short, if all debug tables were fragmented, working with debug info
would become significantly more complicated.

Thus the reasons to select #3 are:

1. It could be done in a single place, without affecting other parts of
the llvm code base.
2. It does not require other DWARF consumers to implement support for it.
3. Avoiding fragmentation would save space.
4. Processing non-fragmented debug info is faster.
5. There is no need to adapt DWARF tables for fragmentation. They could
be handled in their current state.
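
As a rough illustration of what #3 amounts to, here is a toy model. It
is not the PoC from D67469 and does not parse real DWARF; DebugEntry,
drop_dead_entries and the section names are hypothetical and exist only
for this sketch. The linker already knows which code sections survived
garbage collection, so it could keep only the debug entries that refer
to them:

```python
from dataclasses import dataclass


@dataclass
class DebugEntry:
    # Hypothetical stand-in for a debug info entry describing one function.
    name: str     # function the entry describes
    section: str  # code section its address range points into


def drop_dead_entries(entries, live_sections):
    """Keep only entries whose code section survived garbage collection."""
    return [e for e in entries if e.section in live_sections]


# Assumed input: two functions, one of which was removed by the linker's GC.
entries = [
    DebugEntry("main", ".text.main"),
    DebugEntry("unused_helper", ".text.unused_helper"),
]
live_sections = {".text.main"}

kept = drop_dead_entries(entries, live_sections)
kept_names = [e.name for e in kept]
```

Everything happens in one pass inside the linker, over the tables in
their current, non-fragmented form, which is the point of reasons 1
and 5.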


Alexey
