[llvm-dev] Remove obsolete debug info while garbage collecting

Wed Sep 25 15:12:41 PDT 2019

25.09.2019 18:49, David Blaikie wrote:
>
>
> On Tue, Sep 24, 2019 at 11:22 PM Rui Ueyama via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>     Alexey,
>
>     Thank you for the detailed explanation. The other question I have
>     is, as discussed above, about dsymutil. You said that dsymutil is
>     not usable at link-time. What does that mean? If we only have to
>     emit an output file in the usual way and then automatically invoke
>     dsymutil on the file that the linker just created, that's easy to
>     do, and lld and dsymutil can live in the same process so that the
>     linker does not depend on an external command.
>
>
> dsymutil isn't really (to my knowledge) set up for that sort of 
> operation at the moment - it's currently very tied to the 
> Apple/OSX/MachO debug info distribution model (it's for creating dsym 
> debug info bundles from a set of object files and an output of 
> addresses from the linker).
>
> If it was generalized as a post-processing step, that would be good 
> for archival purposes (reducing the size of debug info in binaries in 
> the long-term) but wouldn't address what are probably the more 
> significant drawbacks for some users (including Google) - the sheer 
> number of bytes copied from input to output during linking - reducing 
> the amount of linker output written in the first place would be 
> significantly beneficial.

I would like to note that the PoC implementation does exactly this: it 
reduces the number of bytes copied from input to output during linking, 
and it reduces the amount of linker output.

Additionally, I measured the memory usage of the PoC implementation. The 
following table shows memory usage when linking clang:

-----------------------------------------------
|   |        CL options         |   Memory    |
-----------------------------------------------
| A | (default set of options*) |  9145880 kb |
| B | A +gc-dbginfo             | 11881960 kb |
| C | A +gc-dbginfo+gc-dbgtypes | 10690388 kb |
| D | A +fdebug-types-section   |  8006032 kb |
| E | D +gc-dbginfo             |  9000872 kb |
| F | D +gc-dbginfo+gc-dbgtypes |  8994156 kb |
-----------------------------------------------
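
To make the table easier to compare, the relative memory overhead of each GC configuration against the configuration it extends (B and C against A, E and F against D) can be derived from the figures above. The following is only a small arithmetic sketch in Python; it is not part of the PoC, and the names memory_kb, baseline_of and overhead_percent are made up for illustration.

```python
# Memory usage figures (kb) copied from the table above.
memory_kb = {
    "A": 9145880,
    "B": 11881960,
    "C": 10690388,
    "D": 8006032,
    "E": 9000872,
    "F": 8994156,
}

# Each GC configuration is compared against the configuration it extends.
baseline_of = {"B": "A", "C": "A", "E": "D", "F": "D"}


def overhead_percent(config):
    """Relative change in memory versus the configuration's baseline, in percent."""
    base = memory_kb[baseline_of[config]]
    return (memory_kb[config] - base) / base * 100.0


overheads = {config: overhead_percent(config) for config in baseline_of}

for config, pct in overheads.items():
    print(f"{config} vs {baseline_of[config]}: {pct:+.1f}%")
```

Read this way, the extra memory is noticeably smaller when the GC options are combined with -fdebug-types-section (E, F) than with the default options (B, C).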

> (though I do think/hope dsymutil's implementation could be 
> adapted/generalized to be used in this situation - and I do have 
> concerns that doing such non-trivial work at link time might not be a 
> great tradeoff because the complexity and memory usage might be more 
> than the savings, though I've no certainty one way or the other there)
>
>
>     On Wed, Sep 25, 2019 at 7:05 AM Alexey Lapshin
>     <a.v.lapshin at mail.ru <mailto:a.v.lapshin at mail.ru>> wrote:
>
>
>         24.09.2019 8:26, Rui Ueyama wrote:
>>         Hi Alexey,
>>
>>         Thank you for sharing this proposal. Reducing the size of
>>         debug info is generally a good thing, and I believe you'd see
>>         more debug info size reduction in Rust programs than in C++
>>         programs, because I heard that the Rust compiler driver
>>         passes a lot of object files to the linker, expecting that
>>         the linker would remove most of them, which leaves dead debug
>>         info.
>>
>         Hi Rui, Thanks!
>
>>         On Thu, Sep 12, 2019 at 7:32 AM Alexey Lapshin via llvm-dev
>>         <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>             Debuginfo and linker folks, we (AccessSoftek) would like
>>             to suggest a proposal for removing obsolete debug info.
>>             If you find it useful we will be happy to work on
>>             improving it. Thank you for any opinions and suggestions.
>>
>>             Alexey.
>>
>>                 Currently, when the linker does garbage collection, a
>>             lot of abandoned debug info is left behind (see Appendix
>>             A for documentation). Besides inflated debug info size,
>>             we end up with overlapping address ranges and no way to
>>             tell valid ranges from garbage ones. We propose removing
>>             debug info along with the removed code. This would reduce
>>             debug info size and ensure debug info accuracy.
>>
>>             There are several approaches which could be used to solve
>>             that problem:
>>
>>             1.  Require DWARF producers to generate fragmented debug
>>             data according to the DWARF5 specification: "E.3.3
>>             Single-function-per-DWARF-compilation-unit" page 388.
>>             That approach assumes fragmenting the whole debug info
>>             on a per-function basis and gluing the fragmented
>>             sections together at link time using section groups.
>>
>>             2.  Use an additional tool that would optimize out
>>             unnecessary debug data, similar to dwz (DWARF
>>             compressor tool) or dsymutil (links the DWARF debug
>>             information). This approach assumes additional post-link
>>             processing of binaries.
>>
>>             3.  Teach the linker to parse debug data and let it
>>             remove unused debug data.
>>
>>             In this proposal, we focus on approach #3. We show that
>>             this approach is viable and discuss some preliminary
>>             results, leaving particular implementation out of the
>>             scope. We attach the Proof of Concept (PoC)
>>             implementation (https://reviews.llvm.org/D67469) for
>>             illustrative purposes. Please keep in mind that it is not
>>             final, and there is room for improvement (see Appendix
>>             B). However, the achieved results look quite promising:
>>             they demonstrate up to a 2x size reduction, with a
>>             performance overhead of 30% of linking time (which is in
>>             the same ballpark as the existing section compression
>>             (see table 2 point F)).
>>
>>
>>         I believe #1 was added to DWARF5 to make link-time debug info
>>         GC possible, so could you tell me a little bit about why you
>>         chose to do #3? Is this because you want to do this for DWARF4?
>>
>>
>         No, that proposal is not DWARF-4 specific. The proposal is for
>         DWARF-5 also.  The solution added to DWARF-5 ("E.3.3
>         Single-function-per-DWARF-compilation-unit" page 388) is not
>         a complete solution. It is a recommendation that needs
>         additional specification.
>         The -fdebug-types-section implementation follows
>         that recommendation.  Cases other than type units do
>         not easily fit into it. There are tables
>         that have a common header, for example .debug_line,
>         .debug_rnglists, and .debug_addr. It is not clear how these
>         tables could be separated between section groups.
>
>         The more important thing is the fragmentation itself. Dividing
>         debug tables into pieces would increase debug info size.
>         It would also significantly complicate code working with debug
>         info. For example, include/llvm/DebugInfo/DWARF/DWARFObject.h
>         has the interface for class DWARFObject. It is currently not
>         ready for the case where there could be multiple tables. A patch
>         introducing support for multiple tables would be a massive
>         change affecting many places in the llvm codebase.
>
>         Another thing is that not only the llvm code base but all
>         other DWARF consumers would have to be changed to support
>         fragmented debug info.
>
>         In short, if all debug tables were fragmented, working
>         with debug info would be significantly more complicated.
>
>         Thus the reasons to select #3 are:
>
>         1. It could be done in a single place, not affecting other
>         parts of the llvm code base.
>         2. It does not require other DWARF consumers to implement
>         support for it.
>         3. Avoiding fragmentation would save space.
>         4. Processing of non-fragmented debug info is faster.
>         5. No need to adapt DWARF tables for fragmentation. They could
>         be handled in their current state.
>
>
>         Alexey
>
>>
>>
>>             _______________________________________________
>>             LLVM Developers mailing list
>>             llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>             https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>