[llvm-dev] Remove obsolete debug info while garbage collecting

Wed Sep 25 08:49:29 PDT 2019

On Tue, Sep 24, 2019 at 11:22 PM Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Alexey,
>
> Thank you for the detailed explanation. The other question I have, as
> discussed above, is about dsymutil. You said that dsymutil is not usable at
> link time. What does that mean? If we only have to emit an output file in
> the usual way and then automatically invoke dsymutil on the file that the
> linker just created, that's easy to do, and lld and dsymutil can live in
> the same process so that the linker does not depend on an external
> command.
>

dsymutil isn't really (to my knowledge) set up for that sort of operation at
the moment; it's currently very tied to the Apple/OSX/MachO debug info
distribution model (it's for creating dSYM debug info bundles from a set of
object files and an output of addresses from the linker).

If it were generalized as a post-processing step, that would be good for
archival purposes (reducing the size of debug info in binaries in the
long term) but wouldn't address what are probably the more significant
drawbacks for some users (including Google): the sheer number of bytes
copied from input to output during linking. Reducing the amount of linker
output written in the first place would be significantly beneficial.
(I do think/hope dsymutil's implementation could be adapted/generalized
to be used in this situation. I also have concerns that doing such
non-trivial work at link time might not be a great tradeoff, because the
complexity and memory usage might outweigh the savings, though I've no
certainty one way or the other there.)

> On Wed, Sep 25, 2019 at 7:05 AM Alexey Lapshin <a.v.lapshin at mail.ru>
> wrote:
>
>>
>> On 24.09.2019 8:26, Rui Ueyama wrote:
>>
>> Hi Alexey,
>>
>> Thank you for sharing this proposal. Reducing the size of debug info is
>> generally a good thing, and I believe you'd see more debug info size
>> reduction in Rust programs than in C++ programs, because I heard that the
>> Rust compiler driver passes a lot of object files to the linker, expecting
>> that the linker would remove most of them, which leaves dead debug info.
>>
>> Hi Rui, Thanks!
>>
>> On Thu, Sep 12, 2019 at 7:32 AM Alexey Lapshin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Debug info and linker folks, we (AccessSoftek) would like to suggest a
>>> proposal for removing obsolete debug info. If you find it useful, we will be
>>> happy to work on improving it. Thank you for any opinions and suggestions.
>>>
>>> Alexey.
>>>
>>>     Currently, when the linker does garbage collection, a lot of abandoned
>>> debug info is left behind (see Appendix A for documentation). Besides the
>>> inflated debug info size, we end up with overlapping address ranges and
>>> no way to tell valid ranges from garbage ones. We propose removing debug
>>> info along with the code it describes. This would reduce debug info size
>>> and ensure debug info accuracy.
>>>
>>> There are several approaches which could be used to solve that problem:
>>>
>>> 1.  Require DWARF producers to generate fragmented debug data according
>>> to the DWARF5 specification: "E.3.3 Single-function-per-DWARF-compilation-unit",
>>> page 388. That approach assumes fragmenting the whole debug info on a
>>> per-function basis and gluing the fragmented sections together at link
>>> time using section groups.
>>>
>>> 2.  Use an additional tool that would optimize out unnecessary debug
>>> data, something similar to dwz (a DWARF compression tool) or dsymutil
>>> (which links DWARF debug information). This approach assumes an additional
>>> post-link processing step for binaries.
>>>
>>> 3.  Teach the linker to parse debug data and let it remove unused debug
>>> data.
>>>
>>> In this proposal, we focus on approach #3. We show that this approach is
>>> viable and discuss some preliminary results, leaving the particular
>>> implementation out of scope. We attach the Proof of Concept (PoC)
>>> implementation (https://reviews.llvm.org/D67469) for illustrative
>>> purposes. Please keep in mind that it is not final, and there is room for
>>> improvement (see Appendix B). However, the achieved results look quite
>>> promising: they demonstrate up to a 2x size reduction, with a performance
>>> overhead of 30% of linking time (which is in the same ballpark as the
>>> existing section compression; see table 2, point F).
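A minimal sketch of the filtering step that approach #3 implies, showing how dropping dead entries also removes the overlapping address ranges mentioned above. It is not taken from the PoC in D67469: `SubprogramRange`, `input_section`, `drop_dead_ranges` and `overlapping_pairs` are hypothetical names, and a real linker would have to parse DWARF (DW_AT_low_pc/DW_AT_high_pc, .debug_ranges/.debug_rnglists) instead of receiving a ready-made list. For illustration only, it assumes the discarded functions' addresses were resolved to 0, which is one way such overlaps can arise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubprogramRange:
    """Hypothetical, simplified stand-in for a DW_TAG_subprogram's address range."""
    name: str
    low_pc: int
    high_pc: int        # exclusive end address
    input_section: int  # hypothetical id of the input section holding the code


def drop_dead_ranges(ranges, live_sections):
    """Keep only entries whose input section survived garbage collection."""
    return [r for r in ranges if r.input_section in live_sections]


def overlapping_pairs(ranges):
    """Return name pairs whose [low_pc, high_pc) intervals intersect."""
    pairs = []
    for i, a in enumerate(ranges):
        for b in ranges[i + 1:]:
            if a.low_pc < b.high_pc and b.low_pc < a.high_pc:
                pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    # Assumption for illustration: the dead functions were resolved to
    # address 0 in the output, so their ranges collide with each other.
    ranges = [
        SubprogramRange("main", 0x1000, 0x1040, input_section=1),
        SubprogramRange("unused_a", 0x0, 0x20, input_section=2),
        SubprogramRange("unused_b", 0x0, 0x30, input_section=3),
    ]
    live = {1}  # only main's section survived --gc-sections
    print(overlapping_pairs(ranges))                          # [('unused_a', 'unused_b')]
    print(overlapping_pairs(drop_dead_ranges(ranges, live)))  # []
```

Dropping the dead entries removes the collision; a real implementation would additionally have to rewrite offsets and DIE references in the surviving debug data, since removing entries shifts everything after them.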
>>>
>>
>> I believe #1 was added to DWARF5 to make link-time debug info GC
>> possible, so could you tell me a little bit about why you chose to do #3?
>> Is this because you want to do this for DWARF4?
>>
>>
>> No, that proposal is not DWARF-4 specific. It applies to DWARF-5
>> as well. The solution added to DWARF-5 ("E.3.3
>> Single-function-per-DWARF-compilation-unit", page 388) is not a complete
>> solution. It is a recommendation that needs additional
>> specification.
>> There is the -fdebug-types-section implementation, which follows that
>> recommendation. Other cases (other than type units) do not fit easily into
>> this recommendation. Some tables have a common header, e.g.
>> .debug_line, .debug_rnglists, .debug_addr. It is not clear how these tables
>> could be split between section groups.
>>
>> The more important issue is the fragmentation itself. Dividing debug
>> tables into pieces would increase debug info size.
>> It would also significantly complicate code working with debug info. E.g.,
>> include/llvm/DebugInfo/DWARF/DWARFObject.h defines the interface for class
>> DWARFObject. It is currently not ready for the case where there could be
>> multiple tables. A patch introducing support for multiple tables would be a
>> massive change affecting many places in the LLVM codebase.
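The size argument can be made concrete with a rough worked example. It counts only the repeated 12-byte DWARF 5 compile-unit header (32-bit DWARF format) plus a per-unit DW_TAG_compile_unit DIE whose size, `cu_die_bytes`, is a hypothetical parameter; the per-function payload sizes are made up too. It ignores other per-fragment costs such as separate .debug_line headers, so under these assumptions it likely understates the overhead rather than measuring it.

```python
# 32-bit DWARF 5 compile-unit header: unit_length (4) + version (2) +
# unit_type (1) + address_size (1) + debug_abbrev_offset (4) bytes.
CU_HEADER_BYTES = 12


def debug_info_bytes(function_payloads, cu_die_bytes, split_per_function):
    """Rough .debug_info size for one source file.

    function_payloads: bytes of DIEs describing each function (hypothetical).
    cu_die_bytes: size of the DW_TAG_compile_unit DIE itself (hypothetical).
    split_per_function: if True, every function gets its own unit, so the
    header and the unit DIE are repeated once per function.
    """
    unit_overhead = CU_HEADER_BYTES + cu_die_bytes
    if split_per_function:
        return sum(unit_overhead + p for p in function_payloads)
    return unit_overhead + sum(function_payloads)


if __name__ == "__main__":
    payloads = [100] * 1000  # 1000 functions, 100 bytes each (made-up numbers)
    whole = debug_info_bytes(payloads, cu_die_bytes=30, split_per_function=False)
    split = debug_info_bytes(payloads, cu_die_bytes=30, split_per_function=True)
    print(whole, split, split - whole)  # 100042 142000 41958
```

With these made-up numbers the extra cost is (N - 1) * 42 bytes for N functions; the point is only that the overhead grows linearly with the number of fragments.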
>>
>> Another issue is that not only the LLVM codebase but also all other DWARF
>> consumers would have to be changed to support fragmented debug info.
>>
>> In short, if all debug tables were fragmented, working with debug
>> info would become significantly more complicated.
>>
>> Thus, the reasons to select #3 are:
>>
>> 1. It could be done in a single place, without affecting other parts of the
>> LLVM codebase.
>> 2. It does not require other DWARF consumers to implement support for it.
>> 3. Avoiding fragmentation would save space.
>> 4. Processing non-fragmented debug info is faster.
>> 5. There is no need to adapt DWARF tables for fragmentation; they can be
>> handled in their current form.
>>
>>
>> Alexey
>>
>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>