[llvm-dev] Remove obsolete debug info while garbage collecting
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 7 13:20:04 PDT 2019
On 27.09.2019 11:46, Rui Ueyama wrote:
> Alexey,
>
> I'm a bit worried about teaching lld about DWARF, as this is something
> we've been carefully avoiding. Linkers are mostly agnostic about the
> contents of sections. Sections are basically just bags of bytes, and
> linkers generally don't attempt to parse their contents. That being
> said, we've already taught lld how to parse (some part of) DWARF to
> implement --gdb-index and other features, and because of the nature of
> the DWARF file format it is unavoidable. So it may be OK to add more code
> for DWARF dedup, if the additional complexity is not too much and the
> new code is nicely isolated from existing code. I think I agree
> with you that the linker is perhaps the best place to drop dead DWARF
> info. Let me start the code review to see how the code works. Thanks!
OK, thank you. I have started refactoring dsymutil so that it can be
used inside the linker.
After that, I am going to start working on the linker part of this.
Thank you, Alexey.
>
> On Thu, Sep 26, 2019 at 7:12 AM Alexey Lapshin <a.v.lapshin at mail.ru
> <mailto:a.v.lapshin at mail.ru>> wrote:
>
>
> On 25.09.2019 18:49, David Blaikie wrote:
>>
>>
>> On Tue, Sep 24, 2019 at 11:22 PM Rui Ueyama via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>> Alexey,
>>
>> Thank you for the detailed explanation. The other question I
>> have is, as discussed above, about dsymutil. You said that
>> dsymutil is not usable at link time. What does that mean? If
>> we only have to emit an output file in the usual way and then
>> automatically invoke dsymutil on the file that the linker
>> just created, that's easy to do, and lld and dsymutil can
>> live in the same process so that the linker does not
>> depend on an external command.
>>
>>
>> dsymutil isn't really (to my knowledge) set up for that sort of
>> operation at the moment: it's currently very tied to the
>> Apple/OSX/MachO debug info distribution model (it's for creating
>> dSYM debug info bundles from a set of object files and an output
>> of addresses from the linker).
>>
>> If it were generalized as a post-processing step, that would be
>> good for archival purposes (reducing the size of debug info in
>> binaries in the long term) but wouldn't address what are probably
>> the more significant drawbacks for some users (including Google):
>> the sheer number of bytes copied from input to output during
>> linking. Reducing the amount of linker output written in the
>> first place would be significantly beneficial.
>
> I would like to note that the PoC implementation does exactly this:
> it reduces the number of bytes copied from input to output during
> linking, and it reduces the amount of linker output.
>
> Additionally, I measured the memory usage of the PoC implementation.
> The following table shows memory usage for linking clang:
>
>
> --------------------------------------------------
> |   | CL options                   | Memory      |
> --------------------------------------------------
> | A | (default set of options*)    |  9145880 kb |
> | B | A + gc-dbginfo               | 11881960 kb |
> | C | A + gc-dbginfo + gc-dbgtypes | 10690388 kb |
> | D | A + fdebug-types-section     |  8006032 kb |
> | E | D + gc-dbginfo               |  9000872 kb |
> | F | D + gc-dbginfo + gc-dbgtypes |  8994156 kb |
> --------------------------------------------------
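To make the memory trade-off in the table easier to read, here is a small Python sketch that computes the relative change of each configuration against its baseline. The numbers are copied from the table; pairing rows B and C with baseline A, and rows E and F with baseline D, is an assumption based on the "A + ..." and "D + ..." labels. The names MEMORY_KB and overhead_percent are illustrative only.

```python
# Peak memory figures (kb) copied from the table above.
# Assumption: B and C extend A; E and F extend D (-fdebug-types-section).
MEMORY_KB = {
    "A": 9145880,
    "B": 11881960,
    "C": 10690388,
    "D": 8006032,
    "E": 9000872,
    "F": 8994156,
}


def overhead_percent(row: str, baseline: str) -> float:
    """Relative memory change of `row` versus `baseline`, in percent."""
    return (MEMORY_KB[row] / MEMORY_KB[baseline] - 1.0) * 100.0


if __name__ == "__main__":
    for row, base in [("B", "A"), ("C", "A"), ("D", "A"), ("E", "D"), ("F", "D")]:
        print(f"{row} vs {base}: {overhead_percent(row, base):+.1f}%")
```

Read this way, enabling gc-dbginfo raises peak memory by roughly 30% over the default configuration (B vs A) and by roughly 12% when type units are already in use (E and F vs D).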
>
>
>> (though I do think/hope dsymutil's implementation could be
>> adapted/generalized to be used in this situation - and I do have
>> concerns that doing such non-trivial work at link time might not
>> be a great tradeoff because the complexity and memory usage might
>> be more than the savings, though I've no certainty one way or the
>> other there)
>>
>>
>> On Wed, Sep 25, 2019 at 7:05 AM Alexey Lapshin
>> <a.v.lapshin at mail.ru <mailto:a.v.lapshin at mail.ru>> wrote:
>>
>>
>> On 24.09.2019 8:26, Rui Ueyama wrote:
>>> Hi Alexey,
>>>
>>> Thank you for sharing this proposal. Reducing the size
>>> of debug info is generally a good thing, and I believe
>>> you'd see more debug info size reduction in Rust
>>> programs than in C++ programs, because I heard that the
>>> Rust compiler driver passes a lot of object files to the
>>> linker, expecting that the linker would remove most of
>>> them, which leaves dead debug info.
>>>
>> Hi Rui, Thanks!
>>
>>> On Thu, Sep 12, 2019 at 7:32 AM Alexey Lapshin via
>>> llvm-dev <llvm-dev at lists.llvm.org
>>> <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>
>>> Debuginfo and linker folks, we (AccessSoftek) would
>>> like to suggest a proposal for removing obsolete
>>> debug info. If you find it useful we will be happy
>>> to work on improving it. Thank you for any opinions
>>> and suggestions.
>>>
>>> Alexey.
>>>
>>> Currently, when the linker does garbage
>>> collection, a lot of abandoned debug info is left
>>> behind (see Appendix A for documentation). Besides
>>> inflating debug info size, this leaves us with
>>> overlapping address ranges and no way to tell valid
>>> ranges from garbage ones. We propose removing debug
>>> info along with the code it describes. This would
>>> reduce debug info size and ensure debug info accuracy.
>>>
>>> There are several approaches which could be used to
>>> solve that problem:
>>>
>>> 1. Require DWARF producers to generate fragmented
>>> debug data according to the DWARF5 specification,
>>> section E.3.3 "Single-function-per-DWARF-compilation-unit"
>>> (page 388). That approach fragments the whole
>>> debug info on a per-function basis and glues the
>>> fragments together at link time using section groups.
>>>
>>> 2. Use an additional tool that optimizes out
>>> unnecessary debug data, similar to dwz (a DWARF
>>> compression tool) or dsymutil (which links DWARF
>>> debug information). This approach requires an
>>> additional post-link processing step on the binaries.
>>>
>>> 3. Teach the linker to parse debug data and let it
>>> remove unused debug data.
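Approach #3 can be illustrated with a toy Python model. This is not how the PoC is structured (a real implementation parses DWARF DIEs and relocations inside the linker); it only models the core decision: keep a function's debug entry only if the input section holding its code survived garbage collection. Subprogram, gc_debug_info, and overlapping are hypothetical names. In the example, the discarded functions are assumed to get low_pc 0, as if relocations against their discarded sections had been resolved to zero; that is an assumption made for illustration, and it reproduces the overlapping "garbage" ranges described in the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subprogram:
    """Hypothetical, simplified stand-in for a DW_TAG_subprogram entry."""
    name: str
    section: str  # input section the function's code came from
    low_pc: int
    high_pc: int  # exclusive end address


def gc_debug_info(subprograms, live_sections):
    """Keep only entries whose code section survived section GC."""
    return [sp for sp in subprograms if sp.section in live_sections]


def overlapping(subprograms):
    """Return name pairs whose [low_pc, high_pc) ranges overlap."""
    ordered = sorted(subprograms, key=lambda sp: sp.low_pc)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.low_pc >= a.high_pc:
                break  # sorted by low_pc, so no later entry can overlap `a`
            pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    subs = [
        Subprogram("main", ".text.main", 0x1000, 0x1040),
        Subprogram("used", ".text.used", 0x1040, 0x1080),
        # Discarded functions: assumed to have been resolved to address 0.
        Subprogram("unused_a", ".text.unused_a", 0x0, 0x30),
        Subprogram("unused_b", ".text.unused_b", 0x0, 0x20),
    ]
    live = {".text.main", ".text.used"}
    print(overlapping(subs))                       # garbage ranges collide
    print(overlapping(gc_debug_info(subs, live)))  # nothing left to collide
```

Filtering by section liveness both shrinks the debug info and removes the ambiguous ranges, which is the combined benefit the proposal claims for doing this in the linker.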
>>>
>>> In this proposal, we focus on approach #3. We show
>>> that this approach is viable and discuss some
>>> preliminary results, leaving the particular
>>> implementation out of scope. We attach the Proof
>>> of Concept (PoC) implementation
>>> (https://reviews.llvm.org/D67469) for
>>> illustrative purposes. Please keep in mind that it
>>> is not final and there is room for improvement
>>> (see Appendix B). However, the results look
>>> quite promising: up to a 2x size reduction, with a
>>> performance overhead of 30% of linking time (which
>>> is in the same ballpark as the existing section
>>> compression; see table 2, point F).
>>>
>>>
>>> I believe #1 was added to DWARF5 to make link-time debug
>>> info GC possible, so could you tell me a little bit
>>> about why you chose to do #3? Is this because you want
>>> to do this for DWARF4?
>>>
>>>
>> No, the proposal is not DWARF-4 specific; it applies
>> to DWARF-5 as well. The solution added to DWARF-5
>> (E.3.3 "Single-function-per-DWARF-compilation-unit",
>> page 388) is not a complete solution. It is a
>> recommendation that needs additional specification.
>> The -fdebug-types-section implementation follows that
>> recommendation, but cases other than type units do not
>> easily fit into it. Some tables have a common header,
>> e.g. .debug_line, .debug_rnglists, and .debug_addr. It
>> is not clear how these tables could be split across
>> section groups.
>>
>> The more important issue is the fragmentation itself.
>> Dividing debug tables into pieces would increase debug
>> info size.
>> It would also significantly complicate code that works
>> with debug info. For example,
>> include/llvm/DebugInfo/DWARF/DWARFObject.h defines the
>> interface of class DWARFObject, which is currently not
>> ready for the case where there could be multiple tables.
>> A patch introducing support for multiple tables would be
>> a massive change affecting many places in the LLVM codebase.
>>
>> Another issue is that not only the LLVM codebase but all
>> other DWARF consumers would need to change to support
>> fragmented debug info.
>>
>> In short, if all debug tables were fragmented, working
>> with debug info would become significantly more complicated.
>>
>> Thus, the reasons to select #3 are:
>>
>> 1. It can be done in a single place, without affecting
>> other parts of the LLVM codebase.
>> 2. It does not require other DWARF consumers to implement
>> support for it.
>> 3. Avoiding fragmentation saves space.
>> 4. Processing non-fragmented debug info is faster.
>> 5. DWARF tables do not need to be adapted for
>> fragmentation; they can be handled in their current form.
>>
>>
>> Alexey
>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>