[llvm-dev] Remove obsolete debug info while garbage collecting
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Wed Sep 25 15:12:41 PDT 2019
25.09.2019 18:49, David Blaikie wrote:
>
>
> On Tue, Sep 24, 2019 at 11:22 PM Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Alexey,
>
> Thank you for the detailed explanation. The other question I have
> is, as discussed above, about dsymutil. You said that dsymutil is
> not usable at link-time. What does that mean? If we only have to
> emit an output file in the usual way and then automatically invoke
> dsymutil on the file that the linker just created, that's easy to
> do, and lld and dsymutil can live in the same process so that the
> linker does not depend on an external command.
>
>
> dsymutil isn't really (to my knowledge) set up for that sort of
> operation at the moment - it's currently very tied to the
> Apple/OSX/MachO debug info distribution model (it's for creating dsym
> debug info bundles from a set of object files and an output of
> addresses from the linker).
>
> If it was generalized as a post-processing step, that would be good
> for archival purposes (reducing the size of debug info in binaries in
> the long-term) but wouldn't address what are probably the more
> significant drawbacks for some users (including Google) - the sheer
> number of bytes copied from input to output during linking - reducing
> the amount of linker output written in the first place would be
> significantly beneficial.
I would like to note that the PoC implementation does exactly this: it
reduces the number of bytes copied from input to output during linking,
and it reduces the amount of linker output.
Additionally, I measured the memory usage of the PoC implementation. The
following table shows memory usage when linking clang:
--------------------------------------------------
|   | Command-line options         | Memory (KB) |
--------------------------------------------------
| A | (default set of options*)    |     9145880 |
| B | A + gc-dbginfo               |    11881960 |
| C | A + gc-dbginfo + gc-dbgtypes |    10690388 |
| D | A + -fdebug-types-section    |     8006032 |
| E | D + gc-dbginfo               |     9000872 |
| F | D + gc-dbginfo + gc-dbgtypes |     8994156 |
--------------------------------------------------
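
To put the figures above in relative terms, here is a small sketch that
computes the memory overhead of each PoC configuration against the baseline
it extends (A for B and C, D for E and F). The numbers are copied from the
table; the names memory_kb, baseline and overhead_percent are made up for
this illustration only.

```python
# Memory figures (in KB) copied from the table above.
memory_kb = {
    "A": 9145880,
    "B": 11881960,
    "C": 10690388,
    "D": 8006032,
    "E": 9000872,
    "F": 8994156,
}

# Each PoC configuration is compared against the configuration it extends.
baseline = {"B": "A", "C": "A", "E": "D", "F": "D"}


def overhead_percent(config):
    """Relative memory increase of `config` over its baseline, in percent."""
    base = memory_kb[baseline[config]]
    return 100.0 * (memory_kb[config] - base) / base


for config in sorted(baseline):
    print(f"{config} vs {baseline[config]}: {overhead_percent(config):+.1f}%")
```

By this measure the overhead is roughly 30% for B, 17% for C, and 12% for
both E and F.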
> (though I do think/hope dsymutil's implementation could be
> adapted/generalized to be used in this situation - and I do have
> concerns that doing such non-trivial work at link time might not be a
> great tradeoff because the complexity and memory usage might be more
> than the savings, though I've no certainty one way or the other there)
>
>
> On Wed, Sep 25, 2019 at 7:05 AM Alexey Lapshin
> <a.v.lapshin at mail.ru> wrote:
>
>
> 24.09.2019 8:26, Rui Ueyama wrote:
>> Hi Alexey,
>>
>> Thank you for sharing this proposal. Reducing the size of
>> debug info is generally a good thing, and I believe you'd see
>> more debug info size reduction in Rust programs than in C++
>> programs, because I heard that the Rust compiler driver
>> passes a lot of object files to the linker, expecting that
>> the linker would remove most of them, which leaves dead debug
>> info.
>>
> Hi Rui, Thanks!
>
>> On Thu, Sep 12, 2019 at 7:32 AM Alexey Lapshin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Debuginfo and linker folks, we (AccessSoftek) would like
>> to suggest a proposal for removing obsolete debug info.
>> If you find it useful we will be happy to work on
>> improving it. Thank you for any opinions and suggestions.
>>
>> Alexey.
>>
>> Currently, when the linker does garbage collection, a
>> lot of abandoned debug info is left behind (see Appendix
>> A for documentation). Besides inflating debug info size,
>> this leaves overlapping address ranges and no way to
>> tell valid ranges from garbage ones. We propose removing
>> debug info along with the removed code. This would reduce
>> debug info size and ensure debug info accuracy.
>>
>> There are several approaches which could be used to solve
>> that problem:
>>
>> 1. Require DWARF producers to generate fragmented debug
>> data according to the DWARF5 specification: "E.3.3
>> Single-function-per-DWARF-compilation-unit", page 388.
>> That approach assumes fragmenting all debug info on a
>> per-function basis and gluing the fragmented sections
>> together at link time using section groups.
>>
>> 2. Use an additional tool that optimizes out
>> unnecessary debug data, similar to dwz (DWARF
>> compressor tool) or dsymutil (links the DWARF debug
>> information). This approach assumes additional post-link
>> processing of binaries.
>>
>> 3. Teach the linker to parse debug data and let it
>> remove unused debug data.
>>
>> In this proposal, we focus on approach #3. We show that
>> this approach is viable and discuss some preliminary
>> results, leaving the particular implementation out of
>> scope. We attach the Proof of Concept (PoC)
>> implementation (https://reviews.llvm.org/D67469) for
>> illustrative purposes. Please keep in mind that it is not
>> final, and there is room for improvement (see Appendix
>> B). However, the achieved results look quite promising:
>> they demonstrate up to a 2x size reduction with a
>> performance overhead of 30% of linking time (which is in
>> the same ballpark as the existing section compression;
>> see table 2, point F).
>>
>>
>> I believe #1 was added to DWARF5 to make link-time debug info
>> GC possible, so could you tell me a little bit about why you
>> chose to do #3? Is this because you want to do this for DWARF4?
>>
>>
> No, the proposal is not DWARF-4 specific; it applies to
> DWARF-5 as well. The solution added to DWARF-5 ("E.3.3
> Single-function-per-DWARF-compilation-unit", page 388) is not
> a complete solution. It is a recommendation that needs
> additional specification.
> The -fdebug-types-section implementation follows
> that recommendation. Cases other than type units do
> not easily fit into it. Some tables have a common
> header, for example .debug_line, .debug_rnglists, and
> .debug_addr. It is not clear how these tables could be
> split between section groups.
>
> More important is the fragmentation itself. Dividing
> debug tables into pieces would increase debug info size.
> It would also significantly complicate code that works with
> debug info. For example, include/llvm/DebugInfo/DWARF/DWARFObject.h
> declares the interface for class DWARFObject, which is currently
> not ready for the case where there could be multiple tables. A
> patch introducing support for multiple tables would be a massive
> change affecting many places in the llvm codebase.
>
> Another thing is that not only the llvm code base but all
> other DWARF consumers would have to be changed to support
> fragmented debug info.
>
> In short, if all debug tables were fragmented, working
> with debug info would become significantly more complicated.
>
> Thus the reasons to select #3 are:
>
> 1. It could be done in a single place, not affecting other
> parts of the llvm code base.
> 2. It does not require other DWARF consumers to implement
> support for it.
> 3. Avoiding fragmentation would save space.
> 4. Processing non-fragmented debug info is faster.
> 5. No need to adapt DWARF tables for fragmentation. They could
> be handled in their current state.
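
As a rough illustration of the idea behind approach #3 (the linker itself
dropping debug data that describes discarded code), here is a toy model. It
is not the PoC code from D67469 and does not parse real DWARF: DebugEntry,
gc_debug_entries and the section names are hypothetical, and the sketch
assumes each debug entry can be mapped to exactly one code section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugEntry:
    """Hypothetical stand-in for a debug info entry describing one function."""
    name: str
    code_section: str  # section holding the code this entry describes


def gc_debug_entries(entries, live_sections):
    """Keep only entries whose code section survived garbage collection."""
    return [e for e in entries if e.code_section in live_sections]


entries = [
    DebugEntry("main", ".text.main"),
    DebugEntry("used_helper", ".text.used_helper"),
    DebugEntry("unused_helper", ".text.unused_helper"),
]

# Assumed result of the linker's section garbage collection.
live_sections = {".text.main", ".text.used_helper"}

kept = gc_debug_entries(entries, live_sections)
print([e.name for e in kept])
```

The point of the sketch is only that the liveness decision is already made
by the linker, so the debug data can be filtered in one place without
fragmenting the debug tables themselves.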
>
>
> Alexey
>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>