[llvm-dev] Remove obsolete debug info while garbage collecting

Wed Sep 18 18:24:44 PDT 2019

On Wed, Sep 18, 2019 at 7:25 AM Alexey Lapshin <a.v.lapshin at mail.ru> wrote:

>
> 17.09.2019 3:12, David Blaikie wrote:
>
>
>
> On Wed, Sep 11, 2019 at 3:32 PM Alexey Lapshin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Debuginfo and linker folks, we (AccessSoftek) would like to suggest a
>> proposal for removing obsolete debug info. If you find it useful we will be
>> happy to work on improving it. Thank you for any opinions and suggestions.
>>
>> Alexey.
>>
>>     Currently when the linker does garbage collection a lot of abandoned
>> debug info is left behind (see Appendix A for documentation). Besides
>> inflated debug info size, we end up with overlapping address ranges and
>> no way to tell valid ranges from garbage ones. We propose removing debug
>> info along with the removed code. This would reduce debug info size and
>> ensure debug info accuracy.
>>
>> There are several approaches which could be used to solve that problem:
>>
>> 1.  Require DWARF producers to generate fragmented debug data according
>> to the DWARF5 specification: "E.3.3 Single-function-per-DWARF-compilation-unit",
>> page 388. That approach assumes fragmenting the whole debug info on a
>> per-function basis and gluing the fragmented sections together at link
>> time using section groups.
>>
>> 2.  Use an additional tool, which would optimize out unnecessary debug
>> data, something similar to dwz (dwarf compressor tool), dsymutil (links the
>> DWARF debug information). This approach assumes additional post-link
>> binaries processing.
>>
>> 3.  Teach the linker to parse debug data and let it remove unused debug
>> data.
>>
>> In this proposal, we focus on approach #3. We show that this approach is
>> viable and discuss some preliminary results, leaving particular
>> implementation out of scope. We attach the Proof of Concept (PoC)
>> implementation (https://reviews.llvm.org/D67469) for illustrative
>> purposes. Please keep in mind that it is not final, and there is room for
>> improvement (see Appendix B). However, the achieved results look quite
>> promising: they demonstrate up to a 2x size reduction, with a performance
>> overhead of 30% of linking time (which is in the same ballpark as the
>> already implemented section compression (see table 2 point F)).
>>
>
> Have you considered/tried reusing the DWARF
> minimization/deduplication/linking logic that's already in llvm's dsymutil
> implementation? If we're going to do that having a singular implementation
> would be desirable.
>
> (bonus points if we could do something like the dsymutil approach when
> using Split DWARF and building a DWP - taking some address table output
> from the linker, and using that to help trim things (or, even when having
> no input from the linker - at least doing more aggressive deduplication
> during DWP construction than can be currently done with only type units (&
> potentially removing/avoiding type unit overhead too))
>
>
> Generally speaking, dsymutil does a very similar thing. It parses DWARF
> DIEs, analyzes relocations, scans through references and throws out unused
> DIEs. But its current interface does not allow it to be used at the link
> stage. I think it would be perfect to have a singular implementation.
>  Though I did not analyze how easy, or whether it is even possible, to
> reuse its code at the link stage, it looked like it needs significant rework.
>
>  The implementation from this proposal removes obsolete debug info at the
> link stage,
>  and so benefits from already loaded object files, already computed
> liveness information,
>  and generating an optimized binary from scratch.
>
>
> If dsymutil could be refactored in such a manner that it could be used at
> the link stage, then its implementation could be reused. I would research
> the possibility of such a refactoring.
>
Yeah, if this is going to be implemented, I think that would be strongly
preferred - though I realize it may be substantial work to refactor. The
alternative - duplicating all this work - doesn't seem like something that
would be good for the LLVM project.
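
To make concrete what that shared logic has to do: the core of the
"scan through references and throw out unused DIEs" step is a reachability
walk. A minimal sketch, assuming a toy `Die` struct and a `markLive` helper
that are purely illustrative (they are not dsymutil's or lld's actual data
structures):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a DIE (hypothetical, for illustration only).
struct Die {
  std::vector<std::size_t> refs;  // indices of DIEs this one references
  bool describesLiveCode = false; // root: its code survived linker GC
};

// Returns one keep/drop flag per DIE. A DIE is kept if it is a root or is
// reachable from a root by following references.
std::vector<bool> markLive(const std::vector<Die> &dies) {
  std::vector<bool> live(dies.size(), false);
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < dies.size(); ++i) {
    if (dies[i].describesLiveCode) {
      live[i] = true;
      worklist.push_back(i);
    }
  }
  while (!worklist.empty()) {
    std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t ref : dies[cur].refs) {
      assert(ref < dies.size() && "dangling DIE reference");
      if (!live[ref]) { // visit each DIE once, so reference cycles are fine
        live[ref] = true;
        worklist.push_back(ref);
      }
    }
  }
  return live;
}
```

Everything left unmarked is a candidate for removal; the expensive part in
practice is discovering the `refs` edges, which is the reference-following
cost discussed further down.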

> 1. Minimize or entirely avoid references from subprograms into other parts
>> of the .debug_info section. That would simplify splitting out and removing
>> subprograms, in the sense that it would minimize the number of
>> references that have to be parsed and followed. (DW_FORM_ref_subroutine
>> instead of DW_FORM_ref_*, ?)
>>
>
> Not sure I follow - by "other parts of the .debug_info section" do you
> mean in the same CU, or cross CU references? Any particular references you
> have in mind? Or encountered in practice?
>
> I mean here all kinds of references into .debug_info section.
>

Ah, not only references from other places /into/ .debug_info (which don't
really exist, so far as I know) but any references to locations within
debug_info.

Reducing these isn't super-viable - types being the most common examples.
Though now I understand what you're getting at partly around the
debug_type_table idea - adding a level of indirection to type references.
So it'd be easy to find only one place to fix when removing chunks of
debug_info (updating only the type table without having to find all the
places inside debug_info to touch). That indirection would come at a size
cost, of course - and an overhead for DWARF parsers having to follow that
indirection. Doesn't make it impossible - just tradeoffs to be aware of.

Though that's not the only DIE references - without removing them all
there'd still be a fair bit of overhead for finding any remaining ones and
applying them. If an indirection table is to be added, maybe a generalized
one (for any DIE reference) rather than one only for types would be good.
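
A minimal sketch of what such a generalized indirection table could look
like, to illustrate the tradeoff. The `DieRefTable` type and its members are
hypothetical; nothing like this exists in DWARF or LLVM today:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical indirection table: references inside debug_info store a slot
// index, and only the table holds real section offsets.
struct DieRefTable {
  std::vector<std::uint64_t> offsets; // slot -> offset within debug_info

  // The extra hop every DWARF consumer would pay per reference.
  std::uint64_t resolve(std::size_t slot) const {
    assert(slot < offsets.size() && "slot out of range");
    return offsets[slot];
  }

  // Account for `size` bytes removed at `start`. Only table entries change;
  // debug_info itself needs no reference fix-ups. Assumes no slot points
  // into the removed chunk.
  void removeChunk(std::uint64_t start, std::uint64_t size) {
    for (std::uint64_t &off : offsets) {
      assert((off < start || off >= start + size) && "target was removed");
      if (off >= start + size)
        off -= size;
    }
  }
};
```

The size cost is one table entry per referenced DIE, and the lookup in
`resolve` is the parser overhead mentioned above.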

(aspects of this have been discussed before - we've sometimes nicknamed it
"bag of DWARF" when discussing it in the context of type units (currently
you can only reference the type DIE in a type unit - which adds overhead
when wanting to reference subprogram declaration DIEs, etc (or maybe
multiple types are clustered together and don't need a separate type unit
each - if only you could refer to multiple types in a type unit) - so we've
discussed generalizing the type unit header (actually it could generalize
even as far as the classic CU header) to have N type DIE offset+hash pairs
(zero for a normal CU, one for a classic type unit, and any number for more
interesting cases))

> Going through references is a time-consuming task.
> Thus the fewer references that have to be followed, the faster it
> works.
>
> Cross-CU references require loading the referenced CU. I do not
> know of use cases where cross-CU references are used.
>

Cross-CU inlining due to LTO. Try something like this:

a.cpp:
  void f2();
  __attribute__((always_inline)) void f1() {
    f2();
  }

b.cpp:
  void f1();
  int main() {
    f1();
  }

$ clang++ a.cpp b.cpp -emit-llvm -S -c -g
$ llvm-link a.ll b.ll -o ab.bc
$ clang++ ab.bc -c
$ llvm-dwarfdump ab.o -v -debug-info |
0x0b: DW_TAG_compile_unit
        DW_AT_name "a.cpp"
0x2a:   DW_TAG_subprogram
          DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x0056 => {0x00000056} "_Z2f1v")
        DW_TAG_subprogram
          DW_AT_name "f1"
0x6e: DW_TAG_compile_unit
        DW_AT_name "b.cpp"
0x8d:   DW_TAG_subprogram
          DW_AT_name "main"
0xa6:     DW_TAG_inlined_subroutine
            DW_AT_abstract_origin [DW_FORM_ref_addr] (0x0000000000000056 "_Z2f1v")


Notice that the inlined_subroutine's abstract_origin uses a linker
relocation into the debug_info section to give an absolute offset within
the finally linked debug_info section (since the debugger wouldn't know
that these two compile_units are bound together and to use some particular
compile_unit as the base offset - either it's absolute across the whole
debug_info section (FORM_ref_addr) or it's local to the CU (FORM_refN (such
as FORM_ref4 above)))
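
A minimal sketch of how a consumer turns the two kinds of reference into a
section offset. The `RefForm` enum and `resolveRef` are illustrative names,
not LLVM's API:

```cpp
#include <cassert>
#include <cstdint>

// The two reference flavours discussed above (illustrative names).
enum class RefForm {
  CuRelative,     // DW_FORM_ref1/2/4/8/udata: offset from the CU header
  SectionAbsolute // DW_FORM_ref_addr: offset from the start of debug_info
};

// Returns the absolute debug_info offset of the referenced DIE.
// `cuOffset` is where the referencing DIE's CU header starts.
std::uint64_t resolveRef(RefForm form, std::uint64_t value,
                         std::uint64_t cuOffset) {
  switch (form) {
  case RefForm::CuRelative:
    assert(value <= UINT64_MAX - cuOffset && "offset overflow");
    return cuOffset + value;
  case RefForm::SectionAbsolute:
    return value; // the linker relocation already made this absolute
  }
  return value; // unreachable for valid enum values
}
```

In the dump above the first CU starts at offset 0, so the ref4 value 0x56
resolves to 0x56. Assuming the second CU's header starts at 0x63 (11 bytes
before its first DIE at 0x6e, as in the first CU), a ref4 there would be
relative to 0x63, which is why reaching "f1" at 0x56 needs ref_addr.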

> If that is the specific case and is not used inside subprograms usually,
> then probably it is possible to avoid it.
>

It's fairly specifically used inside subprograms (& would need to be
adjusted even if it wasn't inside a subprogram - when bytes are removed,
etc) - though possibly general relocation handling in the linker could be
used to implement handling ref_addr.
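
A minimal sketch of the adjustment itself, assuming the debug info linker
keeps a sorted list of the byte ranges it deleted. `RemovedRange` and
`remapOffset` are hypothetical names for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A chunk of debug_info that was deleted (hypothetical bookkeeping type).
struct RemovedRange {
  std::uint64_t start;
  std::uint64_t size;
};

// Maps an offset in the original debug_info to its offset after deletion.
// `removed` must be sorted by start and non-overlapping, and `oldOffset`
// must not point into a removed range.
std::uint64_t remapOffset(std::uint64_t oldOffset,
                          const std::vector<RemovedRange> &removed) {
  std::uint64_t shift = 0;
  for (const RemovedRange &r : removed) {
    if (r.start > oldOffset)
      break; // later ranges do not move this offset
    assert(oldOffset >= r.start + r.size && "reference target was removed");
    shift += r.size;
  }
  return oldOffset - shift;
}
```

Every surviving ref_addr value would go through a mapping like this; CU-local
references need the same treatment relative to their own CU.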

> For the same CU - there could probably be cases when references could be
> ignored: https://reviews.llvm.org/P8165
>

How would references be ignored while keeping them correct? Ah, by making
subprograms more self-contained - maybe, but the work to figure out which
things are only referenced from one place and structure the DWARF
differently probably wouldn't be ideal in the compiler & wouldn't save the
debug info linker from having to have code to handle the case where it
wasn't only used from that subprogram anyway.

>
>
>> 2. Create an additional section - a global types table (.debug_types_table).
>> That would significantly reduce the number of references inside the
>> .debug_info section. It also makes it possible to have a 4-byte reference
>> into this section instead of an 8-byte reference to a type unit
>> (DW_FORM_ref_types instead of DW_FORM_ref_sig8). It also makes it possible
>> to place base types into this section and avoid per-compile-unit
>> duplication of them. Additionally, a size reduction could be achieved by
>> not generating type unit headers. Note that the new section -
>> .debug_types_table - differs from the DWARF4 section .debug_types in that
>> it contains unique type descriptors referenced by offsets instead of a
>> list of type units referenced by DW_FORM_ref_sig8; all table entries share
>> the same abbreviations and do not have type unit headers.
>>
>
> What do you mean when you say "global types table"? The phrasing in the
> above paragraph is present-tense, as though this thing exists, but it
> doesn't seem to describe what it actually is and how it achieves the things
> the text says it achieves. Perhaps I've missed some context here.
>
>
> The "global types table" does not exist yet. It could be created if the
> discussed approach would be considered useful.
>

Ah, the present-tense language was a bit confusing for me when discussing a
thing that doesn't exist yet & not having provided a description of what it
might be or might contain and why it would exist/what it would achieve.

> Please check the comparison of a possible "global types table" and the
> existing type units: https://reviews.llvm.org/P8164
>
Ah, that proposed version makes it easy to remove subprograms from
debug_info without having to fix up type references (but you still have to
have the code to fix up other cross-CU references, like abstract_origin, so
I'm not sure it provides that much value) but doesn't make it easy to
remove types (because you'd have to go looking through the debug_info
section to update all the type offsets (which I guess you have to do anyway
to find the type references)), and removing the types still also requires
fixing up the types that reference each other...

So I'm not seeing a big win there.

> The benefit of using a "global types table" is that it needs less space
> to store types than the type units solution does.
>
>
>
>
>> 3. Define the limited scope for line programs which could be removed
>> independently. I.e. currently .debug_line section contains a program in
>> byte-coded language for a state machine. That program actually represents a
>> matrix [instruction][line information]. In general, it is hard to cut out
>> part of that program and to keep the whole program correct. Thus it would
>> be good to specify separate scopes (related to address ranges) which could
>> be easily removed from the program body.
>>
>
> In my experience line tables are /tiny/ - have you prototyped any change
> in this space to have a sense of whether it would have significant savings?
> (it'd potentially help address the address ambiguity issues when the linker
> discards code, though - so might be a correctness issue rather than a size
> performance issue)
>
> I did not measure the value of size reduction for line table, though I
> think that it would be a small value.
> The more important thing is a correctness issue. Line table could contain
> information for overlapping address ranges.
>
> There is another attempt to fix that issue -
> https://reviews.llvm.org/D59553.
>
Yep. It's a complicated problem, and fixing the line table would be a good
way to deal with some of it. (Split DWARF makes it hard to fix up the rest
of the debug info, though - so there would still be some ambiguity in the
DWARF with a binary using Split DWARF).
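
A minimal sketch of the ambiguity check a consumer would otherwise need: given
the address ranges covered by line table sequences, detect whether any two
overlap. `AddrRange` and `hasOverlap` are illustrative names, and the ranges
are assumed to be half-open [low, high):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Address range covered by one line table sequence, as [low, high).
struct AddrRange {
  std::uint64_t low;
  std::uint64_t high;
};

// True if any two ranges share at least one address.
bool hasOverlap(std::vector<AddrRange> ranges) {
  std::sort(ranges.begin(), ranges.end(),
            [](const AddrRange &a, const AddrRange &b) {
              return a.low < b.low;
            });
  for (std::size_t i = 1; i < ranges.size(); ++i) {
    assert(ranges[i].low <= ranges[i].high && "malformed range");
    // Earlier ranges are disjoint at this point, so the previous range has
    // the largest end address seen so far.
    if (ranges[i].low < ranges[i - 1].high)
      return true;
  }
  return false;
}
```

An overlap tells you that something is wrong but not which of the ranges is
the garbage one, which is why removing the dead sequences at link time is the
more robust fix.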

>
>
>
>>
>> We evaluated the approach on LLVM and Clang codebases. The results
>> obtained are summarized in the tables below:
>>
>
> Memory usage statistics (& confidence intervals for the build time) would
> probably be especially useful for comparing these tradeoffs.
> Doubly so when using compression (since the decompression would need to
> use more memory, as would the recompression - so, two different tradeoffs
> (compressed input, compressed output, and then both at the same time))
>
> I would measure memory impact for that PoC implementation, but I expect it
> would be significant.
> Memory usage was not optimized yet. There are several things which might
> be done to reduce memory footprint:
> do not load all compile units into memory, avoid adding Parent field to
> all DIEs.
>
Yep, this is the sort of thing where I suspect the dsymutil implementation
may've already had at least some of that work done - or, if not, that doing
the work once for both/all implementations would be very preferable to
duplicating the effort.

- Dave

> Alexey.
>
>
>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>