[llvm-dev] Remove obsolete debug info while garbage collecting

Mon Sep 23 17:05:18 PDT 2019

On Fri, Sep 20, 2019 at 1:41 PM Alexey Lapshin <a.v.lapshin at mail.ru> wrote:

>
> On 19.09.2019 4:24, David Blaikie wrote:
>
>
>
> On Wed, Sep 18, 2019 at 7:25 AM Alexey Lapshin <a.v.lapshin at mail.ru>
> wrote:
>
>>
>>
>>
>>
>> Generally speaking, dsymutil does a very similar thing. It parses DWARF
>> DIEs, analyzes relocations, scans through references and throws out unused
>> DIEs. But its current interface does not allow it to be used at the link
>> stage. I think it would be ideal to have a single implementation.
>> Though I did not analyze how easy it would be (or whether it is possible)
>> to reuse its code at the link stage, it looked like it would need
>> significant rework.
>>
>> The implementation from this proposal removes obsolete debug info at the
>> link stage, and so it benefits from the already loaded object files and
>> the already computed liveness information, and it generates an optimized
>> binary from scratch.
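A minimal sketch of the mark-and-sweep idea described above (parse DIEs, follow references, throw out unused DIEs), assuming a simplified in-memory DIE graph. The `Die` struct, its fields and `markLiveDies` are hypothetical names for illustration, not dsymutil's or lld's actual data structures. DIEs describing code the linker kept act as roots; a kept DIE keeps its ancestors (so the tree stays well-formed) and everything it references, and roots and referenced DIEs also keep their whole subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kNoParent = static_cast<std::size_t>(-1);

// Hypothetical, simplified DIE: all links are indices into one flat vector.
struct Die {
  bool describesLiveCode = false;       // e.g. a subprogram whose code survived GC
  std::size_t parent = kNoParent;       // kNoParent for the unit DIE
  std::vector<std::size_t> children;
  std::vector<std::size_t> references;  // DW_AT_type, DW_AT_abstract_origin, ...
};

// Returns, for every DIE, whether it must be kept in the output.
std::vector<bool> markLiveDies(const std::vector<Die>& dies) {
  std::vector<bool> live(dies.size(), false);
  std::vector<bool> subtreeKept(dies.size(), false);
  // Worklist of (DIE index, whether its whole subtree must be kept).
  std::vector<std::pair<std::size_t, bool>> work;
  for (std::size_t i = 0; i < dies.size(); ++i)
    if (dies[i].describesLiveCode) work.emplace_back(i, true);

  while (!work.empty()) {
    auto [i, keepSubtree] = work.back();
    work.pop_back();
    assert(i < dies.size() && "DIE index out of range");
    if (!live[i]) {
      live[i] = true;
      // Keep the parent chain (without its other children) so the DIE stays in a valid tree.
      if (dies[i].parent != kNoParent) work.emplace_back(dies[i].parent, false);
      // Keep everything this DIE refers to, including the referenced subtree.
      for (std::size_t r : dies[i].references) work.emplace_back(r, true);
    }
    if (keepSubtree && !subtreeKept[i]) {
      subtreeKept[i] = true;
      for (std::size_t c : dies[i].children) work.emplace_back(c, true);
    }
  }
  return live;  // DIEs still false here can be dropped
}
```

Keeping a referenced subtree means a used struct keeps all its members, while an ancestor such as the compile unit DIE is kept without pulling in its other children.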
>>
>>
>> If dsymutil could be refactored so that it could be used at the link
>> stage, then its implementation could be reused. I will research the
>> possibility of such a refactoring.
>>
> Yeah, if this is going to be implemented, I think that would be strongly
> preferred - though I realize it may be substantial work to refactor. The
> alternative - duplicating all this work - doesn't seem like something that
> would be good for the LLVM project.
>
> I see. So I will research whether it is possible to refactor it
> accordingly.
>
>
> 1. Minimize or entirely avoid references from subprograms into other parts
>>> of the .debug_info section. That would simplify splitting out and removing
>>> subprograms, since it would minimize the number of references that have
>>> to be parsed and followed. (DW_FORM_ref_subroutine instead of
>>> DW_FORM_ref_*?)
>>>
>>
>> Not sure I follow - by "other parts of the .debug_info section" do you
>> mean in the same CU, or cross CU references? Any particular references you
>> have in mind? Or encountered in practice?
>>
>> Here I mean all kinds of references into the .debug_info section.
>>
>
> Ah, not only references from other places /into/ .debug_info (which don't
> really exist, so far as I know) but any references to locations within
> debug_info.
>
> Reducing these isn't super-viable - types being the most common examples.
> Though now I understand what you're getting at partly around the
> debug_type_table idea - adding a level of indirection to type references.
> So it'd be easy to find only one place to fix when removing chunks of
> debug_info (updating only the type table without having to find all the
> places inside debug_info to touch). That indirection would come at a size
> cost, of course - and an overhead for DWARF parsers having to follow that
> indirection. Doesn't make it impossible - just tradeoffs to be aware of.
>
> Though those aren't the only DIE references - without removing them all
> there'd still be a fair bit of overhead for finding any remaining ones and
> applying them. If an indirection table is to be added, maybe a generalized
> one (for any DIE reference) rather than one only for types would be good.
>
> Yes, a general indirection table would probably be useful.
> But types would still require specialized handling:
> they have a "type hash" and need some specific logic around it.
>
This indirection is essentially the same as relocations & could be
implemented that way (though no matter the solution you'd need some
attribute on the CU that says "I don't use any CU-local DIE offsets" so an
implementation didn't have to go searching/scanning for such offsets
(though I guess it'd be cheap to scan for that by just looking at the
abbreviations & if you don't see any CU-local DIE offset forms, use the
fast-path)). A custom DWARF format would be potentially more compact than
general ELF relocations.
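The abbreviation-based fast path mentioned above can be sketched as follows, assuming the unit's abbreviations have already been decoded into a hypothetical `Abbrev` struct that holds only the attribute forms. The form codes are the standard DWARF 4/5 values for the CU-relative reference forms; `DW_FORM_indirect` is treated conservatively, because with it the actual form is stored in each DIE rather than in the abbreviation.

```cpp
#include <cstdint>
#include <vector>

namespace dw {
// Standard DWARF form codes used below.
constexpr std::uint16_t FORM_ref1 = 0x11;
constexpr std::uint16_t FORM_ref2 = 0x12;
constexpr std::uint16_t FORM_ref4 = 0x13;
constexpr std::uint16_t FORM_ref8 = 0x14;
constexpr std::uint16_t FORM_ref_udata = 0x15;
constexpr std::uint16_t FORM_indirect = 0x16;
}  // namespace dw

// Hypothetical decoded abbreviation: only the attribute forms matter here.
struct Abbrev {
  std::vector<std::uint16_t> forms;
};

// True if a value in this form may be a CU-local DIE offset.
bool mayHoldCuLocalRef(std::uint16_t form) {
  switch (form) {
  case dw::FORM_ref1:
  case dw::FORM_ref2:
  case dw::FORM_ref4:
  case dw::FORM_ref8:
  case dw::FORM_ref_udata:
  case dw::FORM_indirect:  // the real form is only known per DIE
    return true;
  default:
    return false;
  }
}

// Scans only the abbreviation table. If no abbreviation can encode a CU-local
// reference, the unit has no CU-relative offsets to rewrite and the per-DIE
// scan for them can be skipped. (DW_FORM_ref_addr and other section offsets
// still need relocation-style handling.)
bool needsCuLocalRefFixups(const std::vector<Abbrev>& abbrevs) {
  for (const Abbrev& a : abbrevs)
    for (std::uint16_t form : a.forms)
      if (mayHoldCuLocalRef(form)) return true;
  return false;
}
```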

>
> (aspects of this have been discussed before - we've sometimes nicknamed it
> "bag of DWARF" when discussing it in the context of type units (currently
> you can only reference the type DIE in a type unit - which adds overhead
> when wanting to reference subprogram declaration DIEs, etc (or maybe
> multiple types are clustered together and don't need a separate type unit
> each - if only you could refer to multiple types in a type unit) - so we've
> discussed generalizing the type unit header (actually it could generalize
> even as far as the classic CU header) to have N type DIE offset+hash pairs
> (zero for a normal CU, one for a classic type unit, and any number for more
> interesting cases))
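One way to picture the generalized header described above, as a purely hypothetical layout (no such unit header exists in the DWARF standard; the struct and field names are illustrative only): the header carries N (hash, offset) pairs, where N == 0 corresponds to a normal CU, N == 1 to a classic type unit, and larger N lets one unit export several referable DIEs such as clustered types or member function declarations.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// One referable DIE exported by a unit.
struct ExportedDie {
  std::uint64_t hash;    // plays the role of a DW_FORM_ref_sig8 signature
  std::uint32_t offset;  // unit-relative offset of the exported DIE
};

// Hypothetical generalized unit header ("bag of DWARF").
struct GeneralizedUnitHeader {
  std::vector<ExportedDie> exported;  // empty: normal CU; one entry: classic type unit
};

// Resolves a hash-based reference to a unit-relative DIE offset, if this unit exports it.
std::optional<std::uint32_t> lookupExported(const GeneralizedUnitHeader& header,
                                            std::uint64_t hash) {
  for (const ExportedDie& e : header.exported)
    if (e.hash == hash) return e.offset;
  return std::nullopt;
}
```

A linear scan keeps the sketch short; a real format would likely sort the pairs by hash or index them for large N.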
>
> As far as I understand, "generalizing the type unit header (actually it
> could generalize even as far as the classic CU header) to have N type DIE
> offset+hash pairs" looks very close to "global type table" which I am
> talking about.
>
>
>
>
>> Following references is a time-consuming task.
>> Thus, the fewer references that have to be followed, the faster it
>> works.
>>
>> Cross-CU references require loading the referenced CU. I do not know of
>> use cases where cross-CU references are used.
>>
>
> Cross-CU inlining due to LTO. Try something like this:
>
> a.cpp:
>   void f2();
>   __attribute__((always_inline)) void f1() {
>     f2();
>   }
>
> b.cpp:
>   void f1();
>   int main() {
>     f1();
>   }
>
> $ clang++ a.cpp b.cpp -emit-llvm -S -c -g
> $ llvm-link a.ll b.ll -o ab.bc
> $ clang++ ab.bc -c
> $ llvm-dwarfdump ab.o -v -debug-info |
> 0x0b: DW_TAG_compile_unit
>         DW_AT_name "a.cpp"
> 0x2a:   DW_TAG_subprogram
>           DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x0056 =>
> {0x00000056} "_Z2f1v")
>         DW_TAG_subprogram
>           DW_AT_name "f1"
> 0x6e: DW_TAG_compile_unit
>         DW_AT_name "b.cpp"
> 0x8d:   DW_TAG_subprogram
>           DW_AT_name "main"
> 0xa6:     DW_TAG_inlined_subroutine
>             DW_AT_abstract_origin [DW_FORM_ref_addr] (0x0000000000000056
> "_Z2f1v")
>
> Notice that the inlined_subroutine's abstract_origin uses a linker
> relocation into the debug_info section to give an absolute offset within
> the finally linked debug_info section (since the debugger wouldn't know
> that these two compile_units are bound together and to use some particular
> compile_unit as the base offset - either it's absolute across the whole
> debug_info section (FORM_ref_addr) or it's local to the CU (FORM_refN (such
> as FORM_ref4 above)))
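The two reference flavours in the dump above resolve differently. A minimal sketch with hypothetical names (`RefKind`, `resolveDieRef`): a CU-relative form (DW_FORM_ref1/2/4/8/ref_udata) is an offset from the start of the unit header, while DW_FORM_ref_addr is already an offset from the start of .debug_info. In the dump, the a.cpp unit header starts at offset 0 (its unit DIE at 0x0b follows an 11-byte DWARF 4 header), so `cu + 0x0056` and the ref_addr value 0x56 in the b.cpp unit name the same DIE.

```cpp
#include <cstdint>
#include <stdexcept>

enum class RefKind {
  CuRelative,       // DW_FORM_ref1, ref2, ref4, ref8, ref_udata
  SectionAbsolute,  // DW_FORM_ref_addr
};

// Converts a reference attribute value into an absolute .debug_info offset.
// cuHeaderOffset is the offset of the referencing unit's header.
std::uint64_t resolveDieRef(RefKind kind, std::uint64_t value,
                            std::uint64_t cuHeaderOffset) {
  switch (kind) {
  case RefKind::CuRelative:
    return cuHeaderOffset + value;
  case RefKind::SectionAbsolute:
    return value;  // the linker already relocated it to a section offset
  }
  throw std::invalid_argument("unknown reference kind");
}
```

This is why removing bytes anywhere before the target changes a DW_FORM_ref_addr value, whereas a CU-relative value only changes when bytes are removed between its unit header and the target.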
>
> Got it. Thank you.
>
>
>
>
>> If that is a specific case and is not usually used inside subprograms,
>> then it is probably possible to avoid it.
>>
>
> It's fairly specifically used inside subprograms (& would need to be
> adjusted even if it wasn't inside a subprogram - when bytes are removed,
> etc) - though possibly general relocation handling in the linker could be
> used to implement handling ref_addr.
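If ref_addr values are handled like relocations, the core fixup is mapping an old .debug_info offset to its new offset once some byte ranges have been dropped. A minimal sketch, assuming the removed ranges are sorted and non-overlapping; `RemovedRange` and `remapOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// A half-open byte range [begin, end) deleted from .debug_info.
struct RemovedRange {
  std::uint64_t begin;
  std::uint64_t end;
};

// Maps an offset in the input section to the corresponding output offset.
// Returns nullopt if the target itself was removed, i.e. the reference would
// dangle and the linker must either keep the target or drop the referrer.
std::optional<std::uint64_t> remapOffset(std::uint64_t offset,
                                         const std::vector<RemovedRange>& removed) {
  std::uint64_t shift = 0;  // total bytes removed before `offset`
  for (const RemovedRange& r : removed) {
    assert(r.begin <= r.end && "malformed range");
    if (offset < r.begin) break;             // ranges are sorted: nothing more applies
    if (offset < r.end) return std::nullopt; // target lies inside a removed range
    shift += r.end - r.begin;
  }
  return offset - shift;
}
```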
>
>
>> Within the same CU, there could probably be cases where references could
>> be ignored: https://reviews.llvm.org/P8165
>>
>
> How would references be ignored while keeping them correct? Ah, by making
> subprograms more self-contained - maybe, but the work to figure out which
> things are only referenced from one place and structure the DWARF
> differently probably wouldn't be ideal in the compiler & wouldn't save the
> debug info linker from having to have code to handle the case where it
> wasn't only used from that subprogram anyway.
>
>
>>
>>
>>> 2. Create an additional section - a global types table (.debug_types_table).
>>> That would significantly reduce the number of references inside the
>>> .debug_info section. It also makes it possible to use a 4-byte reference
>>> into this section instead of an 8-byte reference into a type unit
>>> (DW_FORM_ref_types instead of DW_FORM_ref_sig8). It also makes it possible
>>> to place base types into this section and avoid per-compile-unit
>>> duplication of them. Additionally, size could be reduced by not generating
>>> type unit headers. Note that the new section - .debug_types_table - differs
>>> from the DWARF 4 .debug_types section in that it contains unique type
>>> descriptors referenced by offsets instead of a list of type units
>>> referenced by DW_FORM_ref_sig8, and all table entries share the same
>>> abbreviations and do not have type unit headers.
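To make the proposal concrete, a minimal sketch of building such a table under stated assumptions: type descriptors are treated as opaque pre-encoded byte strings, a 64-bit type hash (like a type signature) is assumed to identify a type uniquely, and `TypesTable`/`intern` are hypothetical names. Each unique type is stored once and referenced by its 4-byte offset (the proposed DW_FORM_ref_types), which is also how duplicated base types would collapse into a single entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical .debug_types_table builder.
class TypesTable {
 public:
  // Returns the table offset for the type, appending it only on first use.
  std::uint32_t intern(std::uint64_t typeHash, const std::string& encodedType) {
    auto found = offsets_.find(typeHash);
    if (found != offsets_.end()) return found->second;  // duplicate: reuse the entry
    if (bytes_.size() > std::numeric_limits<std::uint32_t>::max())
      throw std::length_error("table no longer addressable with 4-byte offsets");
    auto offset = static_cast<std::uint32_t>(bytes_.size());
    bytes_.insert(bytes_.end(), encodedType.begin(), encodedType.end());
    offsets_.emplace(typeHash, offset);
    return offset;
  }

  std::size_t sizeInBytes() const { return bytes_.size(); }

 private:
  std::unordered_map<std::uint64_t, std::uint32_t> offsets_;  // type hash -> offset
  std::vector<char> bytes_;  // concatenated type descriptors
};
```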
>>>
>>
>> What do you mean when you say "global types table"? The phrasing in the
>> above paragraph is present-tense, as though this thing exists, but doesn't
>> seem to describe what it actually is and how it achieves the things the
>> text says it achieves. Perhaps I've missed some context here.
>>
>>
>> The "global types table" does not exist yet. It could be created if the
>> discussed approach is considered useful.
>>
>
> Ah, the present-tense language was a bit confusing for me when discussing
> a thing that doesn't exist yet & not having provided a description of what
> it might be or might contain and why it would exist/what it would achieve.
>
> I should have written it more precisely.
>
>
>
>
>> Please see the comparison of a possible "global types table" and the
>> currently existing type units: https://reviews.llvm.org/P8164
>>
> Ah, that proposed version makes it easy to remove subprograms from
> debug_info without having to fix up type references (but you still have to
> have the code to fix up other cross-CU references, like abstract_origin, so
> I'm not sure it provides that much value) but doesn't make it easy to
> remove types (because you'd have to go looking through the debug_info
> section to update all the type offsets (which I guess you have to do anyway
> to find the type references) and removing the types still also requires
> fixing up the types that reference each other...
>
> So I'm not seeing a big win there.
>
> Correct. Even if types were put into a separate table, it would still be
> necessary to:
>  "go looking through the debug_info section to update all the type
> offsets";
>  "removing the types still also requires fixing up the types that
> reference each other".
>
>  But it additionally offers the following benefits:
>
>  1. Size reduction by removing fragmentation. In the "-fdebug-types-section"
> solution, every type that is put into a type unit requires:
>    - an additional type unit header,
>    - a section header (since it is put into a separate section),
>    - proxy type copies inside the compilation unit.
>
>   Putting types into a separate table avoids creating the above data for
> every type.
>
> 2. Size reduction by deduplicating base types. In the "-fdebug-types-section"
> solution, base types are not deduplicated at all.
>

Base types are pretty small - not sure there'd be much to save by
indirection (for classic base types like "int" - for non-trivial but
non-user-defined types like subroutine types there might be more
opportunity for savings). & you'd still have some cost of indirection to
tradeoff - so I don't think it's always going to be the right solution to
indirect everything.

There are a lot of design considerations in this problem space, let's put
it that way.

> 3. Performance improvement by handling less data. #1 leads to loading and
> parsing fewer bits.
>
> 4. Performance improvement by handling fewer references. Simpler reference
> chains allow references to be followed faster.
>   Instead of this:
>
>   type_offset->proxy_type->DW_FORM_ref_sig8->type_unit->type_offset->type.
>
>   There would be this:
>
>   type_offset->type_table->type.
>
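For a rough sense of the fixed cost behind benefit 1 in an object file, a back-of-the-envelope sketch, assuming 32-bit DWARF 5 type units and ELF64 object files, and deliberately ignoring COMDAT group sections, section-name strings, relocations and proxy types (so it is a lower bound, not a measurement): a 32-bit DWARF 5 type unit header is 24 bytes and an ELF64 section header is 64 bytes. Input section headers do not survive into the linked output, so that part of the cost is paid in object files and during linking rather than in the final binary.

```cpp
#include <cstddef>

// Field sizes of a 32-bit DWARF 5 type unit header.
constexpr std::size_t kUnitLength = 4;
constexpr std::size_t kVersion = 2;
constexpr std::size_t kUnitType = 1;
constexpr std::size_t kAddressSize = 1;
constexpr std::size_t kDebugAbbrevOffset = 4;
constexpr std::size_t kTypeSignature = 8;
constexpr std::size_t kTypeOffset = 4;
constexpr std::size_t kTypeUnitHeader = kUnitLength + kVersion + kUnitType +
                                        kAddressSize + kDebugAbbrevOffset +
                                        kTypeSignature + kTypeOffset;
static_assert(kTypeUnitHeader == 24, "32-bit DWARF 5 type unit header");

// sizeof(Elf64_Shdr): one section header per separately emitted type unit.
constexpr std::size_t kElf64SectionHeader = 64;

// Lower bound on the fixed bytes spent per type unit in an object file.
constexpr std::size_t fixedTypeUnitOverhead(std::size_t typeUnits) {
  return typeUnits * (kTypeUnitHeader + kElf64SectionHeader);
}
```

With a shared table, the header cost would be paid once per table rather than once per unique type.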
Yep, though to avoid the need for the proxy type you'd need to be able to
refer to other entities in the "bag of DWARF"/generalized type unit (things
like member function declarations and the like)

Yes, "bag of DWARF" or generalized type units (where you can refer to
multiple entities in a single unit by some kind of hash) has some benefits.

But it seems somewhat orthogonal to your debug info linking goals here,
unless it is a solution that removes the need for parsing the DWARF.

Another way to consider this would be to model (or actually implement)
inter-DIE references as relocations (DW_FORM_sec_offset instead of a cu
offset) - ah, I mentioned that earlier (I'm writing this reply out of
order).

>
>
>
>>> We evaluated the approach on LLVM and Clang codebases. The results
>>> obtained are summarized in the tables below:
>>>
>>
>> Memory usage statistics (& confidence intervals for the build time) would
>> probably be especially useful for comparing these tradeoffs.
>> Doubly so when using compression (since the decompression would need to
>> use more memory, as would the recompression - so, two different tradeoffs
>> (compressed input, compressed output, and then both at the same time))
>>
>> I will measure the memory impact of that PoC implementation, but I expect
>> it to be significant.
>> Memory usage has not been optimized yet. There are several things that
>> could be done to reduce the memory footprint:
>> not loading all compile units into memory, and not adding a Parent field
>> to all DIEs.
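As an illustration of the second idea (not storing a Parent field in every DIE), a minimal sketch assuming DIEs are visited in their on-disk preorder: a DIE whose abbreviation says DW_CHILDREN_yes is followed by its children and then a null entry (abbreviation code 0), so a stack of open DIEs recovers each DIE's parent during one linear pass, using memory proportional to tree depth. `DieEntry` and `walkWithParents` are hypothetical names, and the sketch trusts the stream to be well-formed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flattened view of a unit's DIE stream, in .debug_info order.
struct DieEntry {
  bool isNull;       // abbreviation code 0: ends a sibling list
  bool hasChildren;  // DW_CHILDREN_yes in the abbreviation
};

constexpr std::size_t kNoParent = static_cast<std::size_t>(-1);

// Calls visit(dieIndex, parentIndex) for every non-null DIE. Parents come from
// a stack of DIEs whose children are still open, not from a stored field.
template <typename Visitor>
void walkWithParents(const std::vector<DieEntry>& stream, Visitor visit) {
  std::vector<std::size_t> open;  // DIEs whose child lists are not yet closed
  for (std::size_t i = 0; i < stream.size(); ++i) {
    if (stream[i].isNull) {
      if (!open.empty()) open.pop_back();  // child list of the innermost DIE ends
      continue;
    }
    visit(i, open.empty() ? kNoParent : open.back());
    if (stream[i].hasChildren) open.push_back(i);
  }
}
```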
>>
> Yep, this is the sort of thing where I suspect the dsymutil implementation
> may've already had at least some of that work done - or, if not, that doing
> the work once for both/all implementations would be very preferable to
> duplicating the effort.
>
> Ok,
>
>
> Thank you, Alexey.
>
>
>
> - Dave
>
>> Alexey.
>>
>>
>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>