[llvm-dev] Remove obsolete debug info while garbage collecting

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 24 10:44:57 PDT 2019


24.09.2019 3:05, David Blaikie wrote:
> On Fri, Sep 20, 2019 at 1:41 PM Alexey Lapshin <a.v.lapshin at mail.ru>
> wrote:
>
>     19.09.2019 4:24, David Blaikie wrote:
>>     On Wed, Sep 18, 2019 at 7:25 AM Alexey Lapshin
>>     <a.v.lapshin at mail.ru> wrote:
>>
>>>             1. Minimize or entirely avoid references from
>>>             subprograms into other parts of the .debug_info section.
>>>             That would simplify splitting out and removing
>>>             subprograms, in the sense that it would minimize the
>>>             number of references that have to be parsed and followed.
>>>             (DW_FORM_ref_subroutine instead of DW_FORM_ref_*, ?)
>>>
>>>
>>>         Not sure I follow - by "other parts of the .debug_info
>>>         section" do you mean in the same CU, or cross CU references?
>>>         Any particular references you have in mind? Or encountered
>>>         in practice?
>>         I mean all kinds of references into the .debug_info section.
>>
>>
>>     Ah, not only references from other places /into/ .debug_info
>>     (which don't really exist, so far as I know) but any references
>>     to locations within debug_info.
>>
>>     Reducing these isn't super-viable - types being the most common
>>     examples. Though now I understand what you're getting at partly
>>     around the debug_type_table idea - adding a level of indirection
>>     to type references. So it'd be easy to find only one place to fix
>>     when removing chunks of debug_info (updating only the type table
>>     without having to find all the places inside debug_info to
>>     touch). That indirection would come at a size cost, of course -
>>     and an overhead for DWARF parsers having to follow that
>>     indirection. Doesn't make it impossible - just tradeoffs to be
>>     aware of.
>>
>>     Though that's not the only DIE references - without removing them
>>     all there'd still be a fair bit of overhead for finding any
>>     remaining ones and applying them. If an indirection table is to
>>     be added, maybe a generalized one (for any DIE reference) rather
>>     than one only for types would be good.
>>
>     yes, some general indirection table would probably be useful.
>     But, types would still require specialized handling.
>     Types have "type hash" and need some specific logic around that.
>
> This indirection is essentially the same as relocations & could be 
> implemented that way (though no matter the solution you'd need some 
> attribute on the CU that says "I don't use any CU-local DIE offsets" 
> so an implementation didn't have to go searching/scanning for such 
> offsets (though I guess it'd be cheap to scan for that by just looking 
> at the abbreviations & if you don't see any CU-local DIE offset forms, 
> use the fast-path)). A custom DWARF format would be potentially more 
> compact than general ELF relocations.
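
To illustrate the abbreviation-scan fast path mentioned above, here is a
minimal Python sketch. It is an illustration only, not code from the PoC:
the function names are hypothetical, the form codes are the ones assigned
by the DWARF specification, and DW_FORM_indirect is conservatively treated
as a possible reference because its real form is only stored in .debug_info.

```python
# Hypothetical sketch: decide from one .debug_abbrev table alone whether
# a unit might contain CU-local DIE references.

# Form codes as assigned by the DWARF specification.
DW_FORM_ref1 = 0x11
DW_FORM_ref2 = 0x12
DW_FORM_ref4 = 0x13
DW_FORM_ref8 = 0x14
DW_FORM_ref_udata = 0x15
DW_FORM_indirect = 0x16        # actual form is stored in .debug_info
DW_FORM_implicit_const = 0x21  # DWARF 5: value stored in the abbreviation

CU_LOCAL_REF_FORMS = {
    DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, DW_FORM_ref8, DW_FORM_ref_udata,
}


def read_uleb128(data, pos):
    """Decode one ULEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def may_have_cu_local_refs(abbrev, pos=0):
    """Scan the abbreviation table that starts at pos.

    Returns True if any attribute uses a CU-local reference form
    (or DW_FORM_indirect, which cannot be ruled out here).
    """
    while True:
        code, pos = read_uleb128(abbrev, pos)
        if code == 0:                  # end of this abbreviation table
            return False
        _tag, pos = read_uleb128(abbrev, pos)
        pos += 1                       # DW_CHILDREN_yes / DW_CHILDREN_no
        while True:
            attr, pos = read_uleb128(abbrev, pos)
            form, pos = read_uleb128(abbrev, pos)
            if attr == 0 and form == 0:
                break                  # end of this declaration
            if form == DW_FORM_implicit_const:
                # The constant is SLEB128; only its length matters here,
                # and that is found the same way as for ULEB128.
                _, pos = read_uleb128(abbrev, pos)
            if form in CU_LOCAL_REF_FORMS or form == DW_FORM_indirect:
                return True
```

A unit for which this returns False could take the fast path; any other
unit would still need its DIEs walked to find the offsets to patch.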

I see, so an indirection table (or just relocations) would speed up
reference patching. It would not be necessary to parse the DWARF to find
all the references that should be corrected. They would already be
gathered in the "indirection table" (or relocation table), and as a result
the patching process would run faster.

But that solution has a cost, which you've already mentioned: the size of
the debug info would increase.

My original suggestion was to evaluate the variant with the minimal debug
info size.
A "types table"/"bag of DWARF" gives us that minimal size by
deduplicating base/proxy types and avoiding fragmentation.
If performance then turns out to be insufficient, we can speed it up.
An indirection table is one option that would provide that speedup.
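
As a rough illustration of the base type deduplication, here is a small
Python sketch. It assumes a base type can be identified by the tuple
(name, encoding, byte size); the function name and the data layout are
hypothetical and only model the idea.

```python
def build_types_table(units):
    """Merge per-unit base types into one global table.

    units: one list per compile unit of (name, encoding, byte_size) tuples.
    Returns (table, remap), where remap[u][i] is the table index that
    replaces base type i of unit u.
    """
    table = []       # unique descriptors, in first-seen order
    index_of = {}    # descriptor -> index in table
    remap = []
    for base_types in units:
        unit_map = []
        for desc in base_types:
            if desc not in index_of:
                index_of[desc] = len(table)
                table.append(desc)
            unit_map.append(index_of[desc])
        remap.append(unit_map)
    return table, remap


# Two compile units that both describe "int".
# Encodings: 0x05 = DW_ATE_signed, 0x06 = DW_ATE_signed_char, 0x04 = DW_ATE_float.
units = [
    [("int", 0x05, 4), ("char", 0x06, 1)],
    [("int", 0x05, 4), ("float", 0x04, 4)],
]
table, remap = build_types_table(units)
```

In this toy input "int" is stored once in the table and both units refer
to the same entry.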


>>>             2. Create additional section - global types table
>>>             (.debug_types_table). That would significantly reduce
>>>             the number of references inside .debug_info section. It
>>>             also makes it possible to have a 4-byte reference in
>>>             this section instead of 8-bytes reference into type unit
>>>             (DW_FORM_ref_types instead of DW_FORM_ref_sig8). It also
>>>             makes it possible to place base types into this section
>>>             and avoid per-compile unit duplication of them.
>>>             Additionally, there could be achieved size reduction by
>>>             not generating type unit headers. Note that the new
>>>             section - .debug_types_table - differs from the DWARF4
>>>             section .debug_types in that it contains unique type
>>>             descriptors referenced by offsets instead of a list of
>>>             type units referenced by DW_FORM_ref_sig8; all table
>>>             entries share the same abbreviations and do not have
>>>             type unit headers.
>>>
>>>
>>>         What do you mean when you say "global types table"? The
>>>         phrasing in the above paragraph is present-tense, as though
>>>         this thing exists but doesn't seem to describe what it
>>>         actually is and how it achieves the things the text says it
>>>         achieves. Perhaps I've missed some context here.
>>
>>
>>         The "global types table" does not exist yet. It could be
>>         created if the discussed approach is considered useful.
>>
>>
>>     Ah, the present-tense language was a bit confusing for me when
>>     discussing a thing that doesn't exist yet & not having provided a
>>     description of what it might be or might contain and why it would
>>     exist/what it would achieve.
>
>     I should've written it more precisely.
>
>
>>         Please check the comparison of a possible "global types table"
>>         and the currently existing type units: https://reviews.llvm.org/P8164
>>
>>     Ah, that proposed version makes it easy to remove subprograms
>>     from debug_info without having to fix up type references (but you
>>     still have to have the code to fix up other cross-CU references,
>>     like abstract_origin, so I'm not sure it provides that much
>>     value) but doesn't make it easy to remove types (because you'd
>>     have to go looking through the debug_info section to update all
>>     the type offsets (which I guess you have to do anyway to find the
>>     type references), and removing the types still also requires
>>     fixing up the types that reference each other)...
>>
>>     So I'm not seeing a big win there.
>
>     Correct. Even if types were put into a separate table, it
>     would still be necessary to:
>      "go looking through the debug_info section to update all the type
>     offsets";
>      "removing the types still also requires fixing up the types that
>     reference each other".
>
>      But additionally it provides the following benefits:
>
>      1. Size reduction by removing fragmentation. In the
>     "-fdebug-types-section" solution, every type that is put into a
>     type unit requires:
>        - an additional type unit header,
>        - a section header (since it is put into a separate section),
>        - proxy type copies inside the compilation unit.
>
>       Putting types into a separate table avoids creating the above
>     data for every type.
>
>     2. Size reduction by deduplicating base types. In the
>     "-fdebug-types-section" solution, base types are not deduplicated
>     at all.
>
>
> Base types are pretty small - not sure there'd be much to save by 
> indirection (for classic base types like "int" - for non-trivial but 
> non-user-defined types like subroutine types there might be more 
> opportunity for savings). & you'd still have some cost of indirection 
> to tradeoff - so I don't think it's always going to be the right 
> solution to indirect everything.
>
> There's a lot of design considerations in this problem space, let's 
> put it that way.

For the clang binary, base/proxy types take ~1.5% of the overall
.debug_info + .debug_types size. The fragmentation from #1 takes another
~1.5%.

Implementing a "bag of DWARF"/"generalized type unit"/"types table" would
deduplicate base/proxy types and avoid fragmentation.
That could free approx. 3% of the debug info, to be spent either on
reducing the debug info size or on an "indirection table" accelerator.

My idea is to start from the minimum debug info size and check
whether parsing performance is sufficient. The PoC implementation for
this proposal does all of these things: it parses abbreviations, removes
parts of debug_info, searches for references that should be patched, and
patches them. Its performance looks quite good.
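
The remove-and-patch steps can be modelled in a few lines of Python. This
is a toy model and not the PoC code: the Die class and the function name
are hypothetical, references are assumed to use a fixed-size form (so
patching does not change any DIE size), and a reference to a removed DIE
is simply reported as an error.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    offset: int                               # CU-local offset of this DIE
    size: int                                 # encoded size in bytes
    refs: dict = field(default_factory=dict)  # attribute name -> target offset


def remove_and_patch(dies, dead_offsets):
    """Drop the DIEs listed in dead_offsets, lay out the rest again,
    and patch every reference to the new offset of its target."""
    if not dies:
        return []
    survivors = [d for d in dies if d.offset not in dead_offsets]

    # Step 1: assign new offsets, starting where the first DIE used to be.
    new_offset = {}
    cursor = dies[0].offset
    for d in survivors:
        new_offset[d.offset] = cursor
        cursor += d.size

    # Step 2: walk every surviving DIE and rewrite its references.
    patched = []
    for d in survivors:
        refs = {}
        for attr, target in d.refs.items():
            if target not in new_offset:
                raise ValueError(
                    f"{attr} at {d.offset:#x} refers to removed DIE {target:#x}")
            refs[attr] = new_offset[target]
        patched.append(Die(new_offset[d.offset], d.size, refs))
    return patched
```

Step 2 is the part that needs the DWARF to be parsed: every surviving DIE
has to be visited to find the references that must be rewritten.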


>     3. Performance improvement by handling less data. #1 leads to
>     loading and parsing fewer bits.
>
>     4. Performance improvement by handling fewer references. Simpler
>     reference chains allow references to be parsed faster.
>       Instead of this :
>
>     type_offset->proxy_type->DW_FORM_ref_sig8->type_unit->type_offset->type.
>
>       There would be this :
>
>       type_offset->type_table->type.
>
> Yep, though to avoid the need for the proxy type you'd need to be able 
> to refer to other entities in the "bag of DWARF"/generalized type unit 
> (things like member function declarations and the like)
>
> Yes, "bag of DWARF" or generalized type units (where you can refer to 
> multiple entities in a single unit by some kind of hash) has some 
> benefits.
>
> But it seems somewhat orthogonal to your debug info linking goals 
> here, unless it is a solution that removes the need for parsing the DWARF.
>
Minimizing the debug_info size is also a goal, provided the performance of
DWARF parsing is acceptable.


> Another way to consider this would be to model (or actually implement) 
> inter-DIE references as relocations (DW_FORM_sec_offset instead of a 
> cu offset) - ah, I mentioned that earlier (I'm writing this reply out 
> of order).

Agreed.


Alexey
