[PATCH] D74169: [WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.

Sat Apr 4 09:32:35 PDT 2020

avl added a comment.

> To the best of my knowledge (as the person who implemented type unit support in LLVM... ) type units never reduce the size of object files - they only increase it.

> There is no type duplication, so far as I know, in LLVM's object files even without type units - one definition of the type is emitted and all the rest of the DWARF references that type. With type units that type is just the skeleton (DW_AT_declaration and DW_AT_signature), without type units that type is the full definition, without the indirection to the type unit.

> Are there particular examples of type duplication within a single object file you have in mind? I'd be super curious to see them - might be some bugs that need fixing.

Sorry, I need to stop writing emails late in the evening.
Type units make object files bigger, so they cannot be used to reduce object file size.

Mainly I am talking about two things:

  1 -fdebug-types-section allows debug info to be processed faster.

dsymutil/DWARFLinker performance depends heavily on the size of the input DWARF.
The more DWARF there is to parse, the longer linking takes.
The time to parse the whole DWARF with ODR type deduplication (+gc-debuginfo +NoOdr=false) for clang is:

296%(145s)

The resulting DWARF is accurate: no duplicated types and no extra bytes spent on type units. But its processing time is 145s.

The time to parse with -fdebug-types-section for clang is:
(most types are in the .debug_types section, which is currently ignored, so
only .debug_info/.debug_line/.debug_ranges/.debug_loc are parsed)

133%(65s)

The difference arises because the type information does not need to be parsed.
Comdat sections are resolved quickly by the linker based on the section id.
Comparing hash ids instead of analyzing context is much faster.

So separating type info from the rest of the DWARF noticeably speeds up debug info linking.
By supporting type units (-fdebug-types-section), DWARFLinker would speed up debug info processing in the current scheme.
It looks like the linking time would then be close to 133% (65s).
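
The comdat behaviour described in point 1 can be sketched roughly as follows. This is a minimal Python model and not lld's implementation: `Section`, `select_comdat` and the signature values are hypothetical names chosen only for illustration. The point it shows is that the selection looks only at the signature and never inspects the section payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical model of a comdat section holding one type unit."""
    signature: int   # type signature used as the comdat group key
    payload: bytes   # DWARF bytes of the type unit; never parsed here
    origin: str      # object file the section came from


def select_comdat(sections):
    """Keep the first section seen for each signature, discard the rest.

    Only the signature is compared; the payload stays an opaque blob.
    """
    kept = {}
    for sec in sections:
        # setdefault keeps the earlier entry if the key already exists
        kept.setdefault(sec.signature, sec)
    return list(kept.values())


inputs = [
    Section(0x1111, b"struct A ...", "a.o"),
    Section(0x2222, b"struct B ...", "a.o"),
    Section(0x1111, b"struct A ...", "b.o"),  # same type from another object
]
output = select_comdat(inputs)
```

ODR-based deduplication, by contrast, has to parse the DIEs and analyze their context before it can decide that two types are the same, which is where the extra time goes.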

  2 Something like a "bag of dies" could avoid the size increase caused by type units.

This idea does not apply to the current state of things; it is a possible DWARF format change.

The linker processes comdat sections with type units quickly because it does not parse their contents.
It compares hash ids and either keeps or discards the section.
Adding a hash id makes processing faster, but it requires type units.

To reduce the size of type units, the "bag of dies" idea could be implemented.
Something like https://reviews.llvm.org/P8164 (global type table and bag of dies mean something similar).
But how would "bags of dies" from different object files be merged? When each type is put in its own
section, merging can be done on a per-type basis. A similar approach for "bags of dies"
would introduce type duplication (the same type could appear in several "bags of dies").

DWARFLinker, rather than the lld linker, could be used to merge "bags of dies".
It would merge the incoming "bags of dies" into a single output "bag of dies".
To avoid heavy parsing and context analysis, DWARFLinker
could use a solution similar to the comdat one: whether the current die should be put into
the resulting "bag of dies" would be decided by comparing hash ids (DW_AT_signature).

It looks like this solution would give both minimal size and fast processing.
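
The difference between the two merging strategies can be illustrated with a small sketch. It is a toy Python model under stated assumptions: a "bag" is represented as a dict from a signature to a DIE description, and `merge_per_bag` and `merge_per_die` are hypothetical helpers, not existing lld or DWARFLinker functions.

```python
def merge_per_bag(bags):
    """Comdat-style merge: a bag is dropped only if the same set of
    signatures was already seen, so overlapping bags are both kept."""
    seen, out = set(), []
    for bag in bags:
        key = frozenset(bag)  # the whole bag is the unit of comparison
        if key not in seen:
            seen.add(key)
            out.append(bag)
    return out


def merge_per_die(bags):
    """Per-DIE merge: each DIE is kept or dropped by its own signature."""
    merged = {}
    for bag in bags:
        for signature, die in bag.items():
            merged.setdefault(signature, die)  # first definition wins
    return merged


bag_a = {0x1: "struct A", 0x2: "struct B"}   # from a.o
bag_b = {0x2: "struct B", 0x3: "struct C"}   # from b.o, shares struct B

per_bag = merge_per_bag([bag_a, bag_b])
per_die = merge_per_die([bag_a, bag_b])
```

Here the per-bag merge keeps both bags, so `struct B` ends up in the output twice, while the per-DIE merge keeps three distinct types and compares only signatures.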

> Bag of DWARF doesn't seem like it'd help in a situation where you know you're going to use a DWARF-aware linker (just put all the DWARF in the CU as normal & know the DWARF-aware linker will introduce cross-CU references as needed as things are deduplicated). The idea for Bag of DWARF was with a DWARF agnostic linker you could reference more than one DIE from in a type unit (or a compile unit) to reduce a bunch of the duplication/overhead that type units introduce (eg: when defining a member function of a type unit, you currently have to duplicate the member function's declaration in the CU that references the TU, because you can't refer to that subprogram declaration DIE in the TU - only the type itself).

Right, a DWARF-aware linker doing de-duplication using ODR does not need "Bag of DWARF".
But de-duplication using ODR is slower than de-duplication of types marked with a hash.
To add a hash to a type, it is necessary to put the type into a "Bag of DWARF".

It seems that "Bag of DWARF" used with a DWARF-agnostic linker would have a type duplication problem:
"Bags of DWARF" from several object files could contain the same types.
I think a DWARF-aware linker is necessary to deduplicate "Bags of DWARF".

> We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

Yes, that is exactly what I am talking about. Attaching a type hash to the elements of that type unit (bag of dies) would speed up type de-duplication.
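
A type unit with N signatures and N offsets might look roughly like the sketch below. This models a hypothetical format change, not anything in the current DWARF specification or in LLVM: `MultiTypeUnit`, `add_type` and `offset_of` are invented names, and the byte strings are placeholders for encoded DIEs.

```python
from dataclasses import dataclass, field


@dataclass
class MultiTypeUnit:
    """Hypothetical type unit exposing N (signature, offset) pairs.

    A current type unit header carries a single type signature and a
    single type offset; this table is an assumed extension.
    """
    data: bytes = b""
    index: dict = field(default_factory=dict)  # signature -> offset into data

    def add_type(self, signature, encoded_die):
        """Append a DIE and record where it starts."""
        offset = len(self.data)
        self.index[signature] = offset
        self.data += encoded_die
        return offset

    def offset_of(self, signature):
        """Return the offset for a signature, or None if it is absent."""
        return self.index.get(signature)


tu = MultiTypeUnit()
tu.add_type(0xAAAA, b"class Foo")
tu.add_type(0xBBBB, b"typedef Bar")  # a non-class type, directly addressable
```

With such a table, a consumer could decide per contained type whether it has already seen that signature, without parsing the DIEs themselves.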

https://reviews.llvm.org/D74169