[PATCH] D74169: [WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.

Fri Apr 3 13:34:00 PDT 2020

clayborg added a comment.

In D74169#1960444 <https://reviews.llvm.org/D74169#1960444>, @avl wrote:

> > Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.
>
> It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

.debug_types could be much more efficient if we stop duplicating the decl context just so we can refer to a type. Right now if you have a variable, it will point to a CU local type, which duplicates the containing namespaces and class/struct/union decl contexts. So the overhead of .debug_types is a bit higher. It would be great if we could just refer directly to the type from .debug_types in the DW_TAG_variable and other DIEs that have DW_AT_type attributes.

Also, .debug_types is only there for struct, unions, classes and a few other types. When you have a typedef like "std::string::size_type", or a typedef inside of the std::string, it may or may not appear in the .debug_types definition, but no one can point to those types because the way .debug_types is currently defined, only one type can be referred to within a type  unit. The results is that any code wanting to use of one these types must duplicate the entire decl context (namespace "std", class "basic_string<>", typedef "size_type") just so a variable can point to the DWARF.

I mention this because I think letting --gc-debuginfo do its thing with ODR type uniquing leaves the DWARF in a better state: type info for one class isn't strewn across a bunch of .o files, or having large swaths of the type duplicated all over the place. So if we have type uniquing, I would keep .debug_types out of it.

We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

> DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

Are object files not actually bigger when using .debug_types? All the duplicated decl context junk in the DWARF is not efficient. I would be surprised if type units make .o files smaller. There is nothing to unique within a single .o file and the type units should only make things bigger.

> I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html.  "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Right now the ODR type uniquing does this, but it doesn't split the types out into their own compile units. The first time you see a type, you emit it and remember it. All other uses of this type and up using a DW_FORM_ref_addr which is 32 bits to refer to the first emitted type. So the ODR already does this. What it doesn't do is split types out into unique compile units based on their DW_AT_decl_file. If we did, you can slowly combine all DWARF that was defined in header files into DW_TAG_compile_unit DIEs with only types from those files. Then the DWARF would be really clean. The main issue with this approach is that it means you have to make all DWARF in memory to that things can lazily be added to each of these bags (compile units with types in them) since you might end up adding more types to each compile unit. With the current ODR support we can parse the .o files one at a time, and refer to things we have already emitted easily and we don't need to keep everything around so we can add things to the "bags".

The downside of the dsymutil ODR technique, is many times you types that have implicitly added methods (default constructor, assignment operators, move constructors) that are only present if the user used them in their source file. So you might end up with many definitions of "class A", but some have the copy constructor in the DWARF (it was created by the compiler for types where the compiler can create them) and others that don't. So we sometimes have to emit the type multiple times when you have a DW_TAG_subprogram that refers to these implicitly created functions. But overall the approach works well for how easy it is to implement.

I am all for creating compile units and adding all of the types declared in that file to them and having the DWARF be even cleaner though!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74169/new/

https://reviews.llvm.org/D74169