[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Mon Jan 10 13:00:50 PST 2022

dblaikie added a comment.

In D96035#3226125 <https://reviews.llvm.org/D96035#3226125>, @avl wrote:

>> I don't understand the second part "Types distributed between units should also be duplicated" - types can refer to types in other units (using DW_FORM_sec_offset the same as the way that types will be referenced from the non-type contexts where the type needs to be referred to (eg: a compilation unit with a subprogram that needs to refer to the return type in the global type table/type unit/whatever)), I wouldn't advocate for any duplication.
>
> The case with a parameterized member function does not have a good solution if the types are split between compilation units.
> It requires either duplicating part of the description (the solution I was talking about, in trying to avoid inter-CU references)
> or introducing an inter-CU reference (the solution you are talking about: DW_FORM_sec_offset/DW_FORM_ref_addr).
> I think it would be good to use a solution that does not require duplicating data AND does not
> require inter-CU references (other than non-typeCU->typeCU):
>
> cat main.cpp
>
>   char foo1 (void);
>   long foo2 (void);
>   
>   int main ( void ) {
>     return foo1() + foo2();
>   }
>
> cat a.h
>
>   struct A {
>     template <class T>
>     T foo () {
>       return T();
>     }
>   
>     int createi () {
>       return 0;
>     }
>   
>     float createf ( ) {
>       return 0.0;
>     }
>   };
>
> cat foo1.cpp
>
>   #include "a.h"
>   
>   char foo1 (void) {
>     return A().foo<char>(); 
>   }
>
> cat foo2.cpp
>
>   #include "a.h"
>   
>   namespace name1 {
>   
>   struct B {
>     long m1;
>   };
>   
>   B internal (void) {
>     return A().foo<B>();
>   }
>   
>   }
>   
>   long foo2 (void ) {
>     return name1::internal().m1;
>   }
>   
>   int foo3 () {
>   
>     A var;
>   
>     return var.createi() + var.createf();
>   }
>
> If struct "A" and struct "B" are placed in different compilation units, then we have two alternatives for representing struct "A".
> The first alternative duplicates the definition of struct "A", while the second uses an inter-CU reference:
>
> 1. The first compilation unit ("foo1.cpp") does not have a definition for "foo<name1::B>". That definition is located in the second compilation unit ("foo2.cpp"). In this case we have a duplicated definition of struct "A".
>
>
>
>   DW_TAG_compile_unit
>     DW_AT_name "foo1.cpp"
>     
>     DW_TAG_structure_type   <<<<<
>       DW_AT_name "A"        <<<<<
>   
>       DW_TAG_subprogram
>         DW_AT_name "foo<char>"
>   
>       DW_TAG_subprogram
>         DW_AT_name "createi"
>   
>       DW_TAG_subprogram
>         DW_AT_name "createf"
>   
>     NULL
>     
>     
>   DW_TAG_compile_unit
>     DW_AT_name "foo2.cpp"
>     
>     DW_TAG_structure_type    <<<<<<
>       DW_AT_name "A"         <<<<<<
>       
>       DW_TAG_subprogram
>         DW_AT_name "foo<name1::B>"
>   
>           DW_TAG_template_type_parameter
>             DW_AT_type "name1::B"
>   
>     DW_TAG_namespace
>       DW_AT_name "name1"
>   
>     DW_TAG_structure_type
>       DW_AT_name "B"
>   
>     NULL
>
>
>
> 2. All definitions are inside the first unit("foo1.cpp"), but we have a cross CU reference from the first unit to the second.
>
>
>
>   DW_TAG_compile_unit
>     DW_AT_name "foo1.cpp"
>     
>     DW_TAG_structure_type
>       DW_AT_name "A"
>   
>       DW_TAG_subprogram
>         DW_AT_name "foo<char>"
>   
>       DW_TAG_subprogram
>         DW_AT_name "createi"
>   
>       DW_TAG_subprogram
>         DW_AT_name "createf"
>   
>       DW_TAG_subprogram
>         DW_AT_name "foo<name1::B>"
>   
>           DW_TAG_template_type_parameter
>             DW_AT_type "name1::B"   <<<<<<
>   
>     NULL
>     
>     
>   DW_TAG_compile_unit
>     DW_AT_name "foo2.cpp"
>     
>     DW_TAG_namespace
>       DW_AT_name "name1"
>   
>     DW_TAG_structure_type   <<<<<<
>       DW_AT_name "B"
>   
>     NULL
>
> The solution from this patch solves both problems by putting all data in a single compilation unit.
> Thus we do not need to duplicate data, and we create only non-typeCU->typeCU inter-CU references.
> If we want to split the "type table" into a set of smaller units to minimize peak memory usage, then we could probably
> group types so that all dependent types are located in a single unit.
> That way we could probably avoid both duplication and inter-CU references.
> Avoiding inter-CU references might help to create a more effective implementation of a DWARF reader.

This doesn't address the case where struct `A` is a local type (in an anonymous namespace). Moving it out of the translation/compile unit it's defined in will break the representation, because the name is only valid within that compile unit. You might end up pulling two types with the same name into the type table unit, making it hard for a DWARF consumer to do correct name lookup or to know which version of the type the user is talking about in a given context.
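
To make the name-collision concern concrete, here is a minimal sketch in C++. It is only a toy model, not DWARFLinker code: `TypeEntry`, `countByName`, `countByScopedName` and `exampleTypes` are hypothetical names, and it assumes a variant of the example in which `A` sits in an anonymous namespace in both foo1.cpp and foo2.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a type DIE; not an LLVM class.
struct TypeEntry {
  std::string Name;       // qualified name as a consumer would look it up
  std::string OwningUnit; // compile unit that defined the type
  bool IsLocal;           // true for types in an anonymous namespace
};

// Pools every type into one table keyed by name only, the way a single
// type-table unit would, and counts the entries that share each name.
std::map<std::string, std::size_t>
countByName(const std::vector<TypeEntry> &Types) {
  std::map<std::string, std::size_t> Counts;
  for (const TypeEntry &T : Types) {
    assert(!T.Name.empty() && "every modelled type needs a name");
    ++Counts[T.Name];
  }
  return Counts;
}

using ScopedKey = std::pair<std::string, std::string>;

// Keeps the owning unit as part of the key for local types, which models
// leaving those types inside the unit that defined them.
std::map<ScopedKey, std::size_t>
countByScopedName(const std::vector<TypeEntry> &Types) {
  std::map<ScopedKey, std::size_t> Counts;
  for (const TypeEntry &T : Types)
    ++Counts[ScopedKey(T.IsLocal ? T.OwningUnit : std::string(), T.Name)];
  return Counts;
}

// Assumed input: two unrelated local types that happen to share a name.
std::vector<TypeEntry> exampleTypes() {
  return {
      {"(anonymous namespace)::A", "foo1.cpp", true},
      {"(anonymous namespace)::A", "foo2.cpp", true},
      {"name1::B", "foo2.cpp", false},
  };
}
```

With this input the name-only table ends up with two entries for `(anonymous namespace)::A`, while the unit-scoped table has exactly one per unit, which is the information a consumer needs to pick the right type for a given context.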

What we could do is a third option - not duplicating the whole definition of A, but only part of the definition.

Take a look at the DWARF Clang produces for something like this, for instance:

  struct t1 {
    virtual void f1(); // home this type into another translation unit
    template<typename T>
    void f1() { }
  };
  namespace {
  struct t2 { };
  }
  int main() {
    t1().f1<t2>();
  }

The DWARF for this will include a declaration of `t1` (so, some duplication, but not all the members, etc.) with a declaration of the `f1<t2>` member inside that `t1` declaration, and then an out-of-line definition that refers to that member declaration. In this case we could potentially also include a `DW_AT_specification` in the declaration that points to the definition in the type table, if that helps consumers significantly, so they don't have to do name lookup to figure out that they're the same thing.
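
As a rough illustration of why the partial declaration helps, the sketch below models DIEs as a flat list and counts references that cross a unit boundary. Everything here is a hypothetical toy model (`Die`, `countInterUnitRefs` and both layout functions are made-up names, not DWARFLinker or Clang structures), and the second layout is only an assumed contrast case in which the `f1<t2>` declaration is hoisted into the type table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified DIE model; indices refer to the flat list.
struct Die {
  std::string Tag;
  std::string Name;
  std::size_t Unit; // 0 = type table unit, 1 = the unit containing main()
  int Parent;       // index of the parent DIE, or -1 for a top-level DIE
  int RefTarget;    // index of the DIE referenced by an attribute, or -1
};

// Counts references whose source and target live in different units.
std::size_t countInterUnitRefs(const std::vector<Die> &Dies) {
  std::size_t Count = 0;
  for (const Die &D : Dies) {
    if (D.RefTarget < 0)
      continue;
    assert(static_cast<std::size_t>(D.RefTarget) < Dies.size());
    if (Dies[D.RefTarget].Unit != D.Unit)
      ++Count;
  }
  return Count;
}

// Third option: the unit keeps a declaration of t1 that houses f1<t2>.
std::vector<Die> partialDeclarationLayout() {
  return {
      {"DW_TAG_structure_type", "t1", 0, -1, -1},               // 0: full definition
      {"DW_TAG_structure_type", "t2", 1, -1, -1},               // 1: local type
      {"DW_TAG_structure_type", "t1 (declaration)", 1, -1, -1}, // 2
      {"DW_TAG_subprogram", "f1<t2> (declaration)", 1, 2, -1},  // 3: member of 2
      {"DW_TAG_template_type_parameter", "T", 1, 3, 1},         // 4: type -> t2
      {"DW_TAG_subprogram", "f1<t2> (definition)", 1, -1, 3},   // 5: specification -> 3
  };
}

// Assumed contrast case: the f1<t2> declaration lives in the type table.
std::vector<Die> typeTableOnlyLayout() {
  return {
      {"DW_TAG_structure_type", "t1", 0, -1, -1},              // 0: full definition
      {"DW_TAG_subprogram", "f1<t2> (declaration)", 0, 0, -1}, // 1: member of 0
      {"DW_TAG_template_type_parameter", "T", 0, 1, 3},        // 2: type -> t2 (other unit)
      {"DW_TAG_structure_type", "t2", 1, -1, -1},              // 3: local type
      {"DW_TAG_subprogram", "f1<t2> (definition)", 1, -1, 1},  // 4: specification -> 1
  };
}
```

In this model the partial-declaration layout has no references that leave the unit, whereas the contrast layout has two: one from the type table into the local type `t2`, and one from the out-of-line definition back into the type table.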

Speaking of all that - how are member function definitions working in this current proposal? Do they use sec_offsets to refer to the declaration in the type table, or do they already do something like ^ with a type declaration in the same unit to house the member function declaration?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035