[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Thu Jan 6 14:10:17 PST 2022

avl added a comment.

> I don't understand the second part "Types distributed between units should also be duplicated" - types can refer to types in other units (using DW_FORM_sec_offset the same as the way that types will be referenced from the non-type contexts where the type needs to be referred to (eg: a compilation unit with a subprogram that needs to refer to the return type in the global type table/type unit/whatever)), I wouldn't advocate for any duplication.

The case of a member function template has no good solution when its instantiations are split between compilation units.
It requires either duplicating part of the description (the solution I was talking about when trying to avoid inter-CU references)
or introducing an inter-CU reference (the solution you are talking about: DW_FORM_sec_offset/DW_FORM_ref_addr).
I think it would be good to use a solution that neither duplicates data nor
requires inter-CU references (other than non-typeCU->typeCU). Consider this example:

cat main.cpp

  char foo1 (void);
  long foo2 (void);

  int main ( void ) {
    return foo1() + foo2();
  }

cat a.h

  struct A {
    template <class T>
    T foo () {
      return T();
    }

    int createi () {
      return 0;
    }

    float createf ( ) {
      return 0.0;
    }
  };

cat foo1.cpp

  #include "a.h"

  char foo1 (void) {
    return A().foo<char>(); 
  }

cat foo2.cpp

  #include "a.h"

  namespace name1 {

  struct B {
    long m1;
  };

  B internal (void) {
    return A().foo<B>();
  }

  }

  long foo2 (void ) {
    return name1::internal().m1;
  }

  int foo3 () {
    A var;

    return var.createi() + var.createf();
  }

If struct "A" and struct "B" are placed in different compilation units, there are two alternatives for representing struct "A".
The first alternative duplicates the definition of struct "A", while the second uses an inter-CU reference:

1. The first compilation unit ("foo1.cpp") does not contain a definition of "foo<name1::B>"; that definition is located in the second compilation unit ("foo2.cpp"). In this case the definition of struct "A" is duplicated.

  DW_TAG_compile_unit
    DW_AT_name "foo1.cpp"

    DW_TAG_structure_type   <<<<<
      DW_AT_name "A"        <<<<<

      DW_TAG_subprogram
        DW_AT_name "foo<char>"

      DW_TAG_subprogram
        DW_AT_name "createi"

      DW_TAG_subprogram
        DW_AT_name "createf"

    NULL

  DW_TAG_compile_unit
    DW_AT_name "foo2.cpp"

    DW_TAG_structure_type    <<<<<<
      DW_AT_name "A"         <<<<<<

      DW_TAG_subprogram
        DW_AT_name "foo<name1::B>"

          DW_TAG_template_type_parameter
            DW_AT_type "name1::B"

    DW_TAG_namespace
      DW_AT_name "name1"

    DW_TAG_structure_type
      DW_AT_name "B"

    NULL

2. All definitions are in the first unit ("foo1.cpp"), but there is a cross-CU reference from the first unit to the second.

  DW_TAG_compile_unit
    DW_AT_name "foo1.cpp"

    DW_TAG_structure_type
      DW_AT_name "A"

      DW_TAG_subprogram
        DW_AT_name "foo<char>"

      DW_TAG_subprogram
        DW_AT_name "createi"

      DW_TAG_subprogram
        DW_AT_name "createf"

      DW_TAG_subprogram
        DW_AT_name "foo<name1::B>"

          DW_TAG_template_type_parameter
            DW_AT_type "name1::B"   <<<<<<

    NULL

  DW_TAG_compile_unit
    DW_AT_name "foo2.cpp"

    DW_TAG_namespace
      DW_AT_name "name1"

    DW_TAG_structure_type   <<<<<<
      DW_AT_name "B"

    NULL
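
The difference between the two layouts comes down to whether a reference stays inside one unit. A unit-local reference can use a unit-relative form such as DW_FORM_ref4, while a reference into another unit needs DW_FORM_ref_addr. A minimal sketch of that classification, assuming each unit is described by a half-open offset range in .debug_info (UnitRange, unitIndexFor and classifyReference are hypothetical names for illustration, not DWARFLinker API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one unit's extent in .debug_info.
struct UnitRange {
  std::uint64_t Begin; // offset of the first byte of the unit
  std::uint64_t End;   // offset one past the last byte of the unit
};

enum class RefKind { IntraCU, InterCU };

// Returns the index of the unit containing Offset, or -1 if none does.
int unitIndexFor(const std::vector<UnitRange> &Units, std::uint64_t Offset) {
  for (std::size_t I = 0; I < Units.size(); ++I)
    if (Offset >= Units[I].Begin && Offset < Units[I].End)
      return static_cast<int>(I);
  return -1;
}

// Classifies a reference from the DIE at FromDie to the DIE at ToDie.
// IntraCU references can be encoded unit-relative (e.g. DW_FORM_ref4);
// InterCU references need a section-relative form (DW_FORM_ref_addr).
RefKind classifyReference(const std::vector<UnitRange> &Units,
                          std::uint64_t FromDie, std::uint64_t ToDie) {
  int From = unitIndexFor(Units, FromDie);
  int To = unitIndexFor(Units, ToDie);
  assert(From >= 0 && To >= 0 && "both DIEs must belong to a known unit");
  return From == To ? RefKind::IntraCU : RefKind::InterCU;
}
```

In alternative 2 above, the DW_AT_type of the template parameter in "foo1.cpp" pointing at "name1::B" in "foo2.cpp" would be classified as InterCU.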

The solution from this patch avoids both problems by putting all type data in a single compilation unit.
Thus we do not need to duplicate data, and the only inter-CU references we create are non-typeCU->typeCU ones.
If we want to split the "type table" into a set of smaller units to minimize peak memory usage, we could probably
group types so that all interdependent types are located in a single unit.
That way we could probably avoid both duplication and inter-CU references between type units.
Avoiding inter-CU references might help to create a more efficient implementation of a DWARF reader.
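
One possible way to group dependent types is a union-find pass over the type-to-type references: every reference merges the groups of its two endpoints, and each resulting group becomes one type unit. This is only a sketch of the idea, not what the patch implements; TypeGroups and its methods are hypothetical names, and types are identified by plain indices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of DWARFLinker: partitions type indices so
// that any two types connected by a reference end up in the same group.
class TypeGroups {
public:
  explicit TypeGroups(std::size_t NumTypes) : Parent(NumTypes) {
    // Initially every type is the representative of its own group.
    std::iota(Parent.begin(), Parent.end(), std::size_t{0});
  }

  // Returns the representative of the group containing Type.
  std::size_t find(std::size_t Type) {
    assert(Type < Parent.size() && "type index out of range");
    while (Parent[Type] != Type) {
      Parent[Type] = Parent[Parent[Type]]; // path halving
      Type = Parent[Type];
    }
    return Type;
  }

  // Records that type From references type To, so both must share a unit.
  void addReference(std::size_t From, std::size_t To) {
    Parent[find(From)] = find(To);
  }

  // True if A and B would be emitted into the same type unit.
  bool sameUnit(std::size_t A, std::size_t B) { return find(A) == find(B); }

private:
  std::vector<std::size_t> Parent;
};
```

For the example above, struct "A" references "name1::B" through the template parameter of "foo<name1::B>", so calling addReference for that pair would place both types in one unit, while unrelated types remain free to go elsewhere.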

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96035