[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Fri Feb 11 07:37:51 PST 2022

avl added a comment.

@clayborg I've done some research and have a couple of questions. Would you mind taking a look at them, please?

1. First, I modeled the case with a separate compilation unit for each declaration file. The overall impact seems to be about 0.78% of the debug_info+debug_abbrev size of the clang binary, which is 158.7MB. The additional size required for multiple compilation units is 1.23MB (2051*(100+500) bytes; 1.23MB / 158.7MB ≈ 0.78%). Here 2051 is the number of files used in decl_file attributes, 100 bytes covers the compilation unit header, the compilation unit DIE, the namespace DIEs and the line table header, and 500 bytes is the size of a separate abbreviation table.

2. Second, I divided all types into independent buckets. The current type table for the clang binary takes approximately 40MB. The largest bucket of dependent types is 1.6MB. So, if the original type table were divided into multiple compilation units based on type dependencies, each type unit would be 1.6MB or less.

  Dividing types into independent buckets has several advantages. Since the buckets do not depend on each other, they can be processed independently and in parallel. It also potentially minimizes the number of loads, because all dependent types are loaded at once. If types are not divided into independent buckets, dependencies would have to be loaded in several passes (i.e. whenever a cross-CU reference is encountered).
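A minimal sketch of the bucketing idea, assuming types are numbered and each dependency is an edge between two types: a bucket is then a connected component of the undirected dependency graph, computed here with a union-find. The names (`DisjointSet`, `buildTypeBuckets`, `TypeDeps`) are hypothetical, and this is not necessarily how the patch computes its buckets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical model: types are numbered 0..NumTypes-1, and a dependency
// (From, To) means type From refers to type To (member, base class, ...).
// Edge direction is ignored: both ends must land in the same bucket.
class DisjointSet {
public:
  explicit DisjointSet(std::size_t N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), std::size_t{0});
  }
  std::size_t find(std::size_t X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving keeps the trees shallow
      X = Parent[X];
    }
    return X;
  }
  void unite(std::size_t A, std::size_t B) { Parent[find(A)] = find(B); }

private:
  std::vector<std::size_t> Parent;
};

using TypeDeps = std::vector<std::pair<std::size_t, std::size_t>>;

// Splits types into buckets so that no dependency edge crosses two buckets.
std::vector<std::vector<std::size_t>> buildTypeBuckets(std::size_t NumTypes,
                                                       const TypeDeps &Deps) {
  DisjointSet Sets(NumTypes);
  for (const auto &[From, To] : Deps)
    Sets.unite(From, To);
  // Group every type under the representative of its set.
  std::map<std::size_t, std::vector<std::size_t>> ByRoot;
  for (std::size_t T = 0; T < NumTypes; ++T)
    ByRoot[Sets.find(T)].push_back(T);
  std::vector<std::vector<std::size_t>> Buckets;
  for (auto &Entry : ByRoot)
    Buckets.push_back(std::move(Entry.second));
  return Buckets;
}
```

For example, with the dependencies (0, 1), (1, 2), (3, 4) and (4, 3), six types fall into three buckets: {0, 1, 2}, {3, 4} and {5}. Because no edge crosses buckets, each bucket could be emitted as its own unit and handled by a separate thread without cross-unit references.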

Assuming we decide to split the current global type table on a "decl_file" basis, as you suggested, what do you think about the following questions?

1. How should the following situation be handled?

Source DWARF:

        DW_TAG_compile_unit
          DW_AT_name "cu1"

  0x100:  DW_TAG_namespace   <<<<<<<<<<<<<<<<<<<
            DW_AT_name "namespace1"

            DW_TAG_structure_type
              DW_AT_name "S1"
              DW_AT_decl_file "file1"
            NULL

            DW_TAG_structure_type
              DW_AT_name "S2"
              DW_AT_decl_file "file2"
            NULL
          NULL

          DW_TAG_imported_module   <<<<<<<<<<<<<<<<<<<
            DW_AT_import 0x100
        NULL

Result DWARF:

        DW_TAG_compile_unit
          DW_AT_name "type_table_file1"

  0x200:  DW_TAG_namespace  <<<<<<<<<<<<<<<<<<<
            DW_AT_name "namespace1"

            DW_TAG_structure_type
              DW_AT_name "S1"
              DW_AT_decl_file "file1"
            NULL

          NULL
        NULL

        DW_TAG_compile_unit
          DW_AT_name "type_table_file2"

  0x300:  DW_TAG_namespace   <<<<<<<<<<<<<<<<<<<
            DW_AT_name "namespace1"

            DW_TAG_structure_type
              DW_AT_name "S2"
              DW_AT_decl_file "file2"
            NULL

          NULL
        NULL

        DW_TAG_compile_unit
          DW_AT_name "cu1"

          DW_TAG_imported_module   <<<<<<<<<<<<<<<<<<<
            DW_AT_import 0x200 or 0x300?
        NULL

Which of the offsets corresponding to "namespace1" should be used? Is either of them acceptable?

2. Would it be OK to split DW_TAG_module?

Source DWARF:

  DW_TAG_compile_unit
    DW_AT_name "cu1"

    DW_TAG_module             <<<<<<<<<<<<<<<<<<< 
      DW_AT_name "module1"    <<<<<<<<<<<<<<<<<<<

      DW_TAG_structure_type
        DW_AT_name "S1"
        DW_AT_decl_file "file1"
      NULL

      DW_TAG_structure_type
        DW_AT_name "S2"
        DW_AT_decl_file "file2"
      NULL

    NULL

  NULL

Result DWARF:

  DW_TAG_compile_unit
    DW_AT_name "type_table_file1"

    DW_TAG_module            <<<<<<<<<<<<<<<<<<<
      DW_AT_name "module1"   <<<<<<<<<<<<<<<<<<<

      DW_TAG_structure_type
        DW_AT_name "S1"
        DW_AT_decl_file "file1"
      NULL

    NULL

  NULL

  DW_TAG_compile_unit
    DW_AT_name "type_table_file2"

    DW_TAG_module            <<<<<<<<<<<<<<<<<<< 
      DW_AT_name "module1"   <<<<<<<<<<<<<<<<<<<

      DW_TAG_structure_type
        DW_AT_name "S2"
        DW_AT_decl_file "file2"
      NULL

    NULL

  NULL

Is it OK for DW_TAG_module to be split like this?

3. Only root types should be moved into the compile unit for the corresponding "decl_file", right? For example:

  DW_TAG_compile_unit
    DW_AT_name "cu1"

    DW_TAG_class_type
      DW_AT_name "class1" 
      DW_AT_decl_file "file1"

      DW_TAG_subprogram
        DW_AT_name "method1"
        DW_AT_decl_file "file1"    <<<<<<<<<<<<<<<<<<<<<
      NULL

      DW_TAG_subprogram
        DW_AT_name "method2"
        DW_AT_decl_file "file2"     <<<<<<<<<<<<<<<<<<<<<
      NULL          
    NULL
  NULL

i.e. "method1" and "method2" should both be placed into the compile unit for "file1", right?

4. What do you think: would it be a good idea to split the current monolithic type table not on a "decl_file" basis but on a "buckets of dependent types" basis?
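To make questions 1-3 concrete, here is a minimal sketch of the decl_file-based split under one reading of them, using a hypothetical, simplified DIE model (`Die`, `place`, `splitByDeclFile` and the tag lists are illustrative names, not DWARFLinker APIs). Scopes such as DW_TAG_namespace and DW_TAG_module are replicated into every per-file unit that needs them, and each root type is moved together with its whole subtree, so a nested DW_TAG_subprogram stays with its class even when its own decl_file differs. The sketch assigns no offsets, which is exactly where question 1 comes from: one source namespace ends up with one copy per file.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified DIE: tag, name, optional decl_file, children.
struct Die {
  std::string Tag;
  std::string Name;
  std::string DeclFile;
  std::vector<Die> Children;
};

// Scopes that are replicated into every per-file unit that needs them.
static bool isScope(const std::string &Tag) {
  return Tag == "DW_TAG_namespace" || Tag == "DW_TAG_module";
}

static bool isType(const std::string &Tag) {
  return Tag == "DW_TAG_structure_type" || Tag == "DW_TAG_class_type" ||
         Tag == "DW_TAG_union_type" || Tag == "DW_TAG_enumeration_type";
}

// Returns Parent's child with the given tag and name, creating it if needed.
static Die &getOrCreateChild(Die &Parent, const std::string &Tag,
                             const std::string &Name) {
  for (Die &Child : Parent.Children)
    if (Child.Tag == Tag && Child.Name == Name)
      return Child;
  Parent.Children.push_back(Die{Tag, Name, "", {}});
  return Parent.Children.back();
}

// A root type (first type DIE below a chain of scopes) goes to the unit of
// its own decl_file together with its whole subtree.
static void place(const Die &Node, std::vector<const Die *> &Scopes,
                  std::map<std::string, Die> &Units) {
  if (isScope(Node.Tag)) {
    Scopes.push_back(&Node);
    for (const Die &Child : Node.Children)
      place(Child, Scopes, Units);
    Scopes.pop_back();
    return;
  }
  if (!isType(Node.Tag))
    return; // e.g. DW_TAG_imported_module stays in the original CU (not modeled)
  Die &Unit = Units[Node.DeclFile];
  if (Unit.Tag.empty()) {
    Unit.Tag = "DW_TAG_compile_unit";
    Unit.Name = "type_table_" + Node.DeclFile;
  }
  // Recreate the enclosing namespace/module chain inside this unit.
  Die *Cur = &Unit;
  for (const Die *Scope : Scopes)
    Cur = &getOrCreateChild(*Cur, Scope->Tag, Scope->Name);
  Cur->Children.push_back(Node);
}

// Returns one synthetic compile unit per decl_file, keyed by file name.
std::map<std::string, Die> splitByDeclFile(const Die &CompileUnit) {
  std::map<std::string, Die> Units;
  std::vector<const Die *> Scopes;
  for (const Die &Child : CompileUnit.Children)
    place(Child, Scopes, Units);
  return Units;
}
```

For a CU holding namespace1 {S1 (file1), S2 (file2)} and class1 (file1) {method1 (file1), method2 (file2)}, this produces type_table_file1 with namespace1 {S1} and class1 {method1, method2}, and type_table_file2 with a second copy of namespace1 containing only S2.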

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96035