[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.
Alexey Lapshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 11 07:37:51 PST 2022
avl added a comment.
@clayborg I've done some research and have a couple of questions. Would you mind looking at them, please?
1. First, I tried to model the case with a separate compilation unit for each declaration file. The overall impact appears to be about 0.78% of the debug_info+debug_abbrev size of the clang binary. The debug_info+debug_abbrev size is 158.7MB. The additional size required for multiple compilation units is 1.23MB (2051*(100+500)), where 2051 is the number of files used in decl_file attributes, 100 bytes is the combined size of the compilation unit header, the compilation unit DIE, the namespace DIEs, and the line table header, and 500 bytes is the size of a separate abbreviation table.
2. Second, I divided all types into independent buckets. The current type table for the clang binary takes approximately 40MB. The maximal size of a bucket containing dependent types is 1.6MB. So, if the original type table were divided into multiple compilation units based on type dependencies, each such unit would be 1.6MB or less.
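The estimate in point 1 can be reproduced with a few lines of arithmetic. This is only a sanity check of the figures quoted above; the constant names are made up for illustration, and MB is assumed to mean 10**6 bytes.

```python
# Back-of-the-envelope check of the overhead quoted in point 1.
# All inputs are the figures from the comment; MB is assumed to be 10**6 bytes.
NUM_DECL_FILES = 2051          # files used in decl_file attributes
PER_UNIT_BYTES = 100           # CU header + CU DIE + namespace DIEs + line table header
PER_ABBREV_TABLE_BYTES = 500   # separate abbreviation table
BASELINE_MB = 158.7            # debug_info + debug_abbrev of the clang binary

extra_bytes = NUM_DECL_FILES * (PER_UNIT_BYTES + PER_ABBREV_TABLE_BYTES)
extra_mb = extra_bytes / 10**6
overhead_percent = 100 * extra_mb / BASELINE_MB

print(f"extra: {extra_bytes} bytes ({extra_mb:.2f}MB), overhead: {overhead_percent:.2f}%")
```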
Dividing types into independent buckets has several advantages. Since the buckets do not depend on each other, they can be handled in parallel and independently. It also potentially minimizes the number of loads that must be done: all dependent types are loaded at once. If types are not divided into independent buckets, loading all dependencies would take several loads (i.e., one whenever a cross-CU reference is encountered).
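One way to picture the bucketing is as connected components of the type reference graph. The sketch below is an assumption about how such buckets could be computed, not the code used for the measurements above; the function name, the input format, and the type names are hypothetical.

```python
from collections import defaultdict

def independent_buckets(type_refs):
    """Group types so that no type references a type in another bucket.

    type_refs maps a type name to the names of the types it references.
    Buckets are the connected components of that (undirected) graph,
    found here with a small union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for type_name, referenced in type_refs.items():
        find(type_name)  # register types that reference nothing
        for ref in referenced:
            union(type_name, ref)

    buckets = defaultdict(set)
    for type_name in list(parent):
        buckets[find(type_name)].add(type_name)
    return sorted(sorted(bucket) for bucket in buckets.values())

# Hypothetical type names, for illustration only.
example = {"List": ["Node"], "Node": ["Payload"], "Matrix": ["Row"], "Row": [], "Color": []}
print(independent_buckets(example))
```

Each resulting bucket could then be loaded and processed as a unit, without following references into any other bucket.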
Assuming we decide to split the current global type table on a "decl_file" basis as you suggested, what do you think of the following questions?
1. How should the following situation be handled?
Source DWARF:
DW_TAG_compile_unit
  DW_AT_name "cu1"
  0x100: DW_TAG_namespace <<<<<<<<<<<<<<<<<<<
    DW_AT_name "namespace1"
    DW_TAG_structure
      DW_AT_name "S1"
      DW_AT_decl_file "file1"
      NULL
    DW_TAG_structure
      DW_AT_name "S2"
      DW_AT_decl_file "file2"
      NULL
    NULL
  DW_TAG_import 0x100 <<<<<<<<<<<<<<<<<<<
  NULL
Result DWARF:
DW_TAG_compile_unit
  DW_AT_name "type_table_file1"
  0x200: DW_TAG_namespace <<<<<<<<<<<<<<<<<<<
    DW_AT_name "namespace1"
    DW_TAG_structure
      DW_AT_name "S1"
      DW_AT_decl_file "file1"
      NULL
    NULL
DW_TAG_compile_unit
  DW_AT_name "type_table_file2"
  0x300: DW_TAG_namespace <<<<<<<<<<<<<<<<<<<
    DW_AT_name "namespace1"
    DW_TAG_structure
      DW_AT_name "S2"
      DW_AT_decl_file "file2"
      NULL
    NULL
DW_TAG_compile_unit
  DW_AT_name "cu1"
  DW_TAG_import 0x200 or 0x300 ? <<<<<<<<<<<<<<<<<<<
Which of the offsets corresponding to "namespace1" should be used? Either of them?
2. Would it be OK to split DW_TAG_module?
Source DWARF:
DW_TAG_compile_unit
  DW_AT_name "cu1"
  DW_TAG_module <<<<<<<<<<<<<<<<<<<
    DW_AT_name "module1" <<<<<<<<<<<<<<<<<<<
    DW_TAG_structure
      DW_AT_name "S1"
      DW_AT_decl_file "file1"
      NULL
    DW_TAG_structure
      DW_AT_name "S2"
      DW_AT_decl_file "file2"
      NULL
    NULL
  NULL
Result DWARF:
DW_TAG_compile_unit
  DW_AT_name "type_table_file1"
  DW_TAG_module <<<<<<<<<<<<<<<<<<<
    DW_AT_name "module1" <<<<<<<<<<<<<<<<<<<
    DW_TAG_structure
      DW_AT_name "S1"
      DW_AT_decl_file "file1"
      NULL
    NULL
  NULL
DW_TAG_compile_unit
  DW_AT_name "type_table_file2"
  DW_TAG_module <<<<<<<<<<<<<<<<<<<
    DW_AT_name "module1" <<<<<<<<<<<<<<<<<<<
    DW_TAG_structure
      DW_AT_name "S2"
      DW_AT_decl_file "file2"
      NULL
    NULL
  NULL
Is it OK for DW_TAG_module to be split this way?
3. Only root types should be moved into the compile unit for the corresponding "decl_file", right?
DW_TAG_compile_unit
  DW_AT_name "cu1"
  DW_TAG_class_type
    DW_AT_name "class1"
    DW_AT_decl_file "file1"
    DW_TAG_subroutine
      DW_AT_name "method1"
      DW_AT_decl_file "file1" <<<<<<<<<<<<<<<<<<<<<
      NULL
    DW_TAG_subroutine
      DW_AT_name "method2"
      DW_AT_decl_file "file2" <<<<<<<<<<<<<<<<<<<<<
      NULL
    NULL
  NULL
I.e., both "method1" and "method2" should be placed into the compile unit for "file1", right?
4. What do you think: would it be good to split the current monolithic type table not on a "decl_file" basis but on a "buckets of dependent types" basis?
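To make questions 1 and 3 concrete, here is a toy model of a "decl_file"-based split. It is a sketch under stated assumptions, not dsymutil code: the Die class and split_by_decl_file are hypothetical, the tag names are the shorthand used in the dumps above, and the rule that children follow their root type is the reading proposed in question 3.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional

# Shorthand tag names copied from the dumps above; this is a toy model,
# not dsymutil's data structures.
TYPE_TAGS = {"DW_TAG_structure", "DW_TAG_class_type"}
SCOPE_TAGS = {"DW_TAG_namespace", "DW_TAG_module"}

@dataclass
class Die:
    tag: str
    name: str
    decl_file: Optional[str] = None
    children: List["Die"] = field(default_factory=list)

def split_by_decl_file(cu):
    """Map each decl_file to the (enclosing scope path, root type DIE) pairs placed in its unit."""
    units = defaultdict(list)

    def walk(die, scope):
        for child in die.children:
            if child.tag in TYPE_TAGS:
                # A root type goes to the unit of its own decl_file; its
                # children are not visited, so they follow the parent.
                units[child.decl_file].append((scope, child))
            elif child.tag in SCOPE_TAGS:
                walk(child, scope + (child.name,))

    walk(cu, ())
    return dict(units)

# Question 1: S1 and S2 share namespace1 but have different decl_files.
ns_cu = Die("DW_TAG_compile_unit", "cu1", children=[
    Die("DW_TAG_namespace", "namespace1", children=[
        Die("DW_TAG_structure", "S1", "file1"),
        Die("DW_TAG_structure", "S2", "file2"),
    ]),
])
ns_units = split_by_decl_file(ns_cu)
copies = sorted(f for f, entries in ns_units.items()
                if any(scope == ("namespace1",) for scope, _ in entries))
print(copies)  # every unit listed here needs its own copy of namespace1

# Question 3: method2 has decl_file "file2" but stays with class1 in "file1".
class_cu = Die("DW_TAG_compile_unit", "cu1", children=[
    Die("DW_TAG_class_type", "class1", "file1", children=[
        Die("DW_TAG_subroutine", "method1", "file1"),
        Die("DW_TAG_subroutine", "method2", "file2"),
    ]),
])
class_units = split_by_decl_file(class_cu)
print(sorted(class_units))
print([m.name for m in class_units["file1"][0][1].children])
```

In this model namespace1 ends up duplicated in both type-table units, which is exactly why the target of the import in question 1 becomes ambiguous, while class1 and both of its methods land in a single unit.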
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D96035/new/
https://reviews.llvm.org/D96035