[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Tue Feb 1 06:58:14 PST 2022

avl added a comment.

In D96035#3285620 <https://reviews.llvm.org/D96035#3285620>, @clayborg wrote:

> I do believe that splitting types up into a compile unit that matches the DW_AT_decl_file would make this patch really hard to resist as it then makes the DWARF the best it can be. The nice thing is that if this is done it makes it very easy to tell where a type should be defined. So if the type's DW_AT_decl_file matches the current CU or if this is an anonymous namespace, then the type stays where it is. If it doesn't match, then it gets moved to a new compile unit. I don't know exactly how complex this would be, but it seems like it shouldn't be too hard. The huge type unit has the ability to greatly impact debugger performance as the code stands now because as soon as the debugger needs any type, it will have to parse all of the DIEs in the type compile unit. LLDB parses DWARF lazily and only pulls in what it needs, but with these binaries we would need to parse some 60MB of type DIEs as soon as anyone needs a type.

There are some disadvantages with creating additional compilation units for each source compile unit:

1. Fragmentation and duplication. It would be necessary to duplicate the unit header, unit DIE, namespace DIEs, base types, line table headers, line table files, and abbreviation table. clang has approximately 1600 compilation units, so all of the above would have to be duplicated for each of them. In the end, we might lose some of the DWARF size gains.

2. Closely coupled references. If all types were placed in separate compilation units matching their original unit of declaration, the types would reference each other across units. As a result, it would be hard to process such units in parallel (independently), which limits the speedup that parallelization can achieve. This patch tries to avoid cross-CU references. Only one kind is allowed: non-type-CU -> type-CU.

What about the following solution: the current type table unit (let's say 60MB) would be divided into several buckets (let's say 16) of independent types. Each bucket would be placed in a separate artificial compilation unit. That way there would be no references between the units and not much duplicated information. The size of each separate type unit would be around 4MB (which would help lldb avoid parsing too much at once). Could this be a good solution? It seems to keep the benefits (small overall size of the DWARF file, simple references, small size of each compile unit). It would probably also help to speed up multi-threaded execution of DWARFLinker (if all type units were generated in parallel), but I am afraid it would slow down single-threaded execution.
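
To make the bucket idea concrete, here is a rough sketch of one way the split could be computed. It is not code from this patch. `TypeEntry`, `DisjointSets` and `assignBuckets` are hypothetical names and not DWARFLinker classes. The sketch assumes that "independent" means types that reference each other always land in the same bucket, so it first groups connected types with a union-find and then hands each group to the currently smallest bucket. Whether real type graphs split into groups of reasonably even size is not something the sketch addresses.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one type in the type table (not an LLVM class).
struct TypeEntry {
  uint64_t SizeInBytes;                // encoded size of the type's DIEs
  std::vector<size_t> ReferencedTypes; // indices of types this one refers to
};

// Minimal union-find used to group types that reference each other.
struct DisjointSets {
  std::vector<size_t> Parent;
  explicit DisjointSets(size_t N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), size_t{0});
  }
  size_t find(size_t X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }
  void unite(size_t A, size_t B) { Parent[find(A)] = find(B); }
};

// Returns the bucket index for every type. Types connected by a reference
// share a bucket, so no reference crosses a bucket boundary.
std::vector<size_t> assignBuckets(const std::vector<TypeEntry> &Types,
                                  size_t NumBuckets) {
  NumBuckets = std::max<size_t>(NumBuckets, 1);
  DisjointSets Sets(Types.size());
  for (size_t I = 0; I < Types.size(); ++I)
    for (size_t Ref : Types[I].ReferencedTypes)
      Sets.unite(I, Ref);

  // Accumulate the total size of each group at its representative.
  std::vector<uint64_t> GroupSize(Types.size(), 0);
  std::vector<size_t> Roots;
  for (size_t I = 0; I < Types.size(); ++I) {
    GroupSize[Sets.find(I)] += Types[I].SizeInBytes;
    if (Sets.find(I) == I)
      Roots.push_back(I);
  }

  // Greedy balancing: largest group first, always into the smallest bucket.
  std::stable_sort(Roots.begin(), Roots.end(), [&](size_t A, size_t B) {
    return GroupSize[A] > GroupSize[B];
  });
  std::vector<uint64_t> BucketSize(NumBuckets, 0);
  std::vector<size_t> BucketOfRoot(Types.size(), 0);
  for (size_t Root : Roots) {
    size_t Smallest =
        std::min_element(BucketSize.begin(), BucketSize.end()) -
        BucketSize.begin();
    BucketOfRoot[Root] = Smallest;
    BucketSize[Smallest] += GroupSize[Root];
  }

  std::vector<size_t> Result(Types.size());
  for (size_t I = 0; I < Types.size(); ++I)
    Result[I] = BucketOfRoot[Sets.find(I)];
  return Result;
}
```

With 16 buckets and a 60MB type table, a perfectly even split would give 60 / 16 = 3.75MB per unit, which is where the "around 4MB" figure comes from. Each bucket could then be emitted by its own thread, since no bucket needs to resolve a reference into another one.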

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035