[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Tue Feb 1 12:32:13 PST 2022

> On Feb 1, 2022, at 9:26 AM, David Blaikie via Phabricator <reviews at reviews.llvm.org> wrote:
> 
> dblaikie added a comment.
> 
> In D96035#3287393 <https://reviews.llvm.org/D96035#3287393>, @avl wrote:
> 
>> In D96035#3285620 <https://reviews.llvm.org/D96035#3285620>, @clayborg wrote:
>> 
>>> I do believe that splitting types up into a compile unit that matches the DW_AT_decl_file would make this patch really hard to resist as it then makes the DWARF the best it can be. The nice thing is that if this is done it makes it very easy to tell where a type should be defined. So if the type's DW_AT_decl_file matches the current CU or if this is an anonymous namespace, then the type stays where it is. If it doesn't match, then it gets moved to a new compile unit. I don't know exactly how complex this would be, but it seems like it shouldn't be too hard. The huge type unit has the ability to greatly impact debugger performance as the code stands now because as soon as the debugger needs any type, it will have to parse all of the DIEs in the type compile unit. LLDB parses DWARF lazily and only pulls in what it needs, but with these binaries we would need to parse some 60MB of type DIEs as soon as anyone needs a type.
>> 
>> There are some disadvantages with creating additional compilation units for each source compile unit:
>> 
>> 1. Fragmentation and duplication. It would be necessary to duplicate: unit header, unit DIE, namespace DIEs, base types, line table headers, line table files, abbreviation table. clang has approx 1600 compilation units, so we would need to duplicate all of the above information for each of them. In the end, we might lose some of the DWARF size gains.

Base types wouldn't need to be duplicated, right? We can put any types with no decl file into the current type unit you created and have everyone use absolute links to them. I was not a fan of the abbreviation tables being split up (I believe the llvm dsymutil has done this for a while now), as it leads to duplication, as you already mention. I would prefer one big .debug_abbrev section if possible so all compile units can re-use it, but I understand that this would slow things down.
> 
> Oh, yeah, too many units sounds unfortunate for sure. Currently we'd have one CU per .cpp file, and with "putting types in a synthetic unit that matches their `decl_file`" we'd have an additional one CU per .h file - which, admittedly, is probably a bit more than doubling the number of units. (template instantiations, for instance, might all be grouped together in the same unit, since they're defined in the same header)

So 11 bytes for a compile unit header plus a few for the main DIE doesn't sound too bad to me when everything would be perfectly organized in the DWARF. Since all types in a unit would have matching DW_AT_decl_file attributes, the line table should be very small, with hopefully just a few files. I would love to see a file done like this just so we can compare against an actual file and see whether we are using up that much more space.
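
To put rough numbers on that fixed cost, here is a back-of-the-envelope sketch in Python. It assumes DWARF32 with a version 2-4 unit header (DWARF 5 adds a one-byte unit_type field). The function names and the unit DIE size you pass in are hypothetical, and it deliberately ignores line table and abbreviation duplication, which would have to be measured on a real file.

```python
# DWARF32 compile unit header fields (versions 2-4) and their sizes in bytes.
CU_HEADER_FIELDS_V4 = {
    "unit_length": 4,
    "version": 2,
    "debug_abbrev_offset": 4,
    "address_size": 1,
}


def cu_header_size(dwarf_version=4):
    """Size in bytes of a DWARF32 compile unit header."""
    size = sum(CU_HEADER_FIELDS_V4.values())
    if dwarf_version >= 5:
        size += 1  # DWARF 5 adds a one-byte unit_type field
    return size


def extra_unit_overhead(num_extra_units, unit_die_bytes, dwarf_version=4):
    """Fixed .debug_info cost of adding units: header plus unit DIE only.

    unit_die_bytes is an assumed, caller-supplied size for the unit DIE;
    line table and abbreviation costs are NOT included here.
    """
    return num_extra_units * (cu_header_size(dwarf_version) + unit_die_bytes)
```

This only bounds the header and unit DIE cost; the real comparison still needs an actual linked file.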

> I don't know that this makes the DWARF 'the best it can be' - it's not clear to me what expressive power is provided by the types being in units that match source file names (indeed the information is provided in the DWARF either way - a consumer can treat types defined in a certain file in a certain way even if they're all in one big CU).

DWARF has always been a "shove a bunch of stuff into one big blob per compile unit" format, with no regard to how it will be used in the linking phase, ever since DWARF was created. Linkers are expected to concatenate and relocate and call it a day. This leaves a ton of debug info in the linked executable that isn't useful, like functions that should have been dead stripped but weren't, because of the limitations of the DWARF format and what can be done quickly by linkers that don't understand the format. It is not easy to do anything with DWARF, and it isn't a great format for just accessing a thing or two here and there, since it is a serialized stream of consciousness of what is in a compile unit with no real organization other than "I just put it all in one bucket".

I love the idea of having the DWARF be really well organized from a consumer standpoint. Not many producers are thinking about how this information is consumed, just how quickly we can produce it or link it. There is definitely value in doing things quickly, don't get me wrong, so for the compile/edit/debug cycle we need to keep things fast. But when you want to create a build and archive it, I would rather spend a few more cycles getting the DWARF into the best shape it can be in, so that it can be consumed efficiently by symbolication servers or by debuggers debugging a released build, than save a few seconds or bytes in the file.

> 
>> 2. Closely coupled references. If all types were placed in separate compilation units matching the original unit of declaration, then types would reference each other. As a result, it would be hard to process such units in parallel (independently). This limits the acceleration that can be achieved by parallelization. This patch tries to avoid cross-CU references. Only one kind is allowed: non-type-CU -> type-CU.

Currently you have one compile unit that all threads must access and register their types with, right? I don't see how splitting this up would cause more delays than all threads vying for access to the one and only type unit. It seems like it would actually reduce lock contention, but I haven't fully looked at your current solution, so forgive me if this is way off.

Why are we trying to avoid cross-CU references with DW_FORM_ref_addr? This is placing the emphasis on linking speed once again over how this information will be accessed by consumers. And one might say that bad DWARF organization can cause more delays when these DWARF files are used for debugging or symbolication, by forcing consumers to parse all type DIEs to get at any one type, just because it was faster to create the DWARF file that way in the first place.

> 
> Yeah.
> 
>> What about the following solution: the current type table unit (let's say 60MB) would be divided into several buckets (let's say 16) of independent types. Each bucket is placed in a separate artificial compilation unit, so that there would be no references between units and not a lot of duplicated information. The size of each separate type unit would be around 4MB (which would help lldb avoid parsing too much). Can this be a good solution? It looks like it keeps the benefits (small final size of the overall DWARF file, simple references, small size of each compile unit). It would also probably help to speed up multi-thread execution of DWARFLinker (if all type units were generated in parallel), but I am afraid it would slow down single-thread execution.

Does anyone care about single-thread execution time? I personally don't.
> 
> Finding independent buckets of types sounds difficult/algorithmically complicated. But maybe that's feasible? I'm not sure. I was thinking more "emit all the types the same way you do currently, except into multiple unit "Chunks"" - ie: the code already handles type-to-type references within the single type-CU, so I don't understand (maybe I'm missing something) why it would be difficult to treat that "type-CU" as actually being "multiple type CUs" with arbitrary cross-referencing within that collection of type CUs. Then the chunks/buckets are chosen arbitrarily - admittedly that means longer encoding (sec_offset references are longer than the unit-relative references) than if you can group the types together into isolated/only-self-referencing groups - so maybe the extra space savings is worth the work to create those isolated groups? Naively, I would not have expected it to be worth that much.

DW_FORM_ref_addr is 32 bits for DWARF32, and not many use DWARF64 as it has other size increases, so using it for cross-CU references is just as cheap as a CU-relative reference. And I haven't seen any compilers take advantage of DW_FORM_ref1 or DW_FORM_ref2; they just use DW_FORM_ref4 by default, which is the same size as DW_FORM_ref_addr.
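
As a quick illustration of that size comparison, here is a small Python sketch. The helper name is hypothetical; the sizes are the fixed-size reference forms as I understand them for DWARF 3 and later (in DWARF 2, DW_FORM_ref_addr was address-sized instead), and the variable-length DW_FORM_ref_udata is intentionally left out.

```python
def ref_form_size(form, is_dwarf64=False):
    """Encoded size in bytes of a fixed-size DIE reference form (DWARF 3+)."""
    # CU-relative forms have a fixed width regardless of DWARF32/DWARF64.
    cu_relative = {
        "DW_FORM_ref1": 1,
        "DW_FORM_ref2": 2,
        "DW_FORM_ref4": 4,
        "DW_FORM_ref8": 8,
    }
    if form in cu_relative:
        return cu_relative[form]
    if form == "DW_FORM_ref_addr":
        # Section-relative offset: 4 bytes in DWARF32, 8 bytes in DWARF64.
        return 8 if is_dwarf64 else 4
    raise ValueError(f"not a fixed-size reference form: {form}")
```

So in DWARF32, switching a DW_FORM_ref4 reference to DW_FORM_ref_addr costs no extra bytes per reference; only producers that actually emit DW_FORM_ref1 or DW_FORM_ref2 would lose anything.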

I would love to see the "each type gets put into its own compile unit based on its DW_AT_decl_file" approach tried, just to see what this does to the size of the DWARF compared to the current solution. If the size increase is too much, I would drop my suggestion. Is trying this out hard to do if we don't worry at all about performance?
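
To make the placement rule I have in mind concrete, here is a minimal Python sketch. Everything in it is hypothetical (the function name, the tuple results, the shared-unit label) and it is not how the patch works today; it just encodes "stay put if the decl file matches the CU or the type is in an anonymous namespace, use the shared type unit if there is no decl file, otherwise move to a per-file type unit".

```python
SHARED_TYPE_UNIT = "<shared type unit>"  # hypothetical label for types with no decl file


def choose_unit(cu_file, decl_file, in_anonymous_namespace):
    """Pick the unit a type DIE should be emitted into.

    Returns ("cu", file) when the type stays in its original compile unit,
    or ("type_unit", key) when it moves to a synthetic type unit.
    """
    if in_anonymous_namespace:
        # Anonymous-namespace types are private to the CU that defines them.
        return ("cu", cu_file)
    if decl_file is None:
        # e.g. base types: keep them in the one shared type unit.
        return ("type_unit", SHARED_TYPE_UNIT)
    if decl_file == cu_file:
        return ("cu", cu_file)
    # Declared in some other file (typically a header): per-file type unit.
    return ("type_unit", decl_file)
```

References from a regular CU into one of these type units, or between type units, would then be DW_FORM_ref_addr.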

One other solution would be to group types into common subdirectories based off of the DW_AT_decl_file. The idea being that all STL types could be in one common compile unit for anything with a DW_AT_decl_file that starts with a common directory prefix. This would allow the "vector", "list" and "map" types for the STL to all be in a specific type compile unit. So the idea would be to strip the file base name from the DW_AT_decl_file and group all types from all header files in that directory into the same type unit.
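
A minimal Python sketch of that grouping, assuming each type is available as a (name, decl_file) pair with a POSIX-style path; the function name and input shape are made up for illustration and say nothing about how DWARFLinker stores types.

```python
import posixpath
from collections import defaultdict


def group_types_by_directory(types):
    """Group type names by the directory of their DW_AT_decl_file.

    types: iterable of (type_name, decl_file) pairs.
    Returns a dict mapping directory -> list of type names, in input order.
    """
    groups = defaultdict(list)
    for name, decl_file in types:
        # Strip the file base name; the remaining directory is the unit key.
        groups[posixpath.dirname(decl_file)].append(name)
    return dict(groups)
```

With this, types declared in ".../c++/v1/vector" and ".../c++/v1/map" land in the same type unit, while a project header in another directory gets its own.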

> 
> 
> Repository:
>  rG LLVM Github Monorepo
> 
> CHANGES SINCE LAST ACTION
>  https://reviews.llvm.org/D96035/new/
> 
> https://reviews.llvm.org/D96035
>