[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Mon Feb 14 08:18:00 PST 2022

avl added a comment.

> Thanks for trying this!

> Under 1% (0.78%) , I would be ok with this overhead for really clean DWARF. To clarify: was this a all types for a specific DW_AT_decl_file in it own compile unit? Or was this for all files in the same directory?

That was all types for a specific DW_AT_decl_file in it own compile unit. I did not take into account duplication of file names in the line table, but I think overall size would still be within 1%(if there would not be other duplicated things).

Things might be much worse if we would need to duplicate this kind of dies for each compilation unit :

  DW_TAG_const_type 
                  DW_AT_type (0x000000000000004a "llvm::Twine")

  DW_TAG_pointer_type
                  DW_AT_type (0x000000000003c122 "const char")

> We might have to include them both right? Do we actually see DWARF like this? But my initial guess is we would need them both if clients needed access to "S1" or "S2". Or if we can track what that DWARF accesses within the DW_TAG_import, maybe only the ones that are needed.

If we are going to include them both - then(in the real examples) it means inserting a dozens of DW_TAG_imported_module dies(for each used type).

We see DWARF like this very often:

  using namespace llvm;

  SmallSet<int, 3> Set;
  SmallVector<const int *, 8> Ptrs;

Originally it looks like this:

        DW_TAG_compile_unit
          DW_AT_name "driver_cc1.cpp"

  0x100:  DW_TAG_namespace   <<<<<<<<<<<<<<<<<<<
            DW_AT_name "llvm"

            DW_TAG_class_type
              DW_AT_name "SmallSet<int, 3>"
              DW_AT_decl_file "SmallSet.h"
            NULL

            DW_TAG_class_type
              DW_AT_name "SmallVector<const int *, 8>"
              DW_AT_decl_file "SmallVector.h"
            NULL
          NULL

          DW_TAG_imported_module
            DW_AT_import 0x100   <<<<<<<<<<<<<<<<<<<
        NULL

with new scheme it would be:

        DW_TAG_compile_unit
          DW_AT_name "type_table_SmallSet"

  0x200:  DW_TAG_namespace  <<<<<<<<<<<<<<<<<<<
            DW_AT_name "llvm"

            DW_TAG_structure
              DW_AT_name "SmallSet<int, 3>"
              DW_AT_decl_file "SmallSet.h"
            NULL

          NULL

        DW_TAG_compile_unit
          DW_AT_name "type_table_SmallVector"

  0x300:  DW_TAG_namespace   <<<<<<<<<<<<<<<<<<<
            DW_AT_name "llvm"

            DW_TAG_structure
              DW_AT_name "SmallVector<const int *, 8>"
              DW_AT_decl_file "SmallVector.h"
            NULL

          NULL

       DW_TAG_compile_unit
          DW_AT_name "driver_cc1.cpp"

          DW_TAG_imported_module   <<<<<<<<<<<<<<<<<<<
            DW_AT_import 0x200           <<<<<<<<<<<<<<<<<<<
          DW_TAG_imported_module    <<<<<<<<<<<<<<<<<<<
            DW_AT_import 0x300            <<<<<<<<<<<<<<<<<<<

It looks really overweight. These DW_TAG_imported_module dies will noticeable increase added size.

> I am very excited about this feature and I will push for adoption of this within Facebook ASAP if this is viable, faster and also much better organized!

Thanks!

Thing, which I did not understand yet - why "splitting on decl_file basis" is so good? I understand the problem with big "type_table": we do not want to parse a lot of DWARF if lazily use only part of it. But, if less fragmented solution exists, would it be good to use it instead of "splitting on decl_file basis"? 
Or "splitting on decl_file basis" is still better for some reason?

> The one thing to think about is seeing if we can make this work with "--update" option to dsymutil (which reparses the DWARF and re-emits it and the accelerator tables. This --update option could then be used to post produce debug innfo before it gets permanently archived to disk, so the smaller and better organized the better! So if we can get this working we will have a huge set of code to run this through and test it with!

yes, I think we could make it possible that "--update" (or another replacement option) would reorganize DWARF. Though it would be good if we would do at as an additional change after this patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035