[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 5 19:41:00 PST 2022


dblaikie added a comment.

In D96035#3223521 <https://reviews.llvm.org/D96035#3223521>, @avl wrote:

> In D96035#3223263 <https://reviews.llvm.org/D96035#3223263>, @dblaikie wrote:
>
>> You can use cross-unit references (`DW_FORM_sec_offset`) to have one type/DIE in one unit refer to a type/DIE in another unit. So they could still be moved to/handled in a common way, but split up into multiple units to avoid pathological cases in DWARF consumers that might tend to load whole units at a time. Would that be feasible? (not sure if such consumers also then have common pathologies where they load all units that a unit references via `DW_FORM_sec_offset`... if they do, then the solution becomes more difficult and would involve either putting every type in a separate unit or finding groups of related types to put together, rather than arbitrary grouping - @clayborg do you know if lldb has that sort of behavior? (where it'd try to load all referenced units from a unit it's loading))
>
> If I understood correctly - you are talking here more about splitting the current global type table between several artificial type tables.
> Which is an additional problem, but not what I was talking about. I tried to describe situation when a class ("A") contains information for global
> namespace and for anonymous namespace:
>
>   DW_TAG_structure_type                                               | global&anonymous
>     DW_AT_calling_convention        (DW_CC_pass_by_value)             | global&anonymous
>     DW_AT_name      ("A")                                             | global&anonymous
>     DW_AT_byte_size (0x01)                                            | global&anonymous
>     DW_AT_decl_file ("/home/avl/test_templates1/./a.h")               | global&anonymous
>     DW_AT_decl_line (1)                                               | global&anonymous
>   
>     DW_TAG_subprogram                                                 | anonymous
>       DW_AT_linkage_name    ("_ZN1A3fooIN12_GLOBAL__N_11BEEET_v")     | anonymous
>       DW_AT_name    ("foo<(anonymous namespace)::B>")                 | anonymous
>       DW_AT_decl_file       ("/home/avl/test_templates1/./a.h")       | anonymous
>       DW_AT_decl_line       (3)                                       | anonymous
>       DW_AT_type    (0x0000016d "(anonymous namespace)::B")           | anonymous
>       DW_AT_declaration     (true)                                    | anonymous
>   
>       DW_TAG_template_type_parameter                                  | anonymous
>         DW_AT_type  (0x0000016d "(anonymous namespace)::B")           | anonymous
>         DW_AT_name  ("T")                                             | anonymous
>   
>       DW_TAG_formal_parameter                                         | anonymous
>         DW_AT_type  (0x000001b3 "A *")                                | anonymous
>         DW_AT_artificial    (true)                                    | anonymous
>   
>       NULL                                                            | anonymous
>       
>     DW_TAG_subprogram                                                 | global
>       DW_AT_linkage_name    ("another member of A")                   | global
>       DW_AT_name    ("another member of A")                           | global
>       
>     NULL                                                              | global&anonymous
>
>   Thus it should be split into two parts. One should go to the global type table:
>
>
>   DW_TAG_structure_type                                               
>     DW_AT_calling_convention        (DW_CC_pass_by_value)             
>     DW_AT_name      ("A")                                             
>     DW_AT_byte_size (0x01)                                           
>     DW_AT_decl_file ("/home/avl/test_templates1/./a.h")              
>     DW_AT_decl_line (1)                                              
>   
>     DW_TAG_subprogram                                                
>       DW_AT_linkage_name    ("another member of A")                  
>       DW_AT_name    ("another member of A")                          
>       
>     NULL                                                             
>
>   Another part should be left into the owning compilation unit.
>
>
>   DW_TAG_structure_type                                             
>   DW_AT_calling_convention        (DW_CC_pass_by_value)             
>   DW_AT_name      ("A")                                             
>   DW_AT_byte_size (0x01)                                            
>   DW_AT_decl_file ("/home/avl/test_templates1/./a.h")               
>   DW_AT_decl_line (1)                                               
>   
>   DW_TAG_subprogram                                                 
>     DW_AT_linkage_name    ("_ZN1A3fooIN12_GLOBAL__N_11BEEET_v")     
>     DW_AT_name    ("foo<(anonymous namespace)::B>")                 
>     DW_AT_decl_file       ("/home/avl/test_templates1/./a.h")       
>     DW_AT_decl_line       (3)                                       
>     DW_AT_type    (0x0000016d "(anonymous namespace)::B")           
>     DW_AT_declaration     (true)                                    
>   
>     DW_TAG_template_type_parameter                                  
>       DW_AT_type  (0x0000016d "(anonymous namespace)::B")           
>       DW_AT_name  ("T")                                             
>   
>     DW_TAG_formal_parameter                                         
>       DW_AT_type  (0x000001b3 "A *")                                
>       DW_AT_artificial    (true)                                    
>   
>     NULL                                                            
>     
>   NULL                                                              
>
> That kind of splitting is not implemented by current patch.
> Also that solution would require more space since it duplicates structure description.
>
> Or are you suggesting another solution for that case?

Oh, sorry, I wasn't discussing that case/didn't understand that's what was being discussed. I think for that case it's fine to move the translation-unit-local/private linkage/anonymous-ish part of the type into the global type table along with the rest of the type.

I think @clayborg's point was mostly for a purely translation-unit-local entity it could stay in the translation unit that defined it - reducing the need for cross-CU referencing to refer to the type and for work needed to move the type around, etc.

In D96035#3223734 <https://reviews.llvm.org/D96035#3223734>, @avl wrote:

> In D96035#3221098 <https://reviews.llvm.org/D96035#3221098>, @clayborg wrote:
>
>> One issue I do see with this approach as well is the first "__type_table" is about 69MB for a dSYM file for LLDB.framework in LLDB. Any types that are accessed from debug info in LLDB will end up causing this entire DW_TAG_compile_unit to be parsed which can be expensive as LLDB parses DWARF partially and only parses what it needs to when loading DWARF. Is there any way to split up the "__type_table" into multiple DW_TAG_compile_unit entries whose DW_AT_name matches the DW_AT_decl_file of the root type that is being uniqued? That would allow smaller compile units to be parsed and keep memory footprint down in debuggers.
>
> I think that it could be done in general. Though I am not sure about the concrete scheme. Following are reasons why I implemented it as a single type unit:
>
> 1. Fragmentation. clang binary has ~1600 compilation units. If every unit would have a separate type unit, then unit header, unit die, namespace dies, base types, line table headers would be duplicated. Types distributed between units should also be duplicated(if they have a template parameter from another unit, similar to the above example for an anonymous type).

Yeah, I think one type per unit would be more expensive than would be desirable - like type units which have a lot of overhead for this sort of reason.

I don't understand the second part "Types distributed between units should also be duplicated" - types can refer to types in other units (using `DW_FORM_sec_offset` the same as the way that types will be referenced from the non-type contexts where the type needs to be referred to (eg: a compilation unit with a subprogram that needs to refer to the return type in the global type table/type unit/whatever)), I wouldn't advocate for any duplication.

> 2. Cross CU references. The current type table does not have outside references. That allows processing the table by one pass. If we allow outside references then we need to accumulate such inter-connected units in memory first and then process them. I do not know how that is currently implemented in lldb - Probably there is some efficient scheme to handle inter-connected units.

If the types can all be generated in one pass today - they could be generated in one pass over multiple units as well. I don't follow why that'd require multiple passes. (laying out one unit with multiple types requires laying out all the DIEs to know the values of unit-relative offsets - putting those DIEs in several units doesn't make that situation worse in terms of the amount of data that needs to be kept around - it'd mean laying out multiple units to determine where the offset values are across those units)

> Probably, we might create some scheme with several type units. F.e. similar to what David described: "a basic heuristic would be X types per unit, but a more advanced one would depend on the size of the types (& even more advanced would group related types together to benefit from more local references - but I doubt that's worth it)."

Yep - X types would be a good first approximation. Could probably do the "X bytes or DIEs per grouping" might not be too expensive either.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035



More information about the llvm-commits mailing list