[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 11 10:03:16 PST 2022


dblaikie added a comment.

In D96035#3234040 <https://reviews.llvm.org/D96035#3234040>, @avl wrote:

>> ! In D96035#3232592 <https://reviews.llvm.org/D96035#3232592>, @dblaikie wrote:
>>
>> This doesn't address the case where struct `A` is a local type (in an anonymous namespace). Moving it out of the translation/compile unit it's defined it will break the representation (because the name is only valid within that compile unit - so you might end up pulling two types with the same name into the type table unit, making it hard for a DWARF consumer to do correct name lookup/know which version of the type the user is talking about in a given context).
>
> This patch put such types into the type table separately. It creates two different namespaces with different DW_AT_decl_file attribute:
>
>   DW_TAG_compile_unit
>     DW_AT_name "__type_table"
>     
>     DW_TAG_namespace
>       DW_AT_decl_file "compile_unit1.cpp"
>       
>       DW_TAG_class_type
>         DW_AT_name "B"
>       NULL
>     NULL
>   
>     DW_TAG_namespace
>       DW_AT_decl_file "compile_unit2.cpp"
>       
>       DW_TAG_class_type
>         DW_AT_name "B"
>       NULL
>     NULL
>   
>   NULL

Yeah, I don't think that's enough to address the scoping issue - which file they're written in doesn't define their scope of name lookup, unfortunately. (eg: you could #include a file into another file - the file the code is written in doesn't tell you whether another entity defined in a different file can see that name (because the files could be included into the same translation unit)) - local types, I think, have to stay in the compile unit that defined them - I don't think there's any other solution to the language scoping.

>> What we could do is a third option - not duplicating the whole definition of A, but only part of the definition.
>>
>> Take a look at the DWARF Clang produces for something like this, for instance:
>>
>>   struct t1 {
>>     virtual void f1(); // home this type into another translation unit
>>     template<typename T>
>>     void f1() { }
>>   };
>>   namespace {
>>   struct t2 { };
>>   }
>>   int main() {
>>     t1().f1<t2>();
>>   }
>>
>> The DWARF for this will include a declaration of `t1` (so, some duplication - but not all the members, etc) with a declaration of the `f1<t2>` member in that `t1` declaration (& then an out of line definition that refers to that declaration) (in this case we could potentially also include a `DW_AT_specification` in the declaration that points to the definition in the type table, if that helps consumers significantly - so they don't have to do name lookup to figure out that they're the same thing).
>
> yes. I assumed the similar approach here - https://reviews.llvm.org/D96035#3222090 (part about types splitting).
>
> If type splitting were supported then this approach allows leaving anonymous types inside the compilation unit at the cost of little duplication and without unnecessary inter-CU references. It also allows decreasing the overall size of "type table". The current patch does not support type splitting. I suggest implementing it as an additional improvement.
>
> Types splitting is necessary because originally generated "declaration of t1 (so, some duplication - but not all the members, etc) with a declaration of the f1<t2> member in that t1 declaration" might contain other members. So we need to move these other members into the type table and leave "declaration of the f1<t2> member" in place. Otherwise, (because other members will stay not deduplicated) the size of resulting DWARF would be bigger.

Yep, that sounds right to me. To the best of my knowledge, the sort of things that may need to get "left behind" in the compilation unit/not moved to the type table, would be member templates like the one we've been discussing, and nested types (eg: you could have a pimpl idiom - `struct t1 { struct t2; }; ... namespace { struct t3 { }; } struct t1::t2 { t3 m; };` so the whole definition of `t2` would have to stay in the compilation unit with `t3`, I think - because it needs to reference `t3`. The type table could include a declaration of `t2`, since `t1` might include pointers to `t2` or other references that only require a declaration)

The nice thing about this particular way of splitting - is that there's loads of existing precedent for it. This sort of DWARF (declarations with members) is generated by clang all the time for homed debug info. (though, that said, lldb isn't /super/ fond of this sort of debug info - so it wouldn't surprise me if it'd need some fixes to handle this sort of thing in a dsym, but we might be lucky there & it already works because at least all the types are still defined within the single dsym)

>> Speaking of all that - how are member function definitions working in this current proposal? Do they use sec_offsets to refer to the declaration in the type table, or do they already do something like ^ with a type declaration in the same unit to house the member function declaration?
>
> All member function definitions are merged into the type which is in type table. Later they are referenced(using DW_FORM_ref_addr) from compile units:
>
>   DW_TAG_compile_unit
>     DW_AT_name "__type_table"
>   
>     DW_TAG_structure_type
>       DW_AT_name "t1"
>         
>       DW_TAG_subprogram
>         DW_AT_name "t1"
>       NULL
>   
>       DW_TAG_subprogram
>         DW_AT_name "f1"
>         
>         DW_TAG_template_type_parameter
>           DW_AT_type "__type_table::main.cpp::t2"  // I used names instead offsets, in fact this is DW_FORM_ref_addr referencing into "__type_table"
>       NULL
>     NULL
>     
>     DW_TAG_namespace
>       DW_AT_decl_file "main.cpp"
>       
>       DW_TAG_struct_type
>         DW_AT_name "t2"
>       NULL
>     NULL
>   
>   NULL
>   
>   DW_TAG_compile_unit
>     DW_AT_name "main.cpp"
>     
>     DW_TAG_subprogram
>       DW_AT_name "main"
>     NULL
>   
>     DW_TAG_subprogram
>       DW_AT_name "t1"
>       DW_AT_specification "__type_table::t1::t1" // I used names instead offsets, in fact this is DW_FORM_ref_addr referencing into "__type_table"
>     NULL
>   
>     DW_TAG_subprogram
>       DW_AT_name "f1"
>       DW_AT_specification "__type_table::t1::f1" // I used names instead offsets, in fact this is DW_FORM_ref_addr referencing into "__type_table"
>     NULL
>   
>   NULL

Ah, OK - thanks for the details!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035



More information about the llvm-commits mailing list