[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.
David Blaikie via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 10 13:00:50 PST 2022
dblaikie added a comment.
In D96035#3226125 <https://reviews.llvm.org/D96035#3226125>, @avl wrote:
>> I don't understand the second part "Types distributed between units should also be duplicated" - types can refer to types in other units (using DW_FORM_sec_offset the same as the way that types will be referenced from the non-type contexts where the type needs to be referred to (eg: a compilation unit with a subprogram that needs to refer to the return type in the global type table/type unit/whatever)), I wouldn't advocate for any duplication.
>
> The case of a parameterized member function does not have a good solution if its parts are separated between compilation units.
> It either requires duplicating part of the description (the solution I was talking about, which avoids inter-CU references),
> or introduces an inter-CU reference (the solution you are talking about: DW_FORM_sec_offset/DW_FORM_ref_addr).
> I think it would be good to use a solution that neither duplicates data nor
> requires inter-CU references (other than non-typeCU->typeCU):
>
> cat main.cpp
>
> char foo1 (void);
> long foo2 (void);
>
> int main ( void ) {
>   return foo1() + foo2();
> }
>
> cat a.h
>
> struct A {
>   template <class T>
>   T foo () {
>     return T();
>   }
>
>   int createi () {
>     return 0;
>   }
>
>   float createf ( ) {
>     return 0.0;
>   }
> };
>
> cat foo1.cpp
>
> #include "a.h"
>
> char foo1 (void) {
>   return A().foo<char>();
> }
>
> cat foo2.cpp
>
> #include "a.h"
>
> namespace name1 {
>
> struct B {
>   long m1;
> };
>
> B internal (void) {
>   return A().foo<B>();
> }
>
> }
>
> long foo2 (void ) {
>   return name1::internal().m1;
> }
>
> int foo3 () {
>
>   A var;
>
>   return var.createi() + var.createf();
> }
>
> If struct "A" and struct "B" are placed into different compilation units, then there are two alternatives for representing struct "A".
> The first alternative duplicates the definition of struct "A", while the second uses an inter-CU reference:
>
> 1. The first compilation unit ("foo1.cpp") does not have a definition for "foo<name1::B>". The definition for "foo<name1::B>" is located in the second compilation unit ("foo2.cpp"). In this case the definition of struct "A" is duplicated.
>
> DW_TAG_compile_unit
>   DW_AT_name "foo1.cpp"
>
>   DW_TAG_structure_type <<<<<
>     DW_AT_name "A" <<<<<
>
>     DW_TAG_subprogram
>       DW_AT_name "foo<char>"
>
>     DW_TAG_subprogram
>       DW_AT_name "createi"
>
>     DW_TAG_subprogram
>       DW_AT_name "createf"
>
>   NULL
>
>
> DW_TAG_compile_unit
>   DW_AT_name "foo2.cpp"
>
>   DW_TAG_structure_type <<<<<<
>     DW_AT_name "A" <<<<<<
>
>     DW_TAG_subprogram
>       DW_AT_name "foo<name1::B>"
>
>       DW_TAG_template_type_parameter
>         DW_AT_type "name1::B"
>
>   DW_TAG_namespace
>     DW_AT_name "name1"
>
>     DW_TAG_structure_type
>       DW_AT_name "B"
>
>   NULL
>
> 2. All of struct "A", including "foo<name1::B>", is described in the first unit ("foo1.cpp"), but there is a cross-CU reference from the first unit to the second.
>
> DW_TAG_compile_unit
>   DW_AT_name "foo1.cpp"
>
>   DW_TAG_structure_type
>     DW_AT_name "A"
>
>     DW_TAG_subprogram
>       DW_AT_name "foo<char>"
>
>     DW_TAG_subprogram
>       DW_AT_name "createi"
>
>     DW_TAG_subprogram
>       DW_AT_name "createf"
>
>     DW_TAG_subprogram
>       DW_AT_name "foo<name1::B>"
>
>       DW_TAG_template_type_parameter
>         DW_AT_type "name1::B" <<<<<<
>
>   NULL
>
>
> DW_TAG_compile_unit
>   DW_AT_name "foo2.cpp"
>
>   DW_TAG_namespace
>     DW_AT_name "name1"
>
>     DW_TAG_structure_type <<<<<<
>       DW_AT_name "B"
>
>   NULL
>
> The solution from this patch solves both problems by putting all data in a single compilation unit.
> Thus we neither duplicate data nor create any inter-CU references other than non-typeCU->typeCU ones.
> If we want to split the "type table" into a set of smaller units to minimize peak memory usage, then we probably could
> group types so that all dependent types are located in a single unit.
> That way we probably could avoid both duplication and inter-CU references.
> Avoiding inter-CU references might help to create a more efficient implementation of a DWARF reader.
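The grouping idea in the quoted reply (keep every type in the same unit as the types it depends on, so a split type table needs no type-to-type inter-CU references) amounts to finding connected components of a type-dependency graph. A minimal C++ sketch using union-find, assuming type references are already known as pairs of names; `TypeGrouper` and `groupQuotedExample` are hypothetical names, not code from D96035.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: assign types to units so that each type lands in the
// same unit as every type it references. Dependent types form a connected
// component; union-find discovers the components.
class TypeGrouper {
public:
  // Returns the index of a type, registering it on first use.
  int id(const std::string &Name) {
    auto [It, Inserted] =
        Index.try_emplace(Name, static_cast<int>(Parent.size()));
    if (Inserted)
      Parent.push_back(It->second); // a new type starts as its own root
    return It->second;
  }

  // Records that type From refers to type To (e.g. through a template
  // parameter or a member type).
  void addReference(const std::string &From, const std::string &To) {
    int A = find(id(From));
    int B = find(id(To));
    if (A != B)
      Parent[A] = B;
  }

  // Maps every registered type name to a unit number; types in the same
  // connected component get the same number.
  std::map<std::string, int> assignUnits() {
    std::map<int, int> RootToUnit;
    std::map<std::string, int> Result;
    for (const auto &[Name, Id] : Index) {
      int Root = find(Id);
      int Unit = RootToUnit
                     .try_emplace(Root, static_cast<int>(RootToUnit.size()))
                     .first->second;
      Result[Name] = Unit;
    }
    return Result;
  }

private:
  int find(int X) {
    assert(X >= 0 && X < static_cast<int>(Parent.size()));
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving keeps trees shallow
      X = Parent[X];
    }
    return X;
  }

  std::map<std::string, int> Index; // type name -> node index
  std::vector<int> Parent;          // union-find parent links
};

// The quoted example: foo<name1::B> makes "A" depend on "name1::B".
// "C" is a hypothetical unrelated type added for contrast.
std::map<std::string, int> groupQuotedExample() {
  TypeGrouper G;
  G.addReference("A", "name1::B");
  G.id("C");
  return G.assignUnits();
}
```

Grouping by connected components never splits a dependency, but one large component (for example, a widely referenced type pulling in everything that mentions it) still ends up as a single unit, which may limit how much the split reduces peak memory.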
This doesn't address the case where struct `A` is a local type (in an anonymous namespace). Moving it out of the translation/compile unit it's defined in will break the representation (because the name is only valid within that compile unit - so you might end up pulling two types with the same name into the type table unit, making it hard for a DWARF consumer to do correct name lookup/know which version of the type the user is talking about in a given context).
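To make the naming problem concrete: a type table that deduplicated types by qualified name alone would collapse two unrelated anonymous-namespace types called `A` from different CUs into one entry. The hypothetical key below (an illustration only, not dsymutil's actual scheme) merges across CUs only the types that may be merged there and keeps local types distinct per CU. Even with such a key, a local type moved out of its CU still leaves the consumer-side lookup problem described above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical deduplication key for a shared type table. Types that follow
// the ODR can be merged across CUs by qualified name; a type in an anonymous
// namespace is unique only within its own CU, so the CU is part of its key.
struct TypeKey {
  std::string QualifiedName;
  int CUIndex; // -1 means "mergeable across CUs"

  bool operator<(const TypeKey &Other) const {
    return std::tie(QualifiedName, CUIndex) <
           std::tie(Other.QualifiedName, Other.CUIndex);
  }
};

struct TypeDesc {
  std::string QualifiedName;
  bool IsLocal; // declared in an anonymous namespace
  int CUIndex;  // CU the type was found in
};

TypeKey makeKey(const TypeDesc &T) {
  assert(T.CUIndex >= 0 && "CU index must be valid");
  return TypeKey{T.QualifiedName, T.IsLocal ? T.CUIndex : -1};
}

// Number of distinct entries the type table would hold for these types.
std::size_t countTableEntries(const std::vector<TypeDesc> &Types) {
  std::set<TypeKey> Table;
  for (const TypeDesc &T : Types)
    Table.insert(makeKey(T));
  return Table.size();
}

// The same count if types were keyed by name only (local types collide).
std::size_t countByNameOnly(const std::vector<TypeDesc> &Types) {
  std::set<std::string> Table;
  for (const TypeDesc &T : Types)
    Table.insert(T.QualifiedName);
  return Table.size();
}
```

With two CUs that each define a local `A` and both use a global `t1`, the CU-aware key yields three entries, while name-only keying yields two and silently merges the two distinct `A` types.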
What we could do is a third option - not duplicating the whole definition of A, but only part of the definition.
Take a look at the DWARF Clang produces for something like this, for instance:
struct t1 {
  virtual void f1(); // home this type into another translation unit
  template<typename T>
  void f1() { }
};
namespace {
struct t2 { };
}
int main() {
  t1().f1<t2>();
}
The DWARF for this will include a declaration of `t1` (so, some duplication - but not all the members, etc.) containing a declaration of the `f1<t2>` member, and then an out-of-line definition that refers to that declaration. In this case we could potentially also include a `DW_AT_specification` in the declaration that points to the definition in the type table, if that helps consumers significantly - so they don't have to do name lookup to figure out that the two are the same thing.
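The layout described above can be modelled with a toy DIE tree. This is a simplified, hypothetical sketch: `Die`, `buildThirdOption`, and `dump` are illustrative names, attribute values are plain strings rather than real DWARF forms, and it does not reproduce Clang's exact output. It shows a declaration-only `t1` holding just the `f1<t2>` declaration, the local `t2` staying in its own CU, and an out-of-line `DW_TAG_subprogram` whose `DW_AT_specification` (modelled as a plain pointer) refers back to the in-unit declaration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal, hypothetical DIE model: a tag, string-valued attributes, an
// optional reference modelling DW_AT_specification, and owned children.
struct Die {
  std::string Tag;
  std::vector<std::string> Attrs;     // e.g. "DW_AT_name \"t1\""
  const Die *Specification = nullptr; // models DW_AT_specification
  std::vector<std::unique_ptr<Die>> Children;

  Die *add(std::string ChildTag, std::vector<std::string> ChildAttrs = {}) {
    Children.push_back(std::make_unique<Die>());
    Die *Child = Children.back().get();
    Child->Tag = std::move(ChildTag);
    Child->Attrs = std::move(ChildAttrs);
    return Child;
  }
};

std::unique_ptr<Die> buildThirdOption() {
  auto CU = std::make_unique<Die>();
  CU->Tag = "DW_TAG_compile_unit";

  // Declaration-only t1: none of its other members, just what this CU needs.
  Die *T1 = CU->add("DW_TAG_structure_type",
                    {"DW_AT_name \"t1\"", "DW_AT_declaration true"});
  Die *F1Decl = T1->add("DW_TAG_subprogram",
                        {"DW_AT_name \"f1<t2>\"", "DW_AT_declaration true"});
  F1Decl->add("DW_TAG_template_type_parameter", {"DW_AT_type \"t2\""});

  // t2 lives in an anonymous namespace, so it stays in this CU.
  Die *Anon = CU->add("DW_TAG_namespace");
  Anon->add("DW_TAG_structure_type", {"DW_AT_name \"t2\""});

  // Out-of-line definition referring back to the in-unit declaration.
  Die *F1Def = CU->add("DW_TAG_subprogram");
  F1Def->Specification = F1Decl;
  return CU;
}

// Appends an indented, dwarfdump-like rendering of D to Out.
void dump(const Die &D, int Depth, std::string &Out) {
  const std::string Indent(2 * Depth, ' ');
  Out += Indent + D.Tag + "\n";
  for (const std::string &A : D.Attrs)
    Out += Indent + "  " + A + "\n";
  if (D.Specification) {
    assert(!D.Specification->Attrs.empty());
    Out += Indent + "  DW_AT_specification -> " +
           D.Specification->Attrs.front() + "\n";
  }
  for (const auto &Child : D.Children)
    dump(*Child, Depth + 1, Out);
}
```

Only the member actually used in this CU is repeated inside the `t1` declaration, so the duplication stays proportional to what the unit references rather than to the size of the full type.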
Speaking of all that - how do member function definitions work in this current proposal? Do they use sec_offsets to refer to the declaration in the type table, or do they already do something like ^ with a type declaration in the same unit to house the member function declaration?
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D96035