[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Fri Oct 23 08:27:47 PDT 2020
On 23.10.2020 00:22, David Blaikie wrote:
>
>
> On Fri, Sep 4, 2020 at 3:42 AM Alexey <avl.lapshin at gmail.com> wrote:
>
>
> On 03.09.2020 20:56, David Blaikie wrote:
>>
>>
>> On Thu, Sep 3, 2020 at 5:15 AM Alexey <avl.lapshin at gmail.com> wrote:
>>
>>
>> On 03.09.2020 01:36, David Blaikie wrote:
>>>
>>>
>>> On Wed, Sep 2, 2020 at 3:26 PM Alexey <avl.lapshin at gmail.com> wrote:
>>>
>>>
>>> On 02.09.2020 21:44, David Blaikie wrote:
>>>>
>>>>
>>>> On Wed, Sep 2, 2020 at 9:56 AM Alexey
>>>> <avl.lapshin at gmail.com> wrote:
>>>>
>>>>
>>>> On 01.09.2020 20:07, David Blaikie wrote:
>>>>> Fair enough - thanks for clarifying the
>>>>> differences! (I'd still lean a bit towards this
>>>>> being dwz-esque, as you say "an extension of
>>>>> classic dwz"
>>>> I have some doubts about "llvm-dwz", since it might
>>>> confuse people who would expect exactly the same
>>>> behavior.
>>>> But if we think of it as "an extension of classic
>>>> dwz" and the possible confusion is not a big deal, then
>>>> I would be fine with "llvm-dwz".
>>>>> using a bit more domain knowledge (of terminators
>>>>> and C++ odr - though I'm not sure dsymutil does
>>>>> rely on the ODR, does it? It relies on it to know
>>>>> that two names represent the same type, I suppose,
>>>>> but doesn't assume they're already identical,
>>>>> instead it merges their members))
>>>>
>>>> If dsymutil is able to find a full definition, then
>>>> it would remove all other definitions (matched by
>>>> name) and redirect all references to that
>>>> definition. If it is not able to find a full
>>>> definition, it would do nothing; i.e., if there are two
>>>> incomplete definitions (DW_AT_declaration (true))
>>>> with the same name, they would not be merged. That is
>>>> a possible improvement: teaching dsymutil to merge
>>>> incomplete types.
>>>>
>>>> Huh, what does it do with extra member function
>>>> definitions found in later definitions? (eg: struct x {
>>>> template<typename T> void f(); }; - in one translation
>>>> unit x::f<int> is instantiated, in another x::f<float>
>>>> is instantiated - how are the two represented with
>>>> dsymutil?)
>>>
>>> They would be considered two non-matching types.
>>> dsymutil would not merge them and thus would not
>>> use a single type description. There would be two
>>> separate types called "x", which would have mostly
>>> matching members but differ in x::f<int> and
>>> x::f<float>. There is no de-duplication in that case.
>>>
>>> Oh, that's unfortunate. It'd be nice, for C++ at least, to
>>> implement a potentially faster dsymutil mode that could get
>>> this right and not have to actually check for type
>>> equivalence, instead relying on the name of the type to
>>> determine that it must be identical.
>>
>> Right. That would result in even more size reduction.
>>
>>>
>>> The first instance of the type that's encountered has its
>>> fully qualified name or mangled name recorded in a map
>>> pointing to the DIE. Any future instance gets downgraded to
>>> a declaration, and /certain/ members get dropped, but other
>>> members get stuck on the declaration (same sort of DWARF you
>>> see with "struct foo { virtual void f1(); template<typename
>>> T> void f2() { } }; void test(foo& f) { f.f2<int>(); }").
>>> Recording all the member functions of the type/static member
>>> variable types might be needed in cases where some member
>>> functions are defined in one translation unit and some
>>> defined in another - though I guess that infrastructure is
>>> already in place/that just works today.
>> My understanding is that there is no such infrastructure
>> currently. The current infrastructure allows referencing a
>> single existing (canonical) type declaration from other
>> units. It does not allow referencing different parts (in
>> different units) of an incomplete type.
>>
>>
>> Huh, so what does the DWARF look like when you define one member
>> function in one file, and another member function (common with
>> inline functions) in another file?
>>
>> I think it would be necessary to change the order in which
>> compilation units are processed to implement such type merging.
>>
>>
>> Oh, I wasn't suggesting merging them - or didn't mean to suggest
>> that. I meant doing something like what we do in LLVM for type
>> homed (no-standalone) DWARF, where we attach function
>> declarations to type declarations, eg:
>>
>> struct x {
>>   void f1();
>>   void f2();
>>   template<typename T>
>>   static void f3();
>> };
>>
>> #ifdef HOME
>> void x::f1() {
>> }
>> #endif
>>
>> #ifdef AWAY
>> void x::f2() {
>> }
>> #endif
>>
>> #ifdef TEMPL
>> template<typename T>
>> void x::f3() {
>> }
>> template void x::f3<int>();
>> #endif
>>
>> Building "HOME" would show the DWARF I'd expect to see the first
>> time a type definition is encountered during dsym.
>> Building "AWAY" raises the question of - what does dsymutil do
>> with this DWARF? Does it deduplicate the type, and make the
>> definition of 'f2' point to the 'f2' declaration in the original
>> type described in the prior CU defined in "HOME"? If it doesn't
>> do that, it could/that would be good to reduce the DWARF size.
>> Building "TEMPL" would show the DWARF I'd expect to see if a
>> future use of that type definition was encountered but the
>> original/home definition had no declaration of this function: we
>> should then emit maybe an "extension" to the type (could be a
>> straight declaration, or maybe some newer/weirder hybrid that
>> points to the definition with some attribute) & then inject the
>> declaration of the template/other new member into this extension
>> definition, etc.
>>
> Please check the reduced DWARF generated by the current dsymutil
> for the above example:
>
> 0x0000000b: DW_TAG_compile_unit
> DW_AT_language (DW_LANG_C_plus_plus)
> DW_AT_name ("home.cpp")
> DW_AT_stmt_list (0x00000000)
> DW_AT_low_pc (0x0000000100000f80)
> DW_AT_high_pc (0x0000000100000f8b)
>
> 0x0000002a: DW_TAG_structure_type
> DW_AT_name ("x")
> DW_AT_byte_size (0x01)
>
> 0x00000033: DW_TAG_subprogram
> DW_AT_linkage_name ("_ZN1x2f1Ev")
> DW_AT_name ("f1")
> DW_AT_type (0x000000000000005e "int")
> DW_AT_declaration (true)
> DW_AT_external (true)
> DW_AT_APPLE_optimized (true)
>
> 0x00000047: NULL
>
> 0x00000048: DW_TAG_subprogram
> DW_AT_linkage_name ("_ZN1x2f2Ev")
> DW_AT_name ("f2")
> DW_AT_type (0x000000000000005e "int")
> DW_AT_declaration (true)
> DW_AT_external (true)
> DW_AT_APPLE_optimized (true)
>
> 0x0000005c: NULL
> 0x0000005d: NULL
>
> 0x0000006a: DW_TAG_subprogram
> DW_AT_low_pc (0x0000000100000f80)
> DW_AT_high_pc (0x0000000100000f8b)
> DW_AT_specification (0x0000000000000033 "_ZN1x2f1Ev")
>
>
> 0x000000a0: DW_TAG_compile_unit
> DW_AT_language (DW_LANG_C_plus_plus)
> DW_AT_name ("away.cpp")
> DW_AT_stmt_list (0x00000048)
> DW_AT_low_pc (0x0000000100000f90)
> DW_AT_high_pc (0x0000000100000f9b)
>
> 0x000000c6: DW_TAG_subprogram
> DW_AT_low_pc (0x0000000100000f90)
> DW_AT_high_pc (0x0000000100000f9b)
> DW_AT_specification (0x0000000000000048 "_ZN1x2f2Ev")
>
> 0x000000fc: DW_TAG_compile_unit
> DW_AT_language (DW_LANG_C_plus_plus)
> DW_AT_name ("templ.cpp")
> DW_AT_stmt_list (0x00000090)
> DW_AT_low_pc (0x0000000100000fa0)
> DW_AT_high_pc (0x0000000100000fab)
>
> 0x0000011b: DW_TAG_structure_type
> DW_AT_name ("x")
> DW_AT_byte_size (0x01)
>
> 0x00000124: DW_TAG_subprogram
> DW_AT_linkage_name ("_ZN1x2f1Ev")
> DW_AT_name ("f1")
> DW_AT_type (0x0000000000000168 "int")
> DW_AT_declaration (true)
> DW_AT_external (true)
> DW_AT_APPLE_optimized (true)
> 0x00000138: NULL
>
> 0x00000139: DW_TAG_subprogram
> DW_AT_linkage_name ("_ZN1x2f2Ev")
> DW_AT_name ("f2")
> DW_AT_type (0x0000000000000168 "int")
> DW_AT_declaration (true)
> DW_AT_external (true)
> DW_AT_APPLE_optimized (true)
> 0x0000014d: NULL
>
> 0x0000014e: DW_TAG_subprogram
> DW_AT_linkage_name ("_ZN1x2f3IiEEiv")
> DW_AT_name ("f3<int>")
> DW_AT_type (0x0000000000000168 "int")
> DW_AT_declaration (true)
> DW_AT_external (true)
> DW_AT_APPLE_optimized (true)
> 0x00000166: NULL
> 0x00000167: NULL
>
> 0x00000174: DW_TAG_subprogram
> DW_AT_low_pc (0x0000000100000fa0)
> DW_AT_high_pc (0x0000000100000fab)
> DW_AT_specification (0x000000000000014e "_ZN1x2f3IiEEiv")
> 0x00000190: NULL
>
>
> > Building "HOME" would show the DWARF I'd expect to see the first
> > time a type definition is encountered during dsym.
>
> Compile unit "home.cpp" contains the type definition (0x0000002a)
> and a reference to its member (DW_AT_specification
> (0x0000000000000033 "_ZN1x2f1Ev")).
>
> > Building "AWAY" raises the question of - what does dsymutil do
> > with this DWARF? Does it deduplicate the type, and make the
> > definition of 'f2' point to the 'f2' declaration in the original
> > type described in the prior CU defined in "HOME"? If it doesn't do
> > that, it could/that would be good to reduce the DWARF size.
>
> Compile unit "away.cpp" does not contain a type definition; it
> contains a reference to a member of the type defined in compile unit
> "home.cpp" (DW_AT_specification (0x0000000000000048 "_ZN1x2f2Ev")).
> I.e., dsymutil deduplicates the type and makes the definition of
> 'f2' point to the 'f2' declaration in the original type described
> in the prior CU "home.cpp".
>
> > Building "TEMPL" would show the DWARF I'd expect to see if a
> > future use of that type definition was encountered but the
> > original/home definition had no declaration of this function: we
> > should then emit maybe an "extension" to the type (could be a
> > straight declaration, or maybe some newer/weirder hybrid that
> > points to the definition with some attribute) & then inject the
> > declaration of the template/other new member into this extension
> > definition, etc.
>
> Compile unit "templ.cpp" contains the type definition (0x0000011b),
> which matches (0x0000002a) but additionally declares the new member
> 0x0000014e.
> It also references this new member via DW_AT_specification
> (0x000000000000014e "_ZN1x2f3IiEEiv"). In this case the type
> description is not de-duplicated.
>
>
> Ah, yeah - that seems like a missed opportunity - duplicating the
> whole type DIE. LTO does this by making monolithic types - merging all
> the members from different definitions of the same type into one, but
> that's maybe too expensive for dsymutil (might still be interesting to
> know how much more expensive, etc). But I think the other way to go
> would be to produce a declaration of the type, with the relevant
> members - and let the DWARF consumer identify this declaration as
> matching up with the earlier definition. That's the sort of DWARF you
> get from the non-MachO default -fno-standalone-debug anyway, so it's
> already pretty well tested/supported (support in lldb's a bit
> younger/more work-in-progress, admittedly). I wonder how much dsym
> size there is that could be reduced by such an implementation.
I see. Yes, that could be done, and I think it would result in a
noticeable size reduction (I do not know exact numbers at the moment).
I am working on a multi-threaded DWARFLinker now, and its first version
will do exactly the same type processing as the current dsymutil.
The above scheme could be implemented as a next step, and it would give
a better size reduction than the current state.
But I think an even better scheme is possible, which would result in an
even bigger size reduction and in faster execution. This scheme is
similar to what you've described above: "LTO does this by making
monolithic types - merging all the members from different definitions
of the same type into one".
DWARFLinker could create an additional artificial compile unit and put
all merged types there, then patch all type references to point into
this additional compilation unit. No bits would be duplicated in that
case. The performance improvement would come from copying less DWARF
and from the fact that type references could be updated while the DWARF
is cloned (no additional pass is needed for that).
Anyway, that might be the next step after the multi-threaded DWARFLinker
is ready.
>
> Do you suggest that 0x0000011b should be transformed into
> something like this:
>
> 0x000000fc: DW_TAG_compile_unit
> DW_AT_language (DW_LANG_C_plus_plus)
> DW_AT_name ("templ.cpp")
> DW_AT_stmt_list (0x00000090)
> DW_AT_low_pc (0x0000000100000fa0)
> DW_AT_high_pc (0x0000000100000fab)
>
> 0x0000011b: DW_TAG_structure_type
> DW_AT_specification (0x0000002a "x")
>
> 0x00000124: DW_TAG_subprogram
> DW_AT_linkage_name ("_ZN1x2f3IiEEiv")
> DW_AT_name ("f3<int>")
> DW_AT_type (0x000000000000005e "int")
> DW_AT_declaration (true)
> DW_AT_external (true)
> DW_AT_APPLE_optimized (true)
> 0x00000138: NULL
> 0x00000139: NULL
>
> 0x00000140: DW_TAG_subprogram
> DW_AT_low_pc (0x0000000100000fa0)
> DW_AT_high_pc (0x0000000100000fab)
> DW_AT_specification (0x0000000000000124 "_ZN1x2f3IiEEiv")
> 0x00000155: NULL
>
> Did I correctly get the idea?
>
>
> Yep, more or less. It'd be "safer" if 11b didn't use
> DW_AT_specification to refer to 2a, but instead was only a completely
> independent declaration of "x" - that path is already well
> supported/tested (well, it's the work-in-progress stuff for lldb to
> support -fno-standalone-debug, but gdb's been consuming DWARF like
> this for years, Clang and GCC both produce DWARF like this (if the
> type is "homed" in another file, then Clang/GCC produce DWARF that
> emits a declaration with just the members needed to define any member
> functions defined/inlined/referenced in this CU)) for years.
>
> But using DW_AT_specification, or maybe some other extension attribute
> might make the consumers task a bit easier (could do both - use an
> extension attribute to tie them up, leave DW_AT_declaration/DW_AT_name
> here for consumers that don't understand the extension attribute) in
> finding that they're all the same type/pieces of the same type.
Yes, I will try this solution.
Thank you, Alexey.