[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 26 12:38:34 PDT 2020


On Sun, Oct 25, 2020 at 9:31 AM Alexey Lapshin <avl.lapshin at gmail.com>
wrote:

>
> On 23.10.2020 19:43, David Blaikie wrote:
>
>
>>>
>>>
>> Ah, yeah - that seems like a missed opportunity - duplicating the whole
>> type DIE. LTO does this by making monolithic types - merging all the
>> members from different definitions of the same type into one, but that's
>> maybe too expensive for dsymutil (might still be interesting to know how
>> much more expensive, etc). But I think the other way to go would be to
>> produce a declaration of the type, with the relevant members - and let the
>> DWARF consumer identify this declaration as matching up with the earlier
>> definition. That's the sort of DWARF you get from the non-MachO default
>> -fno-standalone-debug anyway, so it's already pretty well tested/supported
>> (support in lldb's a bit younger/more work-in-progress, admittedly). I
>> wonder how much dsym size there is that could be reduced by such an
>> implementation.
>>
>> I see. Yes, that could be done, and I think it would result in a noticeable
>> size reduction (I do not know exact numbers at the moment).
>>
>> I am working on a multi-threaded DWARFLinker now, and its first version will
>> do exactly the same type processing as the current dsymutil.
>>
> Yeah, best to keep the behavior the same through that
>
>> The above scheme could be implemented as a next step, and it would give a
>> better size reduction than the current state.
>>
>> But I think an even better scheme is possible, one that would give a bigger
>> size reduction and faster execution. This scheme is
>> similar to what you've described above: "LTO does - making
>> monolithic types - merging all the members from different definitions of
>> the same type into one".
>>
> I believe the reason that's probably not been done is that it can't be
> streamed - it'd lead to buffering more of the output
>
> Yes. The fact that DWARF has to be streamed into AsmPrinter complicates
> parallel DWARF generation. In my prototype, I generate
> several output files (one per source compilation unit) and then
> glue them sequentially into the final output file.
>
How does that help? Do you use relocations in those intermediate object
files so the DWARF in them can refer across files?
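
To make the question concrete, here's a minimal Python sketch of what relocation-style gluing could look like. This is only an illustration under my own assumptions, not what the prototype or DWARFLinker actually does: `Unit`, `Reloc` and `glue` are hypothetical names, and a cross-unit reference is modeled as a 4-byte little-endian absolute offset.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class Reloc:
    patch_at: int      # byte position inside this unit's own buffer
    target_unit: int   # index of the unit being referred to
    target_local: int  # offset of the target within that unit

@dataclass
class Unit:
    data: bytearray
    relocs: list = field(default_factory=list)

def glue(units):
    """Concatenate units and resolve cross-unit references."""
    # Base offset of each unit in the final file (running sum of sizes).
    bases, pos = [], 0
    for u in units:
        bases.append(pos)
        pos += len(u.data)
    out = bytearray()
    for u in units:
        buf = bytearray(u.data)
        for r in u.relocs:
            # Absolute offset is only computable once all sizes are known.
            absolute = bases[r.target_unit] + r.target_local
            struct.pack_into("<I", buf, r.patch_at, absolute)
        out += buf
    return bytes(out)

# Unit 0 (8 bytes) refers forward to offset 8 of unit 1;
# unit 1 (12 bytes) refers back to offset 4 of unit 0.
u0 = Unit(bytearray(8), [Reloc(0, 1, 8)])
u1 = Unit(bytearray(12), [Reloc(4, 0, 4)])
linked = glue([u0, u1])
```

With something like this, each unit can be produced independently (and in parallel), because nothing in a unit's buffer depends on where the other units end up until the gluing step.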

>
> (if two of these expandable types were in one CU - the start of the second
> type couldn't be known until the end because it might keep getting pushed
> later due to expansion of the first type) and/or having to revisit all the
> type references (the offset to the second type wouldn't be known until the
> end - so writing the offsets to refer to the type would have to be deferred
> until then).
>
> That is the second problem: offsets are not known until the end of the file.
> dsymutil already has that situation for inter-CU references, so it has an
> extra pass to fix up offsets.
>
Oh, it does? I figured it was one-pass, and that it only ever refers back
to types in previous CUs? So it doesn't have to go back and do a second
pass. But I guess if it sees a declaration of T1 in CU1, then later on sees a
definition of T1 in CU2, does it somehow go back to CU1 and remove the
declaration/make references refer to the definition in CU2? I figured it'd
just leave the declaration and references to it as-is, then add the
definition and use that from CU2 onwards?

> With a multi-threaded implementation, that situation would arise more often
> for type references, so more offsets would have to be fixed during the
> additional pass.
>
> DWARFLinker could create an additional artificial compile unit and put all
>> merged types there, then later patch all type references to point into this
>> additional compilation unit. No bits would be duplicated in that case.
>> The performance improvement would come from copying less DWARF and from
>> the fact that type references could be updated while the DWARF is cloned
>> (no additional pass needed for that).
>>
> "later patch all type references to point into this additional compilation
> unit" - that's the additional pass that people are probably
> talking/concerned about. Rewalking all the DWARF. The current dsymutil
> approach, as far as I know, is single pass - it knows the final, absolute
> offset to the type from the moment it emits that type/needs to refer to it.
>
> Right. The current dsymutil approach is single pass. From that point of
> view, the solution you've described (producing a declaration of the type
> with the relevant members) allows keeping that single-pass implementation.
>
> But there is a restriction with the current dsymutil approach: to process
> inter-CU references, it needs to load all DWARF into memory (while it
> analyzes which parts of the DWARF are live, it needs to have all CUs loaded).
>
All DWARF for a single file (which for dsymutil is mostly a single CU,
except with LTO I guess?), not all DWARF for all inputs in memory at once,
yeah?

> That leads to huge memory usage.
> It is less important when the source is a set of object files (as in the
> dsymutil case), but it becomes a real problem for the llvm-dwarfutil utility
> when the source is a single file (with the current implementation it needs
> 30G of memory for the clang binary).
>
Yeah, that's where I think you'd need a fixup pass one way or another -
because cross-CU references can mean that when you figure out a new layout
for CU5 (because it has a duplicate type definition of something in CU1)
then you might have to touch CU4 that had an absolute/cross-CU forward
reference to CU5. Once you've got such a fixup pass (if dsymutil already
has one? Which, like I said, I'm confused why it would have one/that
doesn't match my very vague understanding) then I think you could make
dsymutil work on a per-CU basis streaming things out, then fixing up a few
offsets.
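
Roughly what I have in mind, as a minimal Python sketch. It's an assumption-laden illustration, not how dsymutil is implemented: `stream_with_fixups` is a hypothetical name, each CU is just a zero-filled buffer of a given size, and a cross-CU reference is a 4-byte little-endian absolute offset pointing at the start of the target CU.

```python
import io
import struct

def stream_with_fixups(cus, out):
    """cus: list of (name, size, refs); refs is a list of (local_pos, target_name).
    Writes each CU to `out` in order, then patches forward references."""
    start = {}     # CU name -> absolute offset, known once the CU is emitted
    pending = []   # (absolute patch position, target name) for forward refs
    for name, size, refs in cus:
        base = out.tell()
        start[name] = base
        buf = bytearray(size)
        for local_pos, target in refs:
            if target in start:
                # Backward (or self) reference: final offset already known.
                struct.pack_into("<I", buf, local_pos, start[target])
            else:
                # Forward reference: leave a placeholder, fix it up later.
                pending.append((base + local_pos, target))
        out.write(buf)
    end = out.tell()
    # Fixup pass: touches only the recorded positions, not the whole DWARF.
    for pos, target in pending:
        out.seek(pos)
        out.write(struct.pack("<I", start[target]))
    out.seek(end)
    return len(pending)

out = io.BytesIO()
cus = [
    ("CU0", 8, []),
    ("CU1", 16, []),
    ("CU4", 16, [(0, "CU1"), (4, "CU5")]),  # second ref is a forward ref
    ("CU5", 16, [(8, "CU1")]),
]
patched = stream_with_fixups(cus, out)
data = out.getvalue()
```

The point is that only one CU's buffer is live at a time, and the cost of the fixup pass scales with the number of forward references rather than with the total size of the output.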

> Without loading all CUs into memory, it would require a two-pass
> solution: a first pass to analyze which parts of the DWARF relate to live
> code, and a second pass to generate the result.
>
Not sure it'd require any more second pass than a "fixup" pass, which it
sounds like you're saying it already has?

> If we had a two-pass solution, then we could create a compilation unit
> with all the types in the first pass, and in the second pass we could
> generate the result with correct offsets (no need to fix them up, as
> dsymutil currently requires for forward inter-CU references).
> The open question currently is how expensive this two-pass approach is.
>
> Thank you, Alexey.
>
> Anyway, that might be the next step once the multi-threaded DWARFLinker
>> is ready.
>>
> Yep, be interesting to see how it all goes!
>
>>
>>
>>>
>>> Do you suggest that 0x0000011b should be transformed into something like
>>> this:
>>>
>>> 0x000000fc: DW_TAG_compile_unit
>>>               DW_AT_language    (DW_LANG_C_plus_plus)
>>>               DW_AT_name        ("templ.cpp")
>>>               DW_AT_stmt_list   (0x00000090)
>>>               DW_AT_low_pc      (0x0000000100000fa0)
>>>               DW_AT_high_pc     (0x0000000100000fab)
>>>
>>> 0x0000011b:   DW_TAG_structure_type
>>>                 DW_AT_specification (0x0000002a "x")
>>>
>>> 0x00000124:     DW_TAG_subprogram
>>>                   DW_AT_linkage_name    ("_ZN1x2f3IiEEiv")
>>>                   DW_AT_name    ("f3<int>")
>>>                   DW_AT_type    (0x000000000000005e "int")
>>>                   DW_AT_declaration     (true)
>>>                   DW_AT_external        (true)
>>>                   DW_AT_APPLE_optimized (true)
>>> 0x00000138:       NULL
>>> 0x00000139:     NULL
>>>
>>> 0x00000140:   DW_TAG_subprogram
>>>                 DW_AT_low_pc    (0x0000000100000fa0)
>>>                 DW_AT_high_pc   (0x0000000100000fab)
>>>                 DW_AT_specification     (0x0000000000000124 "_ZN1x2f3IiEEiv")
>>> 0x00000155:     NULL
>>>
>>> Did I get the idea correctly?
>>>
>>
>> Yep, more or less. It'd be "safer" if 11b didn't use DW_AT_specification
>> to refer to 2a, but instead was a completely independent declaration
>> of "x" - that path is already well supported/tested. (Well, it's the
>> work-in-progress stuff for lldb to support -fno-standalone-debug, but gdb's
>> been consuming DWARF like this for years, and Clang and GCC have both
>> produced DWARF like this for years: if the type is "homed" in another file,
>> Clang/GCC emit a declaration with just the members needed to define any
>> member functions defined/inlined/referenced in this CU.)
>>
>> But using DW_AT_specification, or maybe some other extension attribute,
>> might make the consumer's task of finding that they're all the same
>> type/pieces of the same type a bit easier (could do both - use an
>> extension attribute to tie them up, leave DW_AT_declaration/DW_AT_name here
>> for consumers that don't understand the extension attribute).
>>
>>
>> Yes, I will try this solution.
>>
>> Thank you, Alexey.
>>
>>