[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Wed May 13 12:36:54 PDT 2020


Hi David, excuse me for the delayed answer; it took some time to prepare. Please find the answers below.


>Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?

> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.

There are two general requirements:
 1) Remove (or clean) invalid debug info.
 2) Optimize the DWARF size.

The specifics of our users:
 - an embedded platform which uses 0 as the start of the .text section.
 - a custom toolset which does not yet support all features (e.g. split DWARF).
 - tolerance of the link-time increase.
 - a need for a convenient way to share debug builds.

For the first point: we have the problem "Overlapping address ranges starting from 0" (D59553).
We use a custom solution, but a general solution like D74169 would be better here.
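
To illustrate why a .text section starting at 0 makes this problem visible, here is a minimal sketch in Python (not LLVM code). The names `Range` and `find_overlaps` and the "live" addresses are hypothetical; the dead range [0x0, 0x19) mirrors the removed function in the example dump later in this mail.

```python
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    """Half-open address range [low_pc, high_pc) claimed by one DWARF entry."""
    owner: str
    low_pc: int
    high_pc: int


def find_overlaps(ranges: List[Range]) -> List[Tuple[str, str]]:
    """Return pairs of owners whose non-empty ranges intersect."""
    ordered = sorted((r for r in ranges if r.low_pc < r.high_pc),
                     key=lambda r: (r.low_pc, r.high_pc))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.low_pc >= first.high_pc:
                break  # sorted by low_pc, so nothing later can intersect
            overlaps.append((first.owner, second.owner))
    return overlaps


# Hypothetical layout: .text starts at 0, as on the embedded target.
ranges = [
    Range("live_main", 0x0, 0x40),  # real code placed at address 0
    Range("dead_f", 0x0, 0x19),     # garbage-collected code, range left at 0
    Range("live_g", 0x40, 0x60),
]
print(find_overlaps(ranges))
```

With this layout the stale range of the removed function collides with real code at address 0; dropping the `dead_f` entry (which is what removing the obsolete debug info amounts to) leaves no overlaps.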

For the second point: split DWARF could be a good alternative for getting debug info of minimal size.
Still, it has drawbacks (not currently supported by tools, does not solve the "Overlapping address ranges"
problem, not very convenient to share (even using .dwp)).

Thus, in the long term, D74169 looks like a good solution for us: it resolves the "Overlapping address ranges"
problem, produces a binary of minimal size, is supported by current tools, and makes a debug build easy to share
(a single binary of minimal size).

> In general, in the current state, I don't have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it's a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.
>
>A couple of minor points...

>> C: --function-sections --gc-sections --fdebug-types-section
> ^ not sure of the point of testing/showing comparisons with a situation that's currently unsupported


That situation is currently supported (--gc-debuginfo is not used in this measurement).
"--fdebug-types-section" is supported functionality.
The purpose of this data is to compare the results for "--fdebug-types-section" and "--gc-debuginfo".



>>2. Support of type units.

>>  That could be implemented further.

>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
>get the linking improvements.

But DWARFLinker should work properly with type units, since they are already implemented.
If someone uses --fdebug-types-section, then it should work properly when used together
with --gc-debuginfo (if --gc-debuginfo is accepted).
Right?

Another thing is that the idea behind type units has the potential to help a DWARF-aware linker work faster.
Currently, DWARFLinker analyzes context to understand whether types are the same or not.
But the context is known when the types are generated, so there is no need to spend time analyzing it.
If types could be compared without analyzing context, then a DWARF-aware linker would work faster.
That is just an idea (not for immediate implementation): if types were stored in some "type table"
(instead of a COMDAT section group) and could be accessed through a hash-id (like type units),
then that would be a solution requiring fewer bits to store while allowing types to be compared
by hash-id (without analyzing context).
In this case, the size increase would be small, and processing could be faster.

This is just an idea and could be discussed separately from the problem of integrating D74169.
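
A rough sketch of the "type table" idea in Python, only to make the comparison-by-hash-id point concrete. Everything here is hypothetical: `TypeTable`, `merge_objects`, and the hash values are made up, and the sketch assumes the producer has already computed a hash-id that identifies the type (in the spirit of type unit signatures), so the linker never inspects context.

```python
from typing import Dict, Iterable, List, Tuple


class TypeTable:
    """Hypothetical table holding one description per type hash-id."""

    def __init__(self) -> None:
        self._types: Dict[int, str] = {}

    def add(self, hash_id: int, description: str) -> bool:
        """Store a type once; return True only if the hash-id was new."""
        if hash_id in self._types:
            return False  # assumption: equal hash-id means equal type
        self._types[hash_id] = description
        return True

    def __len__(self) -> int:
        return len(self._types)


def merge_objects(objects: Iterable[List[Tuple[int, str]]]) -> TypeTable:
    """Merge per-object type lists, deduplicating purely by hash-id."""
    table = TypeTable()
    for types in objects:
        for hash_id, description in types:
            table.add(hash_id, description)
    return table


# Two object files that both describe "Foo" (hash values are invented).
obj_a = [(0x1111, "struct Foo"), (0x2222, "struct Bar")]
obj_b = [(0x1111, "struct Foo"), (0x3333, "struct Baz")]
table = merge_objects([obj_a, obj_b])
print(len(table))  # three distinct types remain
```

The lookup is a single dictionary probe per type, which is the source of the expected speed-up compared with walking the parent context of each type DIE.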



>>4. split DWARF support.

>>   This solution does not work with split DWARF currently. But it could be useful for the split dwarf in two ways:
>>   a) The generation of skeleton file could be changed in such a way that address ranges pointing to garbage

>>   collected code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping

>> address ranges(D59553).


>This wouldn't/couldn't completely address the issue - because some address ranges would be in the .dwo files the linker can't see - and they'd still end up with the interesting address ranges.

I see, thank you. Thus it would not be a complete solution.



>> 6. -flto=thin

>>    That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in

>> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could

>> probably be fixed by avoiding generation of such incomplete declaration during thinlto,

>> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
>> definition is known to exist in another Module, etc)

>I don't know if it's a problem since that patch was reverted.

Yes, that patch was reverted, but this patch (D74169) has the same problem.
If D74169 were applied and --gc-debuginfo used, then the structure type
definition would be removed.

DWARFLinker could handle that case ("removing definitions from some llvm Modules if the type
definition is known to exist in another Module"),
i.e. DWARFLinker could replace the declaration with the definition.

But that problem could be resolved more easily when the debug info is generated (probably without
a significant increase in debug info size).

Let's check an example:

0x0000000b: DW_TAG_compile_unit
              DW_AT_low_pc      (0x0000000000201700)
              DW_AT_high_pc     (0x0000000000201719)

0x0000002a:   DW_TAG_subprogram
0x00000043:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000000000000086 "_Z1fv")
                  DW_AT_low_pc  (0x0000000000201700)
                  DW_AT_high_pc (0x0000000000201718)

0x00000057:       DW_TAG_variable
                    DW_AT_abstract_origin       (0x0000000000000096 "var")
0x00000065:       NULL

0x00000073: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000080)

0x00000086:   DW_TAG_subprogram
                DW_AT_name      ("f")
                DW_AT_inline    (DW_INL_inlined)

0x00000096:     DW_TAG_variable
                  DW_AT_name    ("var")
                  DW_AT_type    (0x000000a9 "volatile Foo")
0x000000a1:     NULL

0x000000a9:   DW_TAG_volatile_type
                DW_AT_type      (0x000000ae "Foo")

0x000000ae:   DW_TAG_structure_type
                DW_AT_name      ("Foo")
                DW_AT_declaration       (true)

0x000000c1: DW_TAG_compile_unit
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_high_pc     (0x0000000000000019)

0x000000e0:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x0000000000000019)
                DW_AT_name      ("f")

0x000000fd:     DW_TAG_variable
                  DW_AT_name    ("var")
                  DW_AT_type    (0x00000119 "volatile Foo")

0x00000119:   DW_TAG_volatile_type
                DW_AT_type      (0x0000011e "Foo")

0x0000011e:   DW_TAG_structure_type
                DW_AT_name      ("Foo")
                DW_AT_decl_line (1)

Here we have:

DW_TAG_compile_unit(0x0000000b) - compile unit containing the concrete instance of function "f".
DW_TAG_compile_unit(0x00000073) - compile unit containing the abstract instance root of function "f".
DW_TAG_compile_unit(0x000000c1) - compile unit containing the definition of function "f".

The code for function "f" was deleted. --gc-debuginfo deletes the compile unit DW_TAG_compile_unit(0x000000c1)
containing the definition of "f" (since there is no corresponding code). But that unit holds the definition of
structure "Foo", DW_TAG_structure_type(0x0000011e), which is referenced from DW_TAG_compile_unit(0x00000073)
through the declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where the definition
was removed by ThinLTO and replaced with a declaration.
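
The situation can be modelled with a small Python sketch (not DWARFLinker code). The `TypeDie` record and `unresolved_declarations` are hypothetical helpers, and matching a declaration to its definition by name alone is a simplifying assumption; the offsets are the ones from the dump above.

```python
from typing import List, NamedTuple, Set


class TypeDie(NamedTuple):
    cu_offset: int        # offset of the owning DW_TAG_compile_unit
    die_offset: int       # offset of the DW_TAG_structure_type DIE
    name: str
    is_declaration: bool  # True when DW_AT_declaration is present


def unresolved_declarations(types: List[TypeDie],
                            removed_cus: Set[int]) -> List[TypeDie]:
    """Declarations in kept CUs whose name has no definition in any kept CU."""
    kept = [t for t in types if t.cu_offset not in removed_cus]
    defined = {t.name for t in kept if not t.is_declaration}
    return [t for t in kept if t.is_declaration and t.name not in defined]


types = [
    TypeDie(0x73, 0xAE, "Foo", True),    # declaration left by ThinLTO
    TypeDie(0xC1, 0x11E, "Foo", False),  # definition, next to the code of f
]

before = unresolved_declarations(types, removed_cus=set())
after = unresolved_declarations(types, removed_cus={0xC1})
print(len(before), len(after))
```

Before the compile unit at 0xc1 is removed, the declaration of "Foo" can be resolved; afterwards the declaration at 0xae is the only trace of the type left in the output.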

Would it cost too much if the type definition were not replaced with a declaration for the "abstract instance root"?
The number of concrete instances is larger than the number of abstract instance roots,
so it would probably not be too costly to leave the definition in the abstract instance root.

Alternatively, would it cost too much if the type definition were not replaced with a declaration when the declaration refers to a type from an unused function? (LTO could determine that the concrete function is not used.)


Thank you, Alexey.




