[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Mon May 11 14:06:45 PDT 2020
>Hi Alexey,
Hi James, Thank you for your comments. Please, find my answers below:
>Regarding the link performance timings, have you tried profiling to see if there are any obvious performance improvements that could be made? A slow down of 7x seems like an awfully large amount given what this should be doing after all.
I do not see any "easy to fix" alternatives, but there are some possibilities to improve performance:
1. ~10% improvement could probably be achieved by optimizing string pools
(NonRelocatableStringpool/DwarfStringPool).
Measurements show that ~10 sec is spent in llvm::StringMapImpl::LookupBucketFor(). The problem
is that the same strings are added to the string pool again and again. Two attributes
having the same string value are each analyzed (hash calculated) and searched for inside
the string pool, even if these strings are already in a string table (DW_FORM_strp, DW_FORM_strx).
The process could be optimized for string tables: if a string from the string table has been
accessed previously, a reference to its string pool entry would be kept. This would eliminate
a lot of string pool searches.
2. ~20-30% improvement by processing each object file in parallel.
Currently, all object files are analyzed sequentially and cloned sequentially.
Cloning is started in parallel with analyzing. That scheme could be changed:
analyzing and cloning could be done in parallel for each object file.
That requires refactoring of DWARFLinker and making string pools and DeclContextTree
thread-safe.
3. ~10-20% improvement by supporting type units.
Currently, dsymutil/DWARFLinker does not support type units. If type units were supported, the "analyzing" step could be skipped for a significant part of the debug info data. This would save time.
4. ~2-3% improvement could probably be achieved by optimizing DWARF parser classes.
Following is a list of ideas:
https://reviews.llvm.org/D78672#inline-720056
https://reviews.llvm.org/D78672#2000012
https://reviews.llvm.org/D78672#2000363.
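
To make ideas 1 and 2 above more concrete, here is a minimal sketch in Python rather than in the actual DWARFLinker C++ code. Everything in it is hypothetical: StringPool, link_object_file and the toy string tables are illustration names, not LLVM APIs. It assumes each object file has its own string table indexed by offset, that a per-file cache maps an input offset to the output pool offset, and that the shared pool is guarded by a lock so that object files can be processed concurrently. Python threads will not give a real speedup here; the point is only the structure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StringPool:
    """Hypothetical shared output pool: one offset per distinct string."""

    def __init__(self):
        self._offsets = {}
        self._lock = threading.Lock()
        self.size = 0
        self.lookups = 0  # number of string-keyed lookups, for illustration

    def intern(self, text):
        # The lock makes the pool safe to share between worker threads.
        with self._lock:
            self.lookups += 1
            offset = self._offsets.get(text)
            if offset is None:
                offset = self.size
                self._offsets[text] = offset
                self.size += len(text.encode("utf-8")) + 1  # NUL terminator
            return offset


def link_object_file(pool, string_table, attribute_offsets):
    """Translate input string-table offsets into output pool offsets."""
    cache = {}  # input offset -> output offset, private to this object file
    result = []
    for in_offset in attribute_offsets:
        out_offset = cache.get(in_offset)
        if out_offset is None:
            # Only the first use of an input offset reaches the shared pool.
            out_offset = pool.intern(string_table[in_offset])
            cache[in_offset] = out_offset
        result.append(out_offset)
    return result


# Two toy "object files": (string table, offsets used by string attributes).
files = [
    ({0: "int", 4: "main"}, [0, 4, 0, 0, 4]),
    ({0: "main", 5: "char"}, [5, 0, 5]),
]

pool = StringPool()
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(lambda f: link_object_file(pool, *f), files))

print(pool.lookups, "pool lookups for", sum(len(f[1]) for f in files), "attributes")
```

With the per-file cache, the pool is consulted once per distinct input offset instead of once per attribute. Keeping the cache private to each object file means only the shared pool needs locking.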
>Also, do you have an idea whether the slow down is exponential for the size/linear etc?
It is linear. The following data is from different runs (output size is the size of the overall binary):
---------------------------------------
| Linking time, sec | Output size, MB |
---------------------------------------
|                 4 |              64 |
|                 5 |              79 |
|                18 |             211 |
|                25 |             308 |
|                29 |             356 |
|                51 |             526 |
|                72 |             788 |
---------------------------------------
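
As a quick sanity check of the "linear" claim, the numbers in the table above can be fitted with an ordinary least-squares line. This is only a sketch over the seven data points shown; the helper name least_squares is made up for this example.

```python
# (output size in MB, linking time in sec), copied from the table above
runs = [(64, 4), (79, 5), (211, 18), (308, 25), (356, 29), (526, 51), (788, 72)]


def least_squares(points):
    """Return slope, intercept and correlation coefficient of a line fit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


slope, intercept, r = least_squares(runs)
print(f"slope = {slope:.3f} sec/MB, intercept = {intercept:.1f} sec, r = {r:.3f}")
```

The fit comes out at roughly 0.1 sec of linking time per MB of output, with a correlation coefficient above 0.99, which is consistent with linear growth over this range.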
>The problem is that if it is opt-in, but the link time cost is so high, it may put people off ever enabling it, which would be a shame, as the debugger load time improvements seem worthwhile having.
On the other hand, integrating D74169 allows us to proceed iteratively. Doing the above performance optimizations would require significant time. Implementing DWARF5 support would probably require significant time as well. It would take much longer to implement the whole thing at once. Also, if D74169 were integrated, additional people could probably join that work. I think the LLVM developer policy encourages splitting work into smaller pieces and integrating them iteratively.
Thank you, Alexey.
>James
On Fri, 8 May 2020 at 14:18, Alexey Lapshin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Folks, we are working on reducing binary size and improving debug info quality.
To reduce the size of the binary we use -ffunction-sections so that unused code is garbage collected.
When the linker does garbage collection, a lot of abandoned debug info is left behind.
Besides inflated debug info size, we end up with overlapping address ranges and no way to tell valid ranges from garbage ones (D59553).
To resolve these two problems, we use an implementation extracted from dsymutil: https://reviews.llvm.org/D74169.
It adds a --gc-debuginfo command line option to the linker to remove obsolete debug info.
Currently, it has the following limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections, split DWARF, or ThinLTO.
The following are size/performance results for D74169:
A: -ffunction-sections --gc-sections
B: -ffunction-sections --gc-sections --gc-debuginfo
C: -ffunction-sections --gc-sections -fdebug-types-section
D: -ffunction-sections --gc-sections -gsplit-dwarf
E: -ffunction-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib
LLVM code base:
-------------------------------------------------------
| Options | build time  | bin size    | lib size      |
-------------------------------------------------------
| A       | 54min(100%) | 19.0G(100%) | 15.0G(100.0%) |
-------------------------------------------------------
| B       | 65min(120%) |  9.7G( 51%) | 12.0G( 80.0%) |
-------------------------------------------------------
| C       | 53min( 98%) | 12.0G( 63%) | 15.0G(100.0%) |
-------------------------------------------------------
| D       | 52min( 96%) | 12.0G( 63%) |  8.2G( 55.0%) |
-------------------------------------------------------
| E       | 64min(118%) |  5.3G( 28%) | 12.0G( 80.0%) |
-------------------------------------------------------
Clang binary:
-------------------------------------------------------
| Options | size        | link time   | used memory   |
-------------------------------------------------------
| A       | 1.50G(100%) |  9sec(100%) |  9307MB(100%) |
-------------------------------------------------------
| B       | 0.76G( 50%) | 68sec(755%) | 15055MB(161%) |
-------------------------------------------------------
| C       | 0.82G( 54%) |  8sec( 89%) |  8402MB( 90%) |
-------------------------------------------------------
| D       | 0.96G( 64%) |  6sec( 67%) |  4273MB( 46%) |
-------------------------------------------------------
| E       | 0.43G( 29%) | 77sec(855%) | 15000MB(161%) |
-------------------------------------------------------
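
For reference, the percentages in the link time column are ratios against configuration A. A small sketch of that arithmetic, using only the rounded seconds shown in the table (the percentages in the table were presumably computed from unrounded measurements, so the last digit can differ slightly):

```python
# Link times in seconds for the Clang binary, copied from the table above.
link_time_sec = {"A": 9, "B": 68, "C": 8, "D": 6, "E": 77}

baseline = link_time_sec["A"]
slowdown = {option: seconds / baseline for option, seconds in link_time_sec.items()}

for option, factor in slowdown.items():
    print(f"{option}: {factor:.2f}x of the baseline link time")
```

So --gc-debuginfo alone (B) costs roughly 7.5x the baseline link time, and adding zlib compression (E) roughly 8.5x.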
lldb loading time:
-----------------------------------------
| Options | time         | used memory  |
-----------------------------------------
| A       | 6.4sec(100%) | 1495MB(100%) |
-----------------------------------------
| B       | 4.0sec( 63%) |  826MB( 55%) |
-----------------------------------------
| C       | 3.7sec( 58%) |  877MB( 59%) |
-----------------------------------------
| D       | 4.3sec( 67%) | 1023MB( 69%) |
-----------------------------------------
| E       | 2.1sec( 33%) |  478MB( 32%) |
-----------------------------------------
I want to discuss the results and decide whether it is worth integrating D74169:
improvements:
1. Reduces the size of debug info(50%).
2. Resolves overlapping of address ranges(D59553).
3. Reduced size of debug info allows tools to work faster and to require less memory.
drawbacks and not implemented features:
1. Linking time is increased (755% of the baseline).
The --gc-debuginfo option is off by default, so it would affect only those who need it and explicitly specify it.
I think the current DWARFLinker code could be optimized more to improve performance results.
2. Support of type units.
That could be implemented further.
3. DWARF5.
Current DWARFEmitter/DWARFStreamer has an implementation for DWARF generation, which does not support
DWARF5(only debug_names table). At the same time, there already exists code in CodeGen/AsmPrinter/DwarfDebug.h,
which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer should be rewritten using
DwarfDebug/DwarfFile. Though I am not sure whether it would be easy to re-use DwarfDebug/DwarfFile.
It would probably be necessary to separate some intermediate level of DwarfDebug/DwarfFile.
4. split DWARF support.
This solution does not work with split DWARF currently. But it could be useful for the split dwarf in two ways:
a) The generation of skeleton file could be changed in such a way that address ranges pointing to garbage
collected code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping address
ranges(D59553).
b) The approach similar to dsymutil implementation could be used to generate monolithic debuginfo created
from .dwo files. That suggestion is from - https://reviews.llvm.org/D74169#1888386.
i.e., DWARFLinker could be taught to generate the same output as D74169 but for split DWARF as the source.
5. -fmodules-debuginfo
That problem was described in this review - https://reviews.llvm.org/D54747#1505462 . Currently, DWARFLinker/dsymutil has the same problem. It could be solved using the fact that DWARFLinker analyzes debuginfo. It could recognize debug info generated for the module and keep it(compile units containing debug info for modules do not have low_pc, high_pc).
6. -flto=thin
That problem was described in this review: https://reviews.llvm.org/D54747#1503720. It also exists in the current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could probably be fixed by avoiding generation of such incomplete declarations during ThinLTO or, alternatively, DWARFLinker could recognize such a situation and copy the missing type declaration.
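
To illustrate item 4(a) above, here is a toy sketch of the range rewriting in Python. It is not lld or DWARFLinker code: neutralize_dead_ranges and the address values are made up, and the sketch assumes that ranges of garbage-collected functions can be recognized because they do not fall inside any live output section (for example because their relocations were resolved to 0).

```python
def neutralize_dead_ranges(ranges, live_sections):
    """Replace every range outside all live sections with (0, 0)."""

    def is_live(low_pc, high_pc):
        return any(start <= low_pc and high_pc <= end for start, end in live_sections)

    return [(low, high) if is_live(low, high) else (0, 0) for low, high in ranges]


# Hypothetical addresses: two live functions and two garbage-collected ones.
live_sections = [(0x1000, 0x1040), (0x2000, 0x2080)]
ranges = [(0x1000, 0x1040), (0x0, 0x30), (0x2000, 0x2080), (0x0, 0x18)]

cleaned = neutralize_dead_ranges(ranges, live_sections)
for low, high in cleaned:
    print(f"low_pc={low:#x} high_pc={high:#x}")
```

In this toy input the two dead ranges both start at 0 and overlap each other; after the rewrite they are empty (lowpc=0, highpc=0), so a consumer no longer sees overlapping ranges.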
=======================================================================================
Debuginfo, Linker folks, what do you think about the current results and future directions?
It introduces quite a significant linking time increase (6x-8x), but it would affect only those who use that feature.
Thus users will be able to decide whether that linking time increase is acceptable or not.
Resolving all of points 1-6 is quite a significant amount of work, but as a result debug info would be more correct and compact.
Do you think that it would be good to integrate it and then start working on improvements?
Thank you, Alexey.