[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.
Alexey Lapshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 22 13:28:16 PST 2021
avl added a comment.
@JDevlieghere @aprantl @dblaikie @clayborg @friss @echristo
Please consider this new version of the patch. It has several improvements, and I think it is ready now.
Changes from the previous version:
1. It deduplicates types defined inside subprograms. This reduces the size of the resulting DWARF even further: the .debug_info section is now 67% smaller (see Table 1).
2. Several performance optimizations were implemented. In ODR mode, it is 2.1x faster than upstream dsymutil with 16 threads and 1.4x slower with a single thread. In non-ODR mode, it is 4.5x faster with 16 threads and has the same performance with a single thread. See Tables 1 and 2.
3. It is compatible with "-Xclang -gsimple-template-names=simple". The previous version recognized types by their names (relying on the fact that names contain template parameters). This version takes DW_TAG_template_type_parameter and DW_TAG_template_value_parameter into account, which makes it possible to benefit from "-Xclang -gsimple-template-names=simple": a smaller .debug_str section and better performance (due to shorter strings). Table 3 compares the output of the current upstream dsymutil for debug info built with "-Xclang -debug-forward-template-params" against the output of the new version from this patch for debug info built with "-Xclang -debug-forward-template-params -Xclang -gsimple-template-names=simple". The overall binary is 37% smaller.
4. Because of point 3, debug info must now be built with "-Xclang -debug-forward-template-params". Without this option, the output is still correct but non-deterministic.
5. A debug types dump was implemented in llvm-dwarfdump (--odr-compat-dump). It allows comparing the results generated by the current upstream dsymutil and by the new version from this patch: dumps of debug info produced by dsymutil with and without the "--use-dlnext" option can be compared and should match.
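To illustrate points 1 and 3, here is a minimal Python sketch of a template-aware type key used to deduplicate type DIEs. It is not the DWARFLinker implementation: `Die`, `type_key`, `dedup_types` and `vector_of` are hypothetical names over a simplified DIE model. With "-gsimple-template-names=simple" the DW_AT_name of `std::vector<int>` is just `vector`, so in this sketch the key is built from the DW_TAG_template_type_parameter and DW_TAG_template_value_parameter children, and the owning compile unit is chosen by lowest CU index rather than by arrival order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Die:
    """Hypothetical, simplified DIE; not the LLVM DWARFDie API."""
    tag: str
    name: str = ""
    type_ref: Optional["Die"] = None  # DW_AT_type of a template type parameter
    value: Optional[int] = None       # DW_AT_const_value of a template value parameter
    children: List["Die"] = field(default_factory=list)


def type_key(die: Die, scope: Tuple[str, ...] = ()) -> str:
    """Qualified name plus template arguments taken from child DIEs.

    With simple template names, DW_AT_name is just "vector", so the template
    parameter children are what distinguish vector<int> from vector<float>.
    """
    args = []
    for child in die.children:
        if child.tag == "DW_TAG_template_type_parameter" and child.type_ref is not None:
            args.append(type_key(child.type_ref))
        elif child.tag == "DW_TAG_template_value_parameter":
            args.append(str(child.value))
    qualified = "::".join(scope + (die.name,))
    return f"{qualified}<{','.join(args)}>" if args else qualified


def dedup_types(entries: Iterable[Tuple[int, Tuple[str, ...], Die]]) -> Dict[str, int]:
    """Map each type key to the lowest compile-unit index that defines it.

    Entries may arrive in any order (e.g. from worker threads); taking the
    minimum CU index makes the chosen owner independent of that order.
    """
    owner: Dict[str, int] = {}
    for cu_index, scope, die in entries:
        key = type_key(die, scope)
        owner[key] = min(owner.get(key, cu_index), cu_index)
    return owner


int_t = Die("DW_TAG_base_type", "int")
float_t = Die("DW_TAG_base_type", "float")


def vector_of(elem: Die) -> Die:
    return Die("DW_TAG_class_type", "vector",
               children=[Die("DW_TAG_template_type_parameter", "T", type_ref=elem)])


# Same simple name "vector" in three CUs, listed out of CU order.
entries = [(2, ("std",), vector_of(int_t)),
           (0, ("std",), vector_of(float_t)),
           (1, ("std",), vector_of(int_t))]
print(dedup_types(entries))  # {'std::vector<int>': 1, 'std::vector<float>': 0}
```

Keying on DW_AT_name alone would merge both instantiations into a single `std::vector` entry in this sketch.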
Testing:
1. It passed llvm/test/tools/dsymutil lit tests.
2. It passed check-llvm, check-clang, check-lldb.
3. It passed llvm-dwarfdump --verify for all LLVM and Clang binaries.
4. It passed the llvm-dwarfdump --odr-compat-dump comparison for all LLVM and Clang binaries:
dsymutil clang
llvm-dwarfdump --odr-compat-dump clang | md5
27e2af4662bf44adeccf158072e6ebd7
dsymutil --use-dlnext clang
llvm-dwarfdump --odr-compat-dump clang | md5
27e2af4662bf44adeccf158072e6ebd7
5. It produces deterministic output:
dsymutil --use-dlnext --num-threads 1 bugpoint
md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
a1b113c6998a6ca23084e7a9bc929184
dsymutil --use-dlnext --num-threads 2 bugpoint
md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
a1b113c6998a6ca23084e7a9bc929184
dsymutil --use-dlnext --num-threads 3 bugpoint
md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
a1b113c6998a6ca23084e7a9bc929184
dsymutil --use-dlnext --num-threads 7 bugpoint
md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
a1b113c6998a6ca23084e7a9bc929184
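The dump-compatibility and determinism checks above can be scripted. This is a minimal Python sketch, not part of the patch: it assumes a dsymutil and llvm-dwarfdump built with this patch are on PATH (`--use-dlnext` and `--odr-compat-dump` are the options this patch introduces), that dsymutil writes its default `<binary>.dSYM` bundle next to the input, and the helper names are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Dict, Iterable


def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file, like the `md5` command used above."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(digests: Dict[int, str]) -> bool:
    """True if every run produced the same digest."""
    return len(set(digests.values())) <= 1


def check_determinism(binary: str, thread_counts: Iterable[int] = (1, 2, 3, 7)) -> Dict[int, str]:
    """Link `binary` once per thread count and collect the DWARF file digests."""
    dwarf = Path(f"{binary}.dSYM/Contents/Resources/DWARF") / Path(binary).name
    digests = {}
    for n in thread_counts:
        subprocess.run(["dsymutil", "--use-dlnext", "--num-threads", str(n), binary],
                       check=True)
        digests[n] = md5_of(dwarf)
    return digests


def odr_dump_matches(binary: str) -> bool:
    """Link with upstream and with --use-dlnext; compare --odr-compat-dump digests."""
    dump_digests = []
    for extra in ([], ["--use-dlnext"]):
        subprocess.run(["dsymutil", *extra, binary], check=True)
        dump = subprocess.run(["llvm-dwarfdump", "--odr-compat-dump", binary],
                              check=True, capture_output=True).stdout
        dump_digests.append(hashlib.md5(dump).hexdigest())
    return dump_digests[0] == dump_digests[1]
```

For example, `all_identical(check_determinism("bugpoint"))` is expected to be true when the output is deterministic, and `odr_dump_matches("clang")` mirrors the manual comparison in item 4.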
There is still room for improvement: single-thread and multi-thread performance could be improved, and run-time memory requirements could be reduced. I think it would be better to make these additional improvements after the main part is integrated.
Performance results of this patch for the clang binary (Darwin, 24 cores, 64 GB):
Table 1. clang binary, ODR deduplication is ON:
|---------|--------------------------------|--------------------------------|
|         |            dsymutil            |     dsymutil --use-dlnext      |
|         |-----------|--------|-----------|-----------|--------|-----------|
| threads | exec time | memory | DWARF(*)  | exec time | memory |   DWARF   |
|         |    sec    |   GB   |    MB     |    sec    |   GB   |    MB     |
|---------|-----------|--------|-----------|-----------|--------|-----------|
|    1    |    159    |  16.5  |    485    |    220    |  12.6  |    157    |
|    2    |     99    |  17.8  |    485    |    129    |  12.6  |    157    |
|    4    |     99    |  17.8  |    485    |     83    |  12.6  |    157    |
|    8    |     99    |  17.8  |    485    |     57    |  12.9  |    157    |
|   16    |     99    |  17.8  |    485    |     47    |  13.1  |    157    |
|---------|-----------|--------|-----------|-----------|--------|-----------|
(*) DWARF is the size of the .debug_info section.
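The headline numbers in points 1 and 2 can be re-derived from Table 1. A small Python sketch of the arithmetic, with values copied from the table and the DWARF column taken as the .debug_info size; `speedup` and `size_reduction` are illustrative helper names.

```python
# Table 1 (clang, ODR on): threads -> (exec time in s, .debug_info in MB)
upstream = {1: (159, 485), 2: (99, 485), 4: (99, 485), 8: (99, 485), 16: (99, 485)}
dlnext = {1: (220, 157), 2: (129, 157), 4: (83, 157), 8: (57, 157), 16: (47, 157)}


def speedup(threads: int) -> float:
    """Upstream time divided by --use-dlnext time (> 1 means --use-dlnext is faster)."""
    return upstream[threads][0] / dlnext[threads][0]


def size_reduction() -> float:
    """Fractional .debug_info reduction (identical at every thread count)."""
    return 1 - dlnext[1][1] / upstream[1][1]


for n in sorted(upstream):
    print(f"{n:>2} threads: {speedup(n):.2f}x")
print(f".debug_info reduction: {size_reduction():.1%}")
```

A ratio below 1 at 1 and 2 threads means --use-dlnext is slower there; 1 / speedup(1) gives the 1.4x single-thread slowdown from point 2.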
Table 2. clang binary, ODR deduplication is OFF:
|---------|--------------------------------|--------------------------------|
|         |            dsymutil            |     dsymutil --use-dlnext      |
|         |-----------|--------|-----------|-----------|--------|-----------|
| threads | exec time | memory |   DWARF   | exec time | memory |   DWARF   |
|         |    sec    |   GB   |    MB     |    sec    |   GB   |    MB     |
|---------|-----------|--------|-----------|-----------|--------|-----------|
|    1    |    224    |  16.2  |   1450    |    224    |  15.0  |   1460    |
|    2    |    218    |   18   |   1450    |    131    |  15.7  |   1460    |
|    4    |    218    |   18   |   1450    |     83    |  15.9  |   1460    |
|    8    |    218    |   18   |   1450    |     57    |  16.3  |   1460    |
|   16    |    218    |   18   |   1450    |     47    |  16.6  |   1460    |
|---------|-----------|--------|-----------|-----------|--------|-----------|
Table 3. clang binary, ODR deduplication is ON
(dsymutil + "-Xclang -gsimple-template-names=simple"
vs dsymutil + "-Xclang -debug-forward-template-params -Xclang -gsimple-template-names=simple"):
|---------|--------------------------------|--------------------------------|
|         |            dsymutil            |     dsymutil --use-dlnext      |
|         |-----------|--------|-----------|-----------|--------|-----------|
| threads | exec time | memory | DWARF(**) | exec time | memory |   DWARF   |
|         |    sec    |   GB   |    MB     |    sec    |   GB   |    MB     |
|---------|-----------|--------|-----------|-----------|--------|-----------|
|    1    |    159    |  16.5  | 485 / 258 |    216    |  12.0  | 157 / 210 |
|    2    |     99    |  17.8  | 485 / 258 |    126    |  12.1  | 157 / 210 |
|    4    |     99    |  17.8  | 485 / 258 |     80    |  12.1  | 157 / 210 |
|    8    |     99    |  17.8  | 485 / 258 |     55    |  12.3  | 157 / 210 |
|   16    |     99    |  17.8  | 485 / 258 |     44    |  12.4  | 157 / 210 |
|---------|-----------|--------|-----------|-----------|--------|-----------|
(**) DWARF is the size of the .debug_info (first value) and .debug_str (second value) sections.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D96035