[PATCH] D96035: [dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 22 13:28:16 PST 2021


avl added a comment.

@JDevlieghere @aprantl @dblaikie @clayborg @friss @echristo

Please consider this new version of the patch. It has several improvements and I think it is in a ready state now.

Changes from the previous version:

1. It deduplicates types defined inside subprograms. This change reduces the size of the resulting DWARF even further: the .debug_info section is now 67% smaller.

2. Several performance optimizations were implemented. In ODR mode, it is 2.1x faster than upstream dsymutil with 16 threads and 1.4x slower with a single thread. In non-ODR mode, it is 4.5x faster with 16 threads and has the same performance with a single thread. See comparison tables 1 and 2.

3. It is compatible with "-Xclang -gsimple-template-names=simple". The previous version recognized types by their names (relying on the fact that names contain template parameters). This version takes DW_TAG_template_type_parameter and DW_TAG_template_value_parameter into account. This makes it possible to benefit from "-Xclang -gsimple-template-names=simple": a smaller .debug_str section and better performance (due to shorter strings). Table 3 compares the output for debug info built with "-Xclang -debug-forward-template-params" and processed by the current upstream dsymutil against debug info built with "-Xclang -debug-forward-template-params -Xclang -gsimple-template-names=simple" and processed by the new version from this patch. The overall binary is 37% smaller.

4. Because of point 3, it is now a requirement to build debug info with "-Xclang -debug-forward-template-params". Without that option the output would still be correct, but it would be non-deterministic.

5. A debug types dump was implemented for llvm-dwarfdump (--odr-compat-dump), which allows comparing the results generated by the current upstream dsymutil and the new version from this patch. Dumps of debug info created by dsymutil with and without the "--use-dlnext" option can be compared and should match.
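
As a rough illustration of point 3, the sketch below shows why template parameter entries have to be part of a type's identity once names no longer carry the template arguments. It is not the code from this patch: the `Die` class, `type_key`, and `deduplicate` are hypothetical names over a heavily simplified model of DWARF entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for a DWARF debugging information entry.
@dataclass
class Die:
    tag: str
    name: str = ""
    type_name: str = ""          # referenced type, used by template parameters
    const_value: object = None   # used by DW_TAG_template_value_parameter
    children: List["Die"] = field(default_factory=list)

TEMPLATE_PARAM_TAGS = (
    "DW_TAG_template_type_parameter",
    "DW_TAG_template_value_parameter",
)

def type_key(die: Die) -> Tuple:
    """Identity of a type: tag, name, and its template parameters in order."""
    params = tuple(
        (child.tag, child.type_name, child.const_value)
        for child in die.children
        if child.tag in TEMPLATE_PARAM_TAGS
    )
    return (die.tag, die.name, params)

def deduplicate(types: List[Die]) -> List[Die]:
    """Keep the first entry seen for every distinct type key."""
    unique: Dict[Tuple, Die] = {}
    for die in types:
        unique.setdefault(type_key(die), die)
    return list(unique.values())

def make_vector(element: str) -> Die:
    # With simple template names the type is just "vector", not "vector<int>".
    param = Die("DW_TAG_template_type_parameter", name="T", type_name=element)
    return Die("DW_TAG_class_type", name="vector", children=[param])

types = [make_vector("int"), make_vector("int"), make_vector("float")]
print(len(deduplicate(types)))  # prints 2
```

Keying on the name alone would merge all three entries into one, which is why a name-based scheme only works when the names themselves contain the template arguments.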

Testing:

1. It passed llvm/test/tools/dsymutil lit tests.
2. It passed check-llvm, check-clang, check-lldb.
3. It passed llvm-dwarfdump --verify for all llvm and clang binaries.
4. It passed llvm-dwarfdump --odr-compat-dump for all llvm and clang binaries:

  dsymutil clang 
  llvm-dwarfdump --odr-compat-dump clang | md5
  27e2af4662bf44adeccf158072e6ebd7
  dsymutil --use-dlnext clang
  llvm-dwarfdump --odr-compat-dump clang | md5
  27e2af4662bf44adeccf158072e6ebd7

5. It produces deterministic output:

  dsymutil --use-dlnext --num-threads 1 bugpoint
  md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
  a1b113c6998a6ca23084e7a9bc929184
  dsymutil --use-dlnext --num-threads 2 bugpoint
  md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
  a1b113c6998a6ca23084e7a9bc929184
  dsymutil --use-dlnext --num-threads 3 bugpoint
  md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
  a1b113c6998a6ca23084e7a9bc929184
  dsymutil --use-dlnext --num-threads 7 bugpoint
  md5 bugpoint.dSYM/Contents/Resources/DWARF/bugpoint
  a1b113c6998a6ca23084e7a9bc929184
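
The same determinism check can be scripted. The helper below is a minimal sketch rather than part of the patch: the function names are hypothetical, and it assumes a dsymutil built with this patch (so that "--use-dlnext" exists) is on PATH.

```python
import hashlib
import subprocess
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def link_and_hash(binary, num_threads):
    """Run dsymutil with the given thread count and hash the linked DWARF file."""
    subprocess.run(
        ["dsymutil", "--use-dlnext", "--num-threads", str(num_threads), binary],
        check=True,
    )
    dwarf = Path(f"{binary}.dSYM") / "Contents" / "Resources" / "DWARF" / Path(binary).name
    return md5_of(dwarf)

def is_deterministic(digests):
    """True when every run produced the same digest."""
    return len(set(digests)) == 1

def check(binary, thread_counts=(1, 2, 3, 7)):
    """Repeat the link for each thread count and compare the results."""
    return is_deterministic([link_and_hash(binary, n) for n in thread_counts])
```

Calling `check("bugpoint")` repeats the four runs shown above and returns True only if all digests agree.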

There is still room for improvement: single-thread and multi-thread performance could be improved, and run-time memory requirements could be reduced. I think it would be better to make these additional improvements after the main part is integrated.

Performance results for this patch for the clang binary (Darwin, 24-core, 64G):

Table 1. clang binary, ODR deduplication is ON:

  |----------------------------------------------------------------------
  |       |           dsymutil           |     dsymutil --use-dlnext    |
  |-------|------------------------------|------------------------------|
  |       |exec time|  memory  | DWARF(*)|exec time|  memory  |  DWARF  |
  |       |   sec   |    GB    |    MB   |   sec   |    GB    |   MB    |
  |-------|------------------------------|------------------------------|
  |threads|         |          |         |         |          |         |
  |-------|------------------------------|------------------------------|
  |   1   |   159   |   16.5   |   485   |   220   |   12.6   |   157   |
  |-------|------------------------------|------------------------------|
  |   2   |    99   |   17.8   |   485   |   129   |   12.6   |   157   |
  |-------|------------------------------|------------------------------|
  |   4   |    99   |   17.8   |   485   |    83   |   12.6   |   157   |
  |-------|------------------------------|------------------------------|
  |   8   |    99   |   17.8   |   485   |    57   |   12.9   |   157   |
  |-------|------------------------------|------------------------------|
  |  16   |    99   |   17.8   |   485   |    47   |   13.1   |   157   |
  |---------------------------------------------------------------------|

Table 2. clang binary, ODR deduplication is OFF:

  |----------------------------------------------------------------------
  |       |           dsymutil           |     dsymutil --use-dlnext    |
  |-------|------------------------------|------------------------------|
  |       |exec time|  memory  |  DWARF  |exec time|  memory  |  DWARF  |
  |       |   sec   |    GB    |    MB   |   sec   |    GB    |   MB    |
  |-------|------------------------------|------------------------------|
  |threads|         |          |         |         |          |         |
  |-------|------------------------------|------------------------------|
  |   1   |   224   |   16.2   |   1450  |   224   |   15.0   |   1460  |
  |-------|------------------------------|------------------------------|
  |   2   |   218   |    18    |   1450  |   131   |   15.7   |   1460  |
  |-------|------------------------------|------------------------------|
  |   4   |   218   |    18    |   1450  |    83   |   15.9   |   1460  |
  |-------|------------------------------|------------------------------|
  |   8   |   218   |    18    |   1450  |    57   |   16.3   |   1460  |
  |-------|------------------------------|------------------------------|
  |  16   |   218   |    18    |   1450  |    47   |   16.6   |   1460  |
  |---------------------------------------------------------------------|
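
The ratios quoted in the list of changes can be rechecked from Tables 1 and 2. The short script below copies the execution times and the Table 1 DWARF sizes from the tables; the helper names are only for illustration.

```python
# Execution times in seconds from Tables 1 and 2, keyed by thread count.
ODR_UPSTREAM = {1: 159, 2: 99, 4: 99, 8: 99, 16: 99}
ODR_DLNEXT = {1: 220, 2: 129, 4: 83, 8: 57, 16: 47}
NON_ODR_UPSTREAM = {1: 224, 2: 218, 4: 218, 8: 218, 16: 218}
NON_ODR_DLNEXT = {1: 224, 2: 131, 4: 83, 8: 57, 16: 47}

# DWARF sizes in MB from Table 1 (ODR deduplication ON).
ODR_DWARF_UPSTREAM_MB = 485
ODR_DWARF_DLNEXT_MB = 157

def speedup(baseline_sec, new_sec):
    """How many times faster the new run is than the baseline."""
    return baseline_sec / new_sec

def size_reduction_percent(old_mb, new_mb):
    """Relative size reduction in percent."""
    return 100.0 * (old_mb - new_mb) / old_mb

# ODR mode, 16 threads: new version vs upstream.
print(f"{speedup(ODR_UPSTREAM[16], ODR_DLNEXT[16]):.2f}")          # prints 2.11
# ODR mode, 1 thread: how much slower the new version is.
print(f"{speedup(ODR_DLNEXT[1], ODR_UPSTREAM[1]):.2f}")            # prints 1.38
# Non-ODR mode, 16 threads: new version vs upstream.
print(f"{speedup(NON_ODR_UPSTREAM[16], NON_ODR_DLNEXT[16]):.2f}")  # prints 4.64
# DWARF size reduction in ODR mode.
print(f"{size_reduction_percent(ODR_DWARF_UPSTREAM_MB, ODR_DWARF_DLNEXT_MB):.1f}")  # prints 67.6
```

Upstream dsymutil times stop improving after two threads, while the new version keeps scaling up to 16 threads, which is where most of the multi-thread advantage comes from.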

Table 3. clang binary, ODR deduplication is ON
(dsymutil+"-Xclang -gsimple-template-names=simple"
 vs dsymutil+"-Xclang -debug-forward-template-params -Xclang -gsimple-template-names=simple"):

  |----------------------------------------------------------------------
  |       |           dsymutil           |     dsymutil --use-dlnext    |
  |-------|------------------------------|------------------------------|
  |       |exec time|  memory  |DWARF(**)|exec time|  memory  |  DWARF  |
  |       |   sec   |    GB    |    MB   |   sec   |    GB    |   MB    |
  |-------|------------------------------|------------------------------|
  |threads|         |          |         |         |          |         |
  |-------|------------------------------|------------------------------|
  |   1   |   159   |   16.5   |   485   |   216   |   12.0   |   157   |
  |       |         |          |   258   |         |          |   210   |
  |-------|------------------------------|------------------------------|
  |   2   |    99   |   17.8   |   485   |   126   |   12.1   |   157   |
  |       |         |          |   258   |         |          |   210   |
  |-------|------------------------------|------------------------------|
  |   4   |    99   |   17.8   |   485   |    80   |   12.1   |   157   |
  |       |         |          |   258   |         |          |   210   |
  |-------|------------------------------|------------------------------|
  |   8   |    99   |   17.8   |   485   |    55   |   12.3   |   157   |
  |       |         |          |   258   |         |          |   210   |
  |-------|------------------------------|------------------------------|
  |  16   |    99   |   17.8   |   485   |    44   |   12.4   |   157   |
  |       |         |          |   258   |         |          |   210   |
  |---------------------------------------------------------------------|

(**) DWARF is the size of the .debug_info (first) and .debug_str (second) sections.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035


