[PATCH] D117250: [lld-macho] Mention string literal deduplication as a difference from ld64
Fangrui Song via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 13 17:09:57 PST 2022
MaskRay added a comment.
Non-string constant deduplication isn't that useful.
Tail string merge has very little benefit.
In ELF land, I have some brief notes on SHF_MERGE duplicate elimination (https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster#shf_merge-duplicate-elimination) and on compressed debug info.
ld.lld -O1 (default) performs SHF_MERGE|SHF_STRINGS deduplication. It has a huge impact on the size of `.debug_str`.
% ld.lld @response.txt -o clang.0 -O0
% ld.lld @response.txt -o clang.1 -O1
% stat -c %s clang.0 clang.1
2126774248
1389546048
% ~/projects/bloaty/Release/bloaty clang.0 -- clang.1
    FILE SIZE        VM SIZE
 --------------  --------------
  +286%  +661Mi  [ = ]       0    .debug_str
   +87% +41.3Mi   +87% +41.3Mi    .rodata
 +18e4%  +266Ki  [ = ]       0    .comment
  +0.0%      +8  [ = ]       0    .eh_frame
  -0.0%      -5  [ = ]       0    .debug_line
   +53%  +703Mi   +16% +41.3Mi    TOTAL
% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 /tmp/out/custom2/bin/ld.lld "{-O0,-O1}" @response.txt --threads=8 -o clang"
Benchmark 1: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang
Time (mean ± σ): 5.006 s ± 0.032 s [User: 5.289 s, System: 3.048 s]
Range (min … max): 4.958 s … 5.079 s 10 runs
Benchmark 2: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang
Time (mean ± σ): 6.030 s ± 0.044 s [User: 11.633 s, System: 2.822 s]
Range (min … max): 5.936 s … 6.066 s 10 runs
Summary
'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang' ran
1.20 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang'
.debug_str is ~3.86x (1+286%=3.86) as large if you suppress deduplication.
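As a toy illustration of where such a ratio comes from, here is a minimal Python sketch of string merging: every distinct NUL-terminated string is kept once and later occurrences reuse its offset. The helper names and the three input byte strings are made up for the example, and this is not a description of ld.lld's actual implementation.

```python
def split_cstrings(section):
    """Split a section into NUL-terminated strings (terminator included).

    Assumes every string in the section ends with a NUL byte.
    """
    pieces = []
    start = 0
    while start < len(section):
        end = section.index(b"\0", start) + 1
        pieces.append(section[start:end])
        start = end
    return pieces


def merge_strings(sections):
    """Keep one copy of each distinct string.

    Returns the merged bytes and a map from string to its output offset.
    """
    offsets = {}
    out = bytearray()
    for section in sections:
        for piece in split_cstrings(section):
            if piece not in offsets:
                offsets[piece] = len(out)
                out += piece
    return bytes(out), offsets


# Hypothetical inputs: three "object files" repeating the same names.
inputs = [b"int\0char\0", b"int\0long\0", b"char\0int\0"]
merged, offsets = merge_strings(inputs)

unmerged_size = sum(len(s) for s in inputs)
# Same convention as the bloaty diff: growth relative to the merged size.
growth_pct = 100 * (unmerged_size - len(merged)) / len(merged)
ratio = 1 + growth_pct / 100
print(unmerged_size, len(merged), ratio)
```

With these inputs the unmerged size is 27 bytes and the merged size is 14 bytes, so the "+N%" figure and the ratio are related exactly as in 1+286%=3.86.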
Some users prefer size and others prefer speed.
If you parallelize string deduplication, the speed may not differ much.
(I have tried a poor man's concurrent hash map <https://gist.github.com/MaskRay/4f274c978df684c870aec0254f844487>, but didn't find a noticeable improvement.)
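For reference, one common way to parallelize deduplication without a concurrent hash map is to shard strings by hash, deduplicate each shard independently, and then turn per-shard offsets into final offsets with a prefix sum over shard sizes. The Python sketch below only shows that structure; the shard count, function names, and inputs are assumptions for the example, it is not the code in the gist above, and Python threads will not show a real speedup here.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # arbitrary, chosen only for illustration


def shard_of(s):
    # Equal strings hash equally, so duplicates always land in the same shard.
    return zlib.crc32(s) % NUM_SHARDS


def dedup_shard(strings):
    """Deduplicate one shard; return (string -> offset within shard, shard size)."""
    offsets = {}
    size = 0
    for s in strings:
        if s not in offsets:
            offsets[s] = size
            size += len(s)
    return offsets, size


def parallel_dedup(strings):
    """Return (string -> final offset, total size) using per-shard tables."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for s in strings:
        shards[shard_of(s)].append(s)

    # Each shard has its own table, so workers never share mutable state.
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        results = list(pool.map(dedup_shard, shards))

    # Prefix sum over shard sizes gives each shard its base offset.
    final = {}
    base = 0
    for offsets, size in results:
        for s, off in offsets.items():
            final[s] = base + off
        base += size
    return final, base


strings = [b"int\0", b"char\0", b"int\0", b"long\0", b"char\0"]
final_offsets, total_size = parallel_dedup(strings)
print(total_size)
```

The output layout depends on the hash function and shard count, but every distinct string gets exactly one non-overlapping slot.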
---
Perhaps I can contribute to the parallel part of DeduplicatedCStringSection::finalizeContents? :)
That is, if I can get cbdr to work (https://reviews.llvm.org/D114735#3236110). Currently it seems to always print the help message:
% cbdr -V
cbdr 0.2.3
Tools for comparitive benchmarking
USAGE:
cbdr <SUBCOMMAND>
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
SUBCOMMANDS:
analyze For each pair of benchmarks (x and y), shows, for each metric (x̄ and ȳ), the CI of (ȳ - x̄) / x̄
help Prints this message or the help of the given subcommand(s)
plot Takes CSV data on stdin and produces a vega-lite plot specification on stdout
sample Repeatedly runs benchmarks chosen at random and prints results as CSV
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D117250/new/
https://reviews.llvm.org/D117250