[PATCH] D117250: [lld-macho] Mention string literal deduplication as a difference from ld64

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 13 17:09:57 PST 2022


MaskRay added a comment.

Non-string constant deduplication isn't that useful.
Tail string merge has very little benefit.
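
For context, tail merging lets a string that is a suffix of another string share its storage, since both end at the same NUL terminator. A minimal sketch of the check, with `tailOffset` as a hypothetical helper made up for this example rather than any linker's API:

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: if `shorter` is a suffix of `longer`, returns the
// offset inside `longer` at which `shorter` starts, so that both strings
// can share the same bytes and the same trailing NUL.
std::optional<size_t> tailOffset(std::string_view longer,
                                 std::string_view shorter) {
  if (shorter.size() > longer.size())
    return std::nullopt;
  size_t off = longer.size() - shorter.size();
  if (longer.substr(off) != shorter)
    return std::nullopt; // not a suffix, so nothing to share
  return off;
}
```

With `foobar` already in the section, `bar` can point at offset 3 instead of being stored again, whereas `foo` cannot be merged this way.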

On the ELF side, I have some brief notes at https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster#shf_merge-duplicate-elimination covering SHF_MERGE duplicate elimination and compressed debug info.
ld.lld -O1 (default) performs SHF_MERGE|SHF_STRINGS deduplication.  It has a huge impact on the size of `.debug_str`.

  % ld.lld @response.txt -o clang.0 -O0
  % ld.lld @response.txt -o clang.1 -O1
  % stat -c %s clang.0 clang.1
  2126774248
  1389546048
  
  % ~/projects/bloaty/Release/bloaty clang.0 -- clang.1
      FILE SIZE        VM SIZE    
   --------------  -------------- 
    +286%  +661Mi  [ = ]       0    .debug_str
     +87% +41.3Mi   +87% +41.3Mi    .rodata
   +18e4%  +266Ki  [ = ]       0    .comment
    +0.0%      +8  [ = ]       0    .eh_frame
    -0.0%      -5  [ = ]       0    .debug_line
     +53%  +703Mi   +16% +41.3Mi    TOTAL
  
  % hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 /tmp/out/custom2/bin/ld.lld "{-O0,-O1}" @response.txt --threads=8 -o clang"
  Benchmark 1: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang
    Time (mean ± σ):      5.006 s ±  0.032 s    [User: 5.289 s, System: 3.048 s]
    Range (min … max):    4.958 s …  5.079 s    10 runs
   
  Benchmark 2: numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang
    Time (mean ± σ):      6.030 s ±  0.044 s    [User: 11.633 s, System: 2.822 s]
    Range (min … max):    5.936 s …  6.066 s    10 runs
   
  Summary
    'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O0 @response.txt --threads=8 -o clang' ran
      1.20 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom2/bin/ld.lld -O1 @response.txt --threads=8 -o clang'

`.debug_str` is about 3.86x as large (1 + 286% = 3.86) when deduplication is suppressed.
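
To illustrate why deduplication shrinks `.debug_str` so much, here is a minimal sketch of the idea, not ld.lld's actual implementation. `DedupStringTable`, `sizeWithoutDedup` and `sizeWithDedup` are hypothetical names made up for this example; a hash map hands out one output offset per distinct string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical table, not lld's class: each distinct string is stored once.
struct DedupStringTable {
  std::string data;                                // NUL-terminated strings
  std::unordered_map<std::string, size_t> offsets; // string -> offset in data

  // Returns the output offset of s, appending it only the first time.
  size_t add(std::string_view s) {
    assert(s.find('\0') == std::string_view::npos && "embedded NUL");
    auto [it, inserted] = offsets.try_emplace(std::string(s), data.size());
    if (inserted) {
      data.append(s);
      data.push_back('\0');
    }
    return it->second;
  }
};

// Section size when every input string is copied verbatim (like -O0).
size_t sizeWithoutDedup(const std::vector<std::string_view> &inputs) {
  size_t n = 0;
  for (std::string_view s : inputs)
    n += s.size() + 1; // +1 for the NUL terminator
  return n;
}

// Section size when duplicates are merged (like -O1).
size_t sizeWithDedup(const std::vector<std::string_view> &inputs) {
  DedupStringTable table;
  for (std::string_view s : inputs)
    table.add(s);
  return table.data.size();
}
```

For the inputs `int`, `char`, `int`, `int`, the verbatim copy takes 17 bytes while the merged table takes 9. Debug info repeats the same type and function names across many object files, which is why the real ratio is so large.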

Some users prefer size and others prefer speed.
If string deduplication is parallelized, the speed difference may be small.
(I have tried a poor man's concurrent hash map <https://gist.github.com/MaskRay/4f274c978df684c870aec0254f844487>, but did not find a noticeable improvement.)
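
One way to parallelize the hash-map phase is to shard the map so that threads only contend when they hit the same shard. The sketch below assumes that approach; it is not the code from the gist or from lld, and `ShardedStringSet` and `countUniqueParallel` are hypothetical names. It only counts distinct strings and leaves offset assignment out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical sharded set: one mutex and one hash set per shard.
class ShardedStringSet {
  static constexpr size_t kShards = 16;
  struct Shard {
    std::mutex mu;
    std::unordered_set<std::string> strings;
  };
  std::array<Shard, kShards> shards;

public:
  // Inserts s; the hash picks the shard, so equal strings share a shard.
  void insert(const std::string &s) {
    Shard &sh = shards[std::hash<std::string>{}(s) % kShards];
    std::lock_guard<std::mutex> lock(sh.mu);
    sh.strings.insert(s);
  }

  // Number of distinct strings. Call only after all writers have joined.
  size_t size() const {
    size_t n = 0;
    for (const Shard &sh : shards)
      n += sh.strings.size();
    return n;
  }
};

// Gives each thread a contiguous chunk of inputs and returns the number
// of distinct strings seen across all chunks.
size_t countUniqueParallel(const std::vector<std::string> &inputs,
                           unsigned numThreads) {
  assert(numThreads > 0);
  ShardedStringSet set;
  std::vector<std::thread> workers;
  size_t chunk = (inputs.size() + numThreads - 1) / numThreads;
  for (unsigned t = 0; t < numThreads; ++t) {
    size_t begin = std::min(inputs.size(), t * chunk);
    size_t end = std::min(inputs.size(), begin + chunk);
    workers.emplace_back([&set, &inputs, begin, end] {
      for (size_t i = begin; i < end; ++i)
        set.insert(inputs[i]);
    });
  }
  for (std::thread &w : workers)
    w.join();
  return set.size();
}
```

Every string is still hashed and copied once, so a scheme like this may only help when the serial insert loop, rather than memory traffic, is the bottleneck.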

---

Perhaps I can contribute to the parallel part of DeduplicatedCStringSection::finalizeContents? :)
That is, if I can get cbdr to work (https://reviews.llvm.org/D114735#3236110). Currently it seems to always print the help message:

  % cbdr -V  
  cbdr 0.2.3
  Tools for comparitive benchmarking
  
  USAGE:
      cbdr <SUBCOMMAND>
  
  FLAGS:
      -h, --help       Prints help information
      -V, --version    Prints version information
  
  SUBCOMMANDS:
      analyze    For each pair of benchmarks (x and y), shows, for each metric (x̄ and ȳ), the CI of (ȳ - x̄) / x̄
      help       Prints this message or the help of the given subcommand(s)
      plot       Takes CSV data on stdin and produces a vega-lite plot specification on stdout
      sample     Repeatedly runs benchmarks chosen at random and prints results as CSV


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117250/new/

https://reviews.llvm.org/D117250


