[PATCH] D102964: [lld-macho] Implement cstring deduplication
Nico Weber via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 3 12:00:38 PDT 2021
thakis added a comment.
A random review might not be the best place for this, but:
1. It's great we're looking at size of the output binary!
2. Maybe it makes sense to look for lower-hanging fruit before implementing somewhat expensive things?
Here's a bloaty (https://github.com/google/bloaty) diff between ld64-linked Chromium Framework (after `--`, that's where the old version goes) and lld-linked Chromium Framework:
% ~/src/bloaty/bloaty 'Chromium Framework' -- 'Chromium.app/Contents/Frameworks/Chromium Framework.framework/Versions/Current/Chromium Framework'
FILE SIZE VM SIZE
-------------- --------------
[NEW] +6.98Mi [NEW] +6.98Mi __DATA_CONST,__const
+36e2% +6.42Mi +36e2% +6.42Mi Rebase Info
+62% +3.19Mi +62% +3.19Mi __TEXT,__cstring
[NEW] +1.52Mi [NEW] +1.52Mi __TEXT,__literal16
+139% +1.11Mi +139% +1.11Mi Function Start Addresses
+1.5% +940Ki +1.5% +940Ki String Table
+38e2% +530Ki +38e2% +530Ki __TEXT,__eh_frame
+27% +124Ki +27% +124Ki __TEXT,__objc_methtype
+102% +118Ki +102% +118Ki __TEXT,__objc_methname
+1.1% +114Ki +1.1% +114Ki Symbol Table
[NEW] +88.0Ki [NEW] +88.0Ki __TEXT,__literal8
+263% +80.9Ki +263% +80.9Ki Binding Info
[NEW] +62.3Ki [NEW] +62.3Ki __TEXT,__literal4
[NEW] +32.4Ki [NEW] +32.4Ki __DATA_CONST,__cfstring
+136% +28.2Ki +136% +28.2Ki __DATA,__objc_selrefs
+0.0% +8.38Ki -0.0% -6.59Ki [31 Others]
[DEL] -26.8Ki [DEL] -26.8Ki __DATA,__cfstring
[DEL] -53.4Ki [DEL] -53.4Ki Table of Non-instructions
-30.7% -57.9Ki -30.7% -57.9Ki __DATA,__objc_const
-1.0% -83.3Ki -1.0% -83.3Ki __TEXT,__const
[DEL] -6.98Mi [DEL] -6.98Mi __DATA,__const
+6.0% +14.1Mi +5.9% +14.1Mi TOTAL
`__cstring` is indeed on the list, but there are other things before it. The _DATA_CONST looks like it's just in __DATA in ld64 and just moved around (see 2nd-to-last line), but our rebase info and LC_FUNCTION_STARTS sections are way larger and possibly easier to fix (LC_FUNCTION_STARTS is 2003880 vs 838208 -- that's 1.2 MB that are likely a cheap fix).
3% smaller is great, but 2% slower isn't exactly cheap. It's not super expensive either, but 10% here and 10% there and suddenly you take twice as long. Being much faster is one of the big selling points of lld so we should try hard not to regress on that. Several thoughts on that:
1. Maybe this should be opt in (only at -O2, and/or lto or what)? People who really want optimized binaries over link time probably do (thin) LTO.
2. If the main cost is the hash:
1. That should parallelize well
2. Is there some way we could compute the string hash at compile time and stash it somewhere? (Similar in idea to http://blog.llvm.org/2018/01/improving-link-time-on-windows-with.html)
(Some of this also applies to the ICF patch – looks like we're doing a more thorough job than ld64 with ICF but it's also more expensive.)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D102964/new/
https://reviews.llvm.org/D102964
More information about the llvm-commits
mailing list