[PATCH] D154604: [BOLT] Calculate output values using BOLTLinker
Job Noorman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 18 08:27:04 PDT 2023
jobnoorman updated this revision to Diff 541551.
jobnoorman edited the summary of this revision.
jobnoorman added a comment.
Rebase on D155604 <https://reviews.llvm.org/D155604>.
Use a map section of `<Input address, Output address>` pairs, as suggested by
@maksfb. This replaces `OffsetTranslationTable` and `InputOffsetToAddressMap`
but not `LocSyms`, since the latter is used to build the map section.
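To illustrate how such a map could be queried once it has been linked, here is a minimal sketch. It is not the code from this patch: `AddressPair`, `sortByInput`, and `lookupOutput` are hypothetical names, and it assumes the section has already been decoded into an in-memory vector of pairs.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the decoded map section: one
// <input address, output address> pair per recorded location.
using AddressPair = std::pair<uint64_t, uint64_t>;

// Sort the pairs by input address so that lookups can binary-search.
inline void sortByInput(std::vector<AddressPair> &Map) {
  std::sort(Map.begin(), Map.end());
}

// Return the output address recorded for exactly InputAddr, or
// std::nullopt if no pair was emitted for that address.
inline std::optional<uint64_t>
lookupOutput(const std::vector<AddressPair> &SortedMap, uint64_t InputAddr) {
  auto It = std::lower_bound(
      SortedMap.begin(), SortedMap.end(), InputAddr,
      [](const AddressPair &P, uint64_t Addr) { return P.first < Addr; });
  if (It == SortedMap.end() || It->first != InputAddr)
    return std::nullopt;
  return It->second;
}
```

A sorted vector is used here only because it keeps the sketch short; the actual patch may store and query the pairs differently.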
The only extra symbols that need to be added to the symbol table now are those
used for constant island labels. All others are kept temporary and are only used
for the relocations in the map section. Since section-relative references are
used there, these symbols do not end up in the symbol table.
Updating the line table offsets is reverted to using `MCAsmLayout`, as I don't
think linker relaxation can have an influence here. I didn't find an easy way to
do this via the linker without having to insert a large number of symbols.
Here are some performance results. I ran this on a clang binary following the
instructions
here <https://github.com/llvm/llvm-project/blob/main/bolt/docs/OptimizingClang.md>.
tl;dr:
- Without debug info: +0%
- `--update-debug-sections`: +1%
- `--update-debug-sections --enable-bat`: +2%
- Without optimizations: +4%
Note: `ARGS=-o /tmp/clang.bolt clang-17 -b clang.yaml -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -split-all-cold -dyno-stats -icf=1 -use-gnu-stack`
No debug info:
hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} $ARGS'
Benchmark 1: ./llvm-bolt.main $ARGS
Time (mean ± σ): 28.853 s ± 0.098 s [User: 63.374 s, System: 6.353 s]
Range (min … max): 28.681 s … 29.004 s 10 runs
Benchmark 2: ./llvm-bolt.iomap $ARGS
Time (mean ± σ): 28.746 s ± 0.092 s [User: 63.389 s, System: 6.289 s]
Range (min … max): 28.627 s … 28.942 s 10 runs
Summary
./llvm-bolt.iomap $ARGS ran
1.00 ± 0.00 times faster than ./llvm-bolt.main $ARGS
`--update-debug-sections`:
hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} $ARGS --update-debug-sections'
Benchmark 1: ./llvm-bolt.main $ARGS --update-debug-sections
Time (mean ± σ): 28.945 s ± 0.065 s [User: 63.544 s, System: 6.329 s]
Range (min … max): 28.867 s … 29.085 s 10 runs
Benchmark 2: ./llvm-bolt.iomap $ARGS --update-debug-sections
Time (mean ± σ): 29.334 s ± 0.137 s [User: 63.966 s, System: 6.258 s]
Range (min … max): 29.115 s … 29.501 s 10 runs
Summary
./llvm-bolt.main $ARGS --update-debug-sections ran
1.01 ± 0.01 times faster than ./llvm-bolt.iomap $ARGS --update-debug-sections
`--update-debug-sections --enable-bat`:
hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} $ARGS --update-debug-sections --enable-bat'
Benchmark 1: ./llvm-bolt.main $ARGS --update-debug-sections --enable-bat
Time (mean ± σ): 29.984 s ± 0.070 s [User: 64.657 s, System: 6.280 s]
Range (min … max): 29.885 s … 30.117 s 10 runs
Benchmark 2: ./llvm-bolt.iomap $ARGS --update-debug-sections --enable-bat
Time (mean ± σ): 30.520 s ± 0.079 s [User: 65.149 s, System: 6.404 s]
Range (min … max): 30.422 s … 30.675 s 10 runs
Summary
./llvm-bolt.main $ARGS --update-debug-sections --enable-bat ran
1.02 ± 0.00 times faster than ./llvm-bolt.iomap $ARGS --update-debug-sections --enable-bat
Without optimizations (I'm a bit confused about why both binaries are slower here than in the runs above):
hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat'
Benchmark 1: ./llvm-bolt.main -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat
Time (mean ± σ): 58.988 s ± 0.362 s [User: 93.203 s, System: 20.905 s]
Range (min … max): 58.657 s … 59.897 s 10 runs
Benchmark 2: ./llvm-bolt.iomap -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat
Time (mean ± σ): 61.540 s ± 0.331 s [User: 95.545 s, System: 21.182 s]
Range (min … max): 61.012 s … 62.160 s 10 runs
Summary
./llvm-bolt.main -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat ran
1.04 ± 0.01 times faster than ./llvm-bolt.iomap -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat
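The percentages in the tl;dr can be reproduced from the mean wall-clock times reported above. The helpers below are a small sketch for that arithmetic only; `slowdownPercent` and `roundedSlowdownPercent` are hypothetical names and are not part of BOLT or hyperfine.

```cpp
#include <cmath>

// Relative change of the candidate's mean time versus the baseline's
// mean time, in percent. Positive values mean the candidate is slower.
inline double slowdownPercent(double BaselineSec, double CandidateSec) {
  return (CandidateSec / BaselineSec - 1.0) * 100.0;
}

// The same value rounded to the nearest whole percent.
inline long roundedSlowdownPercent(double BaselineSec, double CandidateSec) {
  return std::lround(slowdownPercent(BaselineSec, CandidateSec));
}
```

For example, `roundedSlowdownPercent(58.988, 61.540)` takes the two means from the run without optimizations and gives the +4% figure.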
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154604/new/
https://reviews.llvm.org/D154604
Files:
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/Linker.h
bolt/include/bolt/Rewrite/JITLinkLinker.h
bolt/include/bolt/Rewrite/RewriteInstance.h
bolt/lib/Core/BinaryBasicBlock.cpp
bolt/lib/Core/BinaryFunction.cpp
bolt/lib/Rewrite/JITLinkLinker.cpp
bolt/lib/Rewrite/MachORewriteInstance.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D154604.541551.patch
Type: text/x-patch
Size: 12607 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230718/0bcf19a1/attachment.bin>