[PATCH] D154604: [BOLT] Calculate output values using BOLTLinker

Job Noorman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 18 08:27:04 PDT 2023


jobnoorman updated this revision to Diff 541551.
jobnoorman edited the summary of this revision.
jobnoorman added a comment.

Rebase on D155604 <https://reviews.llvm.org/D155604>.

Use a map section of `<Input address, Output address>` pairs, as suggested by
@maksfb. This replaces `OffsetTranslationTable` and `InputOffsetToAddressMap`,
but not `LocSyms`, as the latter is used to build the map section.
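
To illustrate the idea only, here is a minimal sketch of how a section of
`<Input address, Output address>` pairs could be decoded into a lookup table.
The class name `AddressMapSketch`, the host-endian 64-bit entry layout, and the
lookup interface are all assumptions made for this example; they are not taken
from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the map described above; not BOLT's actual class.
class AddressMapSketch {
  std::unordered_map<uint64_t, uint64_t> InputToOutput;

public:
  // Decode a section laid out as consecutive <input, output> pairs of
  // host-endian 64-bit values (this layout is an assumption of the sketch).
  static AddressMapSketch parse(const std::vector<uint8_t> &Section) {
    constexpr size_t EntrySize = 2 * sizeof(uint64_t);
    assert(Section.size() % EntrySize == 0 && "truncated map section");

    AddressMapSketch Map;
    for (size_t Off = 0; Off < Section.size(); Off += EntrySize) {
      uint64_t Input, Output;
      std::memcpy(&Input, Section.data() + Off, sizeof(Input));
      std::memcpy(&Output, Section.data() + Off + sizeof(Input),
                  sizeof(Output));
      // If an input address appears twice, the first entry wins.
      Map.InputToOutput.emplace(Input, Output);
    }
    return Map;
  }

  // Returns true and sets OutputAddress if InputAddress has an entry.
  bool lookup(uint64_t InputAddress, uint64_t &OutputAddress) const {
    auto It = InputToOutput.find(InputAddress);
    if (It == InputToOutput.end())
      return false;
    OutputAddress = It->second;
    return true;
  }
};
```

In this sketch, the linker would have resolved the relocations in the section
before `parse` is called, so each entry already holds final addresses.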

The only extra symbols that need to be added to the symbol table now are those
used for constant island labels. All others are kept temporary and are only used
for the relocations in the map section. Since section-relative references are
used there, these symbols do not end up in the symbol table.

Updating the line table offsets is reverted to using `MCAsmLayout`, as I don't
think linker relaxation can have an influence here. I didn't find an easy way to
do this via the linker without having to insert a large number of symbols.

Here are some performance results. I ran this on a clang binary following the
instructions
here <https://github.com/llvm/llvm-project/blob/main/bolt/docs/OptimizingClang.md>.

tl;dr (change in llvm-bolt wall-clock time with this patch, relative to main):

- Without debug info: +0%
- `--update-debug-sections`: +1%
- `--update-debug-sections --enable-bat`: +2%
- Without optimizations: +4%
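
As a sanity check, these percentages can be recomputed from the mean wall-clock
times in the hyperfine output further down in this message. A small sketch (the
function and struct names are made up for illustration; the timings are copied
from the results below):

```cpp
#include <cstdio>

// Relative change, in percent, of the patched run time against the baseline.
double overheadPercent(double BaselineSeconds, double PatchedSeconds) {
  return (PatchedSeconds - BaselineSeconds) / BaselineSeconds * 100.0;
}

// One benchmark configuration with its mean times (seconds) for both builds.
struct RunSketch {
  const char *Config;
  double MainSeconds;
  double IomapSeconds;
};

// Print the overhead of llvm-bolt.iomap over llvm-bolt.main per configuration.
void printOverheads() {
  const RunSketch Runs[] = {
      {"no debug info", 28.853, 28.746},
      {"--update-debug-sections", 28.945, 29.334},
      {"--update-debug-sections --enable-bat", 29.984, 30.520},
      {"without optimizations", 58.988, 61.540},
  };
  for (const RunSketch &Run : Runs)
    std::printf("%-40s %+.1f%%\n", Run.Config,
                overheadPercent(Run.MainSeconds, Run.IomapSeconds));
}
```

Rounded to whole percents, the four configurations come out at 0, 1, 2 and 4,
matching the list above.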

Note: `ARGS=-o /tmp/clang.bolt clang-17 -b clang.yaml -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -split-all-cold -dyno-stats -icf=1 -use-gnu-stack`

No debug info:

  hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} $ARGS'
  Benchmark 1: ./llvm-bolt.main $ARGS
    Time (mean ± σ):     28.853 s ±  0.098 s    [User: 63.374 s, System: 6.353 s]
    Range (min … max):   28.681 s … 29.004 s    10 runs
  
  Benchmark 2: ./llvm-bolt.iomap $ARGS
    Time (mean ± σ):     28.746 s ±  0.092 s    [User: 63.389 s, System: 6.289 s]
    Range (min … max):   28.627 s … 28.942 s    10 runs
  
  Summary
    ./llvm-bolt.iomap $ARGS ran
      1.00 ± 0.00 times faster than ./llvm-bolt.main $ARGS

`--update-debug-sections`:

  hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} $ARGS --update-debug-sections'
  Benchmark 1: ./llvm-bolt.main $ARGS --update-debug-sections
    Time (mean ± σ):     28.945 s ±  0.065 s    [User: 63.544 s, System: 6.329 s]
    Range (min … max):   28.867 s … 29.085 s    10 runs
  
  Benchmark 2: ./llvm-bolt.iomap $ARGS --update-debug-sections
    Time (mean ± σ):     29.334 s ±  0.137 s    [User: 63.966 s, System: 6.258 s]
    Range (min … max):   29.115 s … 29.501 s    10 runs
  
  Summary
    ./llvm-bolt.main $ARGS --update-debug-sections ran
      1.01 ± 0.01 times faster than ./llvm-bolt.iomap $ARGS --update-debug-sections

`--update-debug-sections --enable-bat`:

  hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} $ARGS --update-debug-sections --enable-bat'
  Benchmark 1: ./llvm-bolt.main $ARGS --update-debug-sections --enable-bat
    Time (mean ± σ):     29.984 s ±  0.070 s    [User: 64.657 s, System: 6.280 s]
    Range (min … max):   29.885 s … 30.117 s    10 runs
  
  Benchmark 2: ./llvm-bolt.iomap $ARGS --update-debug-sections --enable-bat
    Time (mean ± σ):     30.520 s ±  0.079 s    [User: 65.149 s, System: 6.404 s]
    Range (min … max):   30.422 s … 30.675 s    10 runs
  
  Summary
    ./llvm-bolt.main $ARGS --update-debug-sections --enable-bat ran
      1.02 ± 0.00 times faster than ./llvm-bolt.iomap $ARGS --update-debug-sections --enable-bat

Without optimizations (I'm a bit confused about why both builds are slower here than with optimizations enabled):

  hyperfine --parameter-list which main,iomap --runs 10 './llvm-bolt.{which} -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat'
  Benchmark 1: ./llvm-bolt.main -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat
    Time (mean ± σ):     58.988 s ±  0.362 s    [User: 93.203 s, System: 20.905 s]
    Range (min … max):   58.657 s … 59.897 s    10 runs
  
  Benchmark 2: ./llvm-bolt.iomap -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat
    Time (mean ± σ):     61.540 s ±  0.331 s    [User: 95.545 s, System: 21.182 s]
    Range (min … max):   61.012 s … 62.160 s    10 runs
  
  Summary
    ./llvm-bolt.main -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat ran
      1.04 ± 0.01 times faster than ./llvm-bolt.iomap -o /tmp/clang.bolt clang-17 --update-debug-sections --enable-bat


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154604/new/

https://reviews.llvm.org/D154604

Files:
  bolt/include/bolt/Core/BinaryFunction.h
  bolt/include/bolt/Core/Linker.h
  bolt/include/bolt/Rewrite/JITLinkLinker.h
  bolt/include/bolt/Rewrite/RewriteInstance.h
  bolt/lib/Core/BinaryBasicBlock.cpp
  bolt/lib/Core/BinaryFunction.cpp
  bolt/lib/Rewrite/JITLinkLinker.cpp
  bolt/lib/Rewrite/MachORewriteInstance.cpp
  bolt/lib/Rewrite/RewriteInstance.cpp


