[all-commits] [llvm/llvm-project] d67e15: ELF: Add branch-to-branch optimization.

Fri May 23 16:58:05 PDT 2025

  Branch: refs/heads/users/pcc/spr/elf-add-branch-to-branch-optimization
  Home:   https://github.com/llvm/llvm-project
  Commit: d67e152baaf8487e5cb049166ce61e905011171e
      https://github.com/llvm/llvm-project/commit/d67e152baaf8487e5cb049166ce61e905011171e
  Author: Peter Collingbourne <peter at pcc.me.uk>
  Date:   2025-05-23 (Fri, 23 May 2025)

  Changed paths:
    M lld/ELF/Arch/AArch64.cpp
    A lld/ELF/Arch/TargetImpl.h
    M lld/ELF/Arch/X86_64.cpp
    M lld/ELF/Config.h
    M lld/ELF/Driver.cpp
    M lld/ELF/Options.td
    M lld/ELF/Relocations.cpp
    M lld/ELF/Target.h
    M lld/docs/ld.lld.1
    A lld/test/ELF/aarch64-branch-to-branch.s
    A lld/test/ELF/x86-64-branch-to-branch.s

  Log Message:
  -----------
  ELF: Add branch-to-branch optimization.

When code calls a function that immediately tail calls another function,
there is no need to go through the intermediate function. Branching
directly to the target function reduces the program's working set, which
gives a slight increase in runtime performance.

Functions that just tail call another function are normally uncommon,
but with LLVM control flow integrity (CFI), jump tables replace the
function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
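
As an illustration only, the redirection described above can be sketched
as a lookup over a toy symbol table. This is not the code in this commit:
the names Function, SymbolTable and resolveBranch are hypothetical, and the
real linker works on relocations and instruction bytes rather than on
symbol names. The maxHops limit is an assumption of the sketch, added so
that a cycle of jumps cannot loop forever.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model: a function is either real code or a thunk whose only
// instruction is an unconditional jump (tail call) to another symbol.
struct Function {
  bool isThunk;           // true if the body is a single jump
  std::string jumpTarget; // meaningful only when isThunk is true
};

using SymbolTable = std::unordered_map<std::string, Function>;

// Return the symbol that a branch to `target` can be redirected to.
// Follows at most `maxHops` thunks; unknown symbols and real code are
// returned unchanged.
std::string resolveBranch(const SymbolTable &syms, std::string target,
                          int maxHops = 1) {
  assert(maxHops >= 0 && "hop limit must be non-negative");
  for (int hop = 0; hop < maxHops; ++hop) {
    auto it = syms.find(target);
    if (it == syms.end() || !it->second.isThunk)
      break; // nothing to skip over
    target = it->second.jumpTarget;
  }
  return target;
}
```

In this model, a call to a jump table entry that only jumps to f resolves
to f, while a call that already targets f is left alone.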

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

The effect of this optimization on lld's own runtime (real execution
time) was measured using firefox-x64 from lld-speed-test [1] with ldflags
"-O2 -S" on an Apple M2 Ultra. The results were as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
	0.0243538 +/- 0.00233202
	1.87831% +/- 0.179859%
	(Student's t, pooled s = 0.0190369)
```
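
The summary lines in the table can be reproduced from the N, Avg and Stddev
columns with a pooled two-sample calculation. The sketch below is only a
cross-check, not part of the commit. The names Sample, Difference and
compare are hypothetical, and the critical value is passed in by the
caller; 1.96 (the large-sample 95% value) is an assumption that happens to
match the printed interval. The message does not say which row is the
baseline, so the rows are named after their markers, and the difference is
taken as the "+" row minus the "x" row.

```cpp
#include <cassert>
#include <cmath>

// One row of the table: sample count, average and standard deviation.
struct Sample {
  double n, mean, stddev;
};

// Difference of means, half-width of its confidence interval, and the
// pooled standard deviation used to compute that half-width.
struct Difference {
  double delta, halfWidth, pooledStddev;
};

Difference compare(const Sample &x, const Sample &plus, double tCritical) {
  assert(x.n > 1 && plus.n > 1 && "need at least two points per sample");
  // Pooled variance weights each sample variance by its degrees of freedom.
  double pooledVar = ((x.n - 1) * x.stddev * x.stddev +
                      (plus.n - 1) * plus.stddev * plus.stddev) /
                     (x.n + plus.n - 2);
  double pooled = std::sqrt(pooledVar);
  double halfWidth =
      tCritical * pooled * std::sqrt(1.0 / x.n + 1.0 / plus.n);
  return {plus.mean - x.mean, halfWidth, pooled};
}
```

Calling compare with the two rows above and a critical value of 1.96 gives
a difference of about 0.02435 +/- 0.00233 with a pooled s of about 0.01904,
in line with the reported figures. Dividing the difference by the "x"
average gives the relative change of roughly 1.88%.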

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366
