[all-commits] [llvm/llvm-project] e0581c: ELF: Add branch-to-branch optimization.

Thu May 22 21:36:42 PDT 2025

  Branch: refs/heads/users/pcc/spr/elf-add-branch-to-branch-optimization
  Home:   https://github.com/llvm/llvm-project
  Commit: e0581c892d07d8bb5518fa412b75b8830f5fb14a
      https://github.com/llvm/llvm-project/commit/e0581c892d07d8bb5518fa412b75b8830f5fb14a
  Author: Peter Collingbourne <peter at pcc.me.uk>
  Date:   2025-05-22 (Thu, 22 May 2025)

  Changed paths:
    M lld/ELF/Arch/AArch64.cpp
    A lld/ELF/Arch/TargetImpl.h
    M lld/ELF/Arch/X86_64.cpp
    M lld/ELF/Config.h
    M lld/ELF/Driver.cpp
    M lld/ELF/Options.td
    M lld/ELF/Relocations.cpp
    M lld/ELF/Target.h
    M lld/docs/ld.lld.1
    A lld/test/ELF/aarch64-branch-to-branch.s
    A lld/test/ELF/x86-64-branch-to-branch.s

  Log Message:
  -----------
  ELF: Add branch-to-branch optimization.

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
	0.0243538 +/- 0.00233202
	1.87831% +/- 0.179859%
	(Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications