[all-commits] [llvm/llvm-project] 494a74: Reapply "ELF: Add branch-to-branch optimization."

Tue Jun 24 22:16:39 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 494a74882b2664c99dda3c4c456a33ab2cc4c376
      https://github.com/llvm/llvm-project/commit/494a74882b2664c99dda3c4c456a33ab2cc4c376
  Author: Peter Collingbourne <peter at pcc.me.uk>
  Date:   2025-06-24 (Tue, 24 Jun 2025)

  Changed paths:
    M lld/ELF/Arch/AArch64.cpp
    A lld/ELF/Arch/TargetImpl.h
    M lld/ELF/Arch/X86_64.cpp
    M lld/ELF/Config.h
    M lld/ELF/Driver.cpp
    M lld/ELF/InputSection.cpp
    M lld/ELF/Options.td
    M lld/ELF/Relocations.cpp
    M lld/ELF/Target.h
    M lld/docs/ReleaseNotes.rst
    M lld/docs/ld.lld.1
    A lld/test/ELF/aarch64-branch-to-branch.s
    A lld/test/ELF/x86-64-branch-to-branch.s

  Log Message:
  -----------
  Reapply "ELF: Add branch-to-branch optimization."

Fixed assertion failure when reading .eh_frame sections, and added
.eh_frame sections to tests.

This reverts commit 1e95349dbe329938d2962a78baa0ec421e9cd7d1.

Original commit message follows:

When code calls a function that immediately tail calls another function,
there is no need to go via the intermediate function. By branching
directly to the target function we reduce the program's working set for
a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
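
As a rough illustration of the idea (a toy model, not lld's
implementation), the sketch below retargets each call whose callee does
nothing but tail call another function. The symbol names, including the
jump table entry `f.cfi_jt`, and the `tail_call_of` map are hypothetical
and exist only for this example.

```python
def redirect_calls(call_sites, tail_call_of):
    """Retarget calls whose callee immediately tail calls another function.

    call_sites:   list of (caller, callee) pairs (hypothetical model).
    tail_call_of: maps a function that only tail calls another function
                  to that function's name (hypothetical model).
    """
    redirected = []
    for caller, callee in call_sites:
        # If the callee is just a branch to another function, call that
        # function directly; otherwise leave the call unchanged.
        final_target = tail_call_of.get(callee, callee)
        redirected.append((caller, final_target))
    return redirected


# A CFI-style jump table entry that only branches to the real function.
tail_call_of = {"f.cfi_jt": "f"}
calls = [("main", "f.cfi_jt"), ("main", "g")]
print(redirect_calls(calls, tail_call_of))  # [('main', 'f'), ('main', 'g')]
```

This model follows a single hop only, matching the case described above.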

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.
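
The interaction between the optimization level and the two flags can be
sketched as follows. This is a simplified model rather than lld's option
parsing: it assumes a default level of -O1, that the optimization is on
for any level of 2 or higher, and that the last explicit flag on the
command line wins. The function name is hypothetical.

```python
def branch_to_branch_enabled(argv):
    """Decide whether the optimization is on for a given command line."""
    opt_level = 1    # assumption: default optimization level is -O1
    explicit = None  # None means neither flag was given
    for arg in argv:
        if arg.startswith("-O") and arg[2:].isdigit():
            opt_level = int(arg[2:])
        elif arg == "--branch-to-branch":
            explicit = True   # assumption: last flag wins
        elif arg == "--no-branch-to-branch":
            explicit = False
    if explicit is not None:
        return explicit
    return opt_level >= 2     # assumption: on at -O2 and above


print(branch_to_branch_enabled(["-O2"]))                          # True
print(branch_to_branch_enabled(["-O2", "--no-branch-to-branch"])) # False
print(branch_to_branch_enabled(["-O1", "--branch-to-branch"]))    # True
```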

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```
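
The summary lines in the table can be reproduced from the per-row
averages and standard deviations. The sketch below assumes equal sample
sizes, a pooled standard deviation, and a critical t value of 1.960 for
the 95% interval (an assumption that happens to match the reported
half-width); the function and variable names are made up for this
example. The difference is taken as the `+` row relative to the `x` row.

```python
import math


def pooled_summary(n, avg_a, sd_a, avg_b, sd_b, t_crit=1.960):
    """Return (difference, CI half-width, percent difference, pooled s)."""
    diff = avg_b - avg_a
    # Pooled standard deviation for two samples of equal size n.
    pooled = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    # Half-width of the confidence interval on the difference of means.
    half_width = t_crit * pooled * math.sqrt(2 / n)
    percent = 100 * diff / avg_a
    return diff, half_width, percent, pooled


diff, half_width, percent, pooled = pooled_summary(
    512, 1.2965788, 0.018620888, 1.3209327, 0.019443971
)
print(f"{diff:.7f} +/- {half_width:.8f}")
print(f"{percent:.4f}%  (pooled s = {pooled:.7f})")
```

Because the inputs are the rounded values from the table, the last digit
of each result may differ slightly from the reported figures.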

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Reviewers: zmodem, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/145579
