[lld] [lld][ELF] Add range extension thunks for x86-64 (PR #180266)

Mon Feb 9 10:46:19 PST 2026

================
@@ -1396,6 +1406,50 @@ void RetpolineZNow::writePlt(uint8_t *buf, const Symbol &sym,
   write32le(buf + 8, ctx.in.plt->getVA() - pltEntryAddr - 12);
 }
 
+// For x86-64, thunks are needed when the displacement between the branch
+// instruction and its target exceeds the 32-bit signed range (2GiB).
+// This can happen in very large binaries where .text exceeds 2GiB.
+bool X86_64::needsThunk(RelExpr expr, RelType type, const InputFile *file,
+                        uint64_t branchAddr, const Symbol &s,
+                        int64_t a) const {
+  // Only branch relocations need thunks.
+  // R_X86_64_PLT32 is used for call/jmp instructions and always needs thunks.
+  // R_X86_64_PC32 is more general and can be used for both branches and data
+  // accesses (lea, mov). We only create thunks for function symbols.
+  if (type != R_X86_64_PLT32 && (type != R_X86_64_PC32 || !s.isFunc()))
+    return false;
+
+  // If the target requires a PLT entry, check if we can reach the PLT
+  if (s.isInPlt(ctx)) {
+    uint64_t dst = s.getPltVA(ctx) + a;
+    return !inBranchRange(type, branchAddr, dst);
+  }
+
+  // For direct calls/jumps, check if we can reach the destination
+  uint64_t dst = s.getVA(ctx, a);
+  return !inBranchRange(type, branchAddr, dst);
+}
+
+// Check if a branch from src to dst is within the 32-bit signed range.
+bool X86_64::inBranchRange(RelType type, uint64_t src, uint64_t dst) const {
+  // x86-64 RIP-relative branches use a 32-bit signed displacement.
+  // The displacement is relative to the address after the instruction,
+  // which is typically 4-5 bytes after the relocation location.
+  // We use a conservative range check here.
+  int64_t offset = dst - src;
+  return llvm::isInt<32>(offset);
+}
+
+// Return the spacing for thunk sections. We want thunks to be placed
+// at intervals such that all branches can reach either the target or
+// a thunk. With a 2GiB range, we place thunks every ~1GiB to allow
+// branches to reach in either direction.
+uint32_t X86_64::getThunkSectionSpacing() const {
----------------
smithp35 wrote:

> I have some notes at https://maskray.me/blog/2026-01-25-long-branches-in-compilers-assemblers-and-linkers " thunk creation algorithm"

Thanks for writing that up.

Arm's proprietary linker started with an algorithm like lld/MachO with a contingency/slop for added thunks. It worked 99% of the time, but every year there was an edge case from some important customer who couldn't share an example. I expect that for MachO most of the following wouldn't apply:
* Large sections (relative to the 4 MiB Thumb branch range of early Arm CPUs) containing binary blobs from .incbin.
* Complicated linker scripts with disjoint memory regions.
* Increasing the slop size didn't always work: more slop meant more thunks, which in turn required more slop, and so on. This is more of a risk with a short branch range.

The mold approach is interesting. One potential edge case is that, with disjoint memory regions, adding thunks can reduce the distance between some sections. This can lead to odd situations where removing thunks forces branches out of range. Again, I think this is unlikely for a SysV-like program with one large .text section. It is more common with complex linker scripts.

A description of armlink's algorithm:

Step 1

Ordering sections to minimise thunks. This is separate from thunk assignment and optional. The basic model is that programs follow a tree-like structure from the entry point, but with lots of calls to utility functions like memcpy.
* Extract the 10% of sections with the largest number of incoming branch relocations. Move these to the centre of the address range.
* Lay out the rest of the sections in breadth-first order of branches from the entry point.
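
A rough C++ sketch of that ordering step, to make the description concrete. This is not armlink's code: `orderSections`, the `callees` adjacency list, rounding the 10% up, splitting the remaining sections evenly around the hot block, and appending sections unreachable from the entry point are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <queue>
#include <vector>

// Hypothetical input: callees[i] lists the sections that section i branches to.
// Returns section indices in layout order.
std::vector<int> orderSections(const std::vector<std::vector<int>> &callees,
                               int entry) {
  const std::size_t n = callees.size();

  // Count incoming branch relocations per section.
  std::vector<int> incoming(n, 0);
  for (const auto &cs : callees)
    for (int c : cs)
      ++incoming[c];

  // Pick the 10% (rounded up, an assumption) with the most incoming branches.
  std::vector<int> byHeat(n);
  std::iota(byHeat.begin(), byHeat.end(), 0);
  std::stable_sort(byHeat.begin(), byHeat.end(),
                   [&](int a, int b) { return incoming[a] > incoming[b]; });
  const std::size_t hotCount = (n + 9) / 10;
  std::vector<int> hot(byHeat.begin(), byHeat.begin() + hotCount);
  std::vector<bool> isHot(n, false);
  for (int s : hot)
    isHot[s] = true;

  // Breadth-first walk from the entry point; hot sections are traversed
  // but not emitted here.
  std::vector<bool> seen(n, false);
  std::vector<int> cold;
  std::queue<int> work;
  if (n > 0) {
    work.push(entry);
    seen[entry] = true;
  }
  while (!work.empty()) {
    int s = work.front();
    work.pop();
    if (!isHot[s])
      cold.push_back(s);
    for (int c : callees[s])
      if (!seen[c]) {
        seen[c] = true;
        work.push(c);
      }
  }
  // Sections not reachable from the entry go last (an assumption).
  for (std::size_t s = 0; s < n; ++s)
    if (!seen[s] && !isHot[s])
      cold.push_back(static_cast<int>(s));

  // Put the hot block in the centre of the layout.
  const std::size_t half = cold.size() / 2;
  std::vector<int> order(cold.begin(), cold.begin() + half);
  order.insert(order.end(), hot.begin(), hot.end());
  order.insert(order.end(), cold.begin() + half, cold.end());
  return order;
}
```

With five sections where section 3 is the common utility target, `orderSections({{1, 2, 3}, {3}, {3}, {}, {3}}, 0)` places section 3 in the middle of the layout. A real implementation would balance by section size rather than by section count.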

Step 2

* For each branch needing a thunk, calculate the range of addresses a thunk can be inserted in [start, end).
* When the address ranges intersect, a thunk can be reused, and the bounds [start, end) can be updated.
* Insert thunks in the middle of the range [start, end).
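
A minimal C++ sketch of the range-intersection idea, under simplifying assumptions: a flat address space with no section boundaries, a symmetric branch range, and thunks that can always reach their target. `Branch`, `Thunk` and `assignThunks` are hypothetical names, not armlink or lld types.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration only.
struct Branch {
  uint64_t src; // address of the branch instruction
  int target;   // id of the out-of-range destination
};

struct Thunk {
  int target;
  uint64_t start; // placement window [start, end)
  uint64_t end;
  // Place the thunk in the middle of its window.
  uint64_t addr() const { return start + (end - start) / 2; }
};

std::vector<Thunk> assignThunks(std::vector<Branch> branches, uint64_t range) {
  // Visit branches in address order so placement windows arrive sorted.
  std::sort(branches.begin(), branches.end(),
            [](const Branch &a, const Branch &b) { return a.src < b.src; });

  std::map<int, std::size_t> latest; // target -> index of its newest thunk
  std::vector<Thunk> thunks;
  for (const Branch &b : branches) {
    // Window of addresses this branch can reach, clamped at address 0.
    uint64_t start = b.src > range ? b.src - range : 0;
    uint64_t end = b.src + range;

    auto it = latest.find(b.target);
    if (it != latest.end()) {
      Thunk &t = thunks[it->second];
      uint64_t s = std::max(t.start, start);
      uint64_t e = std::min(t.end, end);
      if (s < e) {
        // Windows intersect: reuse the thunk and narrow its bounds.
        t.start = s;
        t.end = e;
        continue;
      }
    }
    // No reusable thunk for this target: open a new window.
    latest[b.target] = thunks.size();
    thunks.push_back({b.target, start, end});
  }
  return thunks;
}
```

For example, with a range of 0x1000, branches to the same target at 0x1000 and 0x1800 share one thunk whose window narrows to [0x800, 0x2000), while a third branch at 0x5000 gets its own. The edge cases listed below (windows only partly covered by an output section, or windows falling inside a large section) are exactly what this sketch leaves out.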

There are lots of edge cases like:
* [start, end) may only be partially covered by an output section (armlink could reuse thunks across output sections).
* [start, end) might lie inside the bounds of a large section. Thus requiring the range to be split up.


https://github.com/llvm/llvm-project/pull/180266