[llvm] [BOLT] Skip _init; avoiding GOT breakage for static binaries (PR #117751)

Peter Waller via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 27 14:37:03 PST 2024


https://github.com/peterwaller-arm updated https://github.com/llvm/llvm-project/pull/117751

From 80b41c268c5cbc3feb8b143299140b121d5bad19 Mon Sep 17 00:00:00 2001
From: Peter Waller <peter.waller at arm.com>
Date: Mon, 25 Nov 2024 13:26:19 +0000
Subject: [PATCH] [BOLT] Skip _init; avoiding GOT breakage for static binaries

_init is used during startup of binaries. Unfortunately, its address can
be shared (at least on AArch64 glibc static binaries) with a data reference
that lives in the GOT. The GOT rewriting is currently unable to
distinguish between data addresses and function addresses. This leads to
the data address being incorrectly rewritten, causing a crash on startup
of the binary:

  Unexpected reloc type in static binary.

To avoid this, skip _init so that it is never considered for moving.

For now, skip _init for static binaries on any architecture; we could
add further conditions to narrow the skipped case for known crashes, but
as a straw man I thought it'd be best to keep the condition as simple as
possible and see if there are any objections to this.

Updates #100096.
---
 bolt/lib/Rewrite/RewriteInstance.cpp     | 21 ++++++++++++
 bolt/test/AArch64/check-init-not-moved.s | 43 ++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 bolt/test/AArch64/check-init-not-moved.s
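
For reviewers, the aliasing described in the commit message can be modelled
outside BOLT. The following is a minimal standalone sketch, not BOLT code:
GotEntry, rewriteGot, makeGot and arrayEndAfterRewrite are hypothetical names,
and the relocated address 0x200000 is made up. Only 0x1004 (shared by _init and
_array_end) and 0x1008 (_start) come from the test in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one GOT slot; this is not a BOLT type.
struct GotEntry {
  std::string Symbol; // symbol the slot was created for
  uint64_t Value;     // address currently stored in the slot
};

// Naive rewrite keyed on address alone: every slot whose value equals the old
// address of a moved function is redirected, even if the slot refers to data.
// Returns the number of slots that were changed.
int rewriteGot(std::vector<GotEntry> &Got,
               const std::map<uint64_t, uint64_t> &MovedFunctions) {
  int Changed = 0;
  for (GotEntry &E : Got) {
    auto It = MovedFunctions.find(E.Value);
    if (It == MovedFunctions.end())
      continue;
    E.Value = It->second;
    ++Changed;
  }
  return Changed;
}

// Toy GOT mirroring the test: _array_end and _init both resolve to 0x1004.
std::vector<GotEntry> makeGot() {
  return {{"_array_end", 0x1004}, {"_init", 0x1004}};
}

// Returns the value of the _array_end slot after the rewrite.
uint64_t arrayEndAfterRewrite(bool SkipInit) {
  std::vector<GotEntry> Got = makeGot();
  std::map<uint64_t, uint64_t> Moved;
  Moved[0x1008] = 0x200004; // _start is always moved (made-up new address)
  if (!SkipInit)
    Moved[0x1004] = 0x200000; // _init moved (made-up new address)
  rewriteGot(Got, Moved);
  assert(Got[0].Symbol == "_array_end");
  return Got[0].Value;
}
```

In this model, arrayEndAfterRewrite(false) returns the moved _init address, so
the data pointer is corrupted, while arrayEndAfterRewrite(true) leaves the slot
at 0x1004. That is the effect this patch relies on when it marks _init as
ignored.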

diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 7059a3dd231099..be88f11c2a4d1c 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -2927,6 +2927,23 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
     LLVM_DEBUG(dbgs() << "BOLT-DEBUG: ignoring relocation from data to data\n");
 }
 
+static BinaryFunction *getInitFunctionIfStaticBinary(BinaryContext &BC) {
+  // Workaround for https://github.com/llvm/llvm-project/issues/100096
+  // ("[BOLT] GOT array pointer incorrectly rewritten"). In AArch64
+  // static glibc binaries, the .init section's _init function pointer can
+  // alias with a data pointer for the end of an array. GOT rewriting
+  // currently can't detect this and updates the data pointer to the
+  // moved _init, causing a runtime crash. Skipping _init on the other
+  // hand should be harmless.
+  if (!BC.IsStaticExecutable)
+    return nullptr;
+  const BinaryData *BD = BC.getBinaryDataByName("_init");
+  if (!BD || BD->getSectionName() != ".init")
+    return nullptr;
+  LLVM_DEBUG(dbgs() << "BOLT-DEBUG: skipping _init for GOT workaround.\n");
+  return BC.getBinaryFunctionAtAddress(BD->getAddress());
+}
+
 void RewriteInstance::selectFunctionsToProcess() {
   // Extend the list of functions to process or skip from a file.
   auto populateFunctionNames = [](cl::opt<std::string> &FunctionNamesFile,
@@ -3047,6 +3064,10 @@ void RewriteInstance::selectFunctionsToProcess() {
     return true;
   };
 
+  if (auto *InitBD = BC->getBinaryDataByName("_init"))
+    if (BinaryFunction *Init = getInitFunctionIfStaticBinary(*BC))
+      Init->setIgnored();
+
   for (auto &BFI : BC->getBinaryFunctions()) {
     BinaryFunction &Function = BFI.second;
 
diff --git a/bolt/test/AArch64/check-init-not-moved.s b/bolt/test/AArch64/check-init-not-moved.s
new file mode 100644
index 00000000000000..ad4b80d2e60e23
--- /dev/null
+++ b/bolt/test/AArch64/check-init-not-moved.s
@@ -0,0 +1,43 @@
+# Regression test for https://github.com/llvm/llvm-project/issues/100096
+# static glibc binaries crash on startup because _init is moved and
+# shares its address with an array end pointer. The GOT rewriting can't
+# tell the two pointers apart and incorrectly updates the _array_end
+# address. Test checks that _init is not moved.
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -static -Wl,--section-start=.data=0x1000 -Wl,--section-start=.init=0x1004
+# RUN: llvm-bolt %t.exe -o %t.bolt
+# RUN: llvm-nm %t.exe | FileCheck --check-prefix=CHECK-ORIGINAL %s
+# RUN: llvm-nm %t.bolt | FileCheck --check-prefix=CHECK-BOLTED %s
+
+.section .data
+.globl _array_end
+_array_start:
+    .word 0x0
+
+_array_end:
+.section .init,"ax",@progbits
+.globl _init
+
+# Check that bolt doesn't move _init.
+#
+# CHECK-ORIGINAL: 0000000000001004 T _init
+# CHECK-BOLTED:   0000000000001004 T _init
+_init:
+    ret
+
+.section .text,"ax",@progbits
+.globl _start
+
+# Check that bolt is moving some other functions.
+#
+# CHECK-ORIGINAL:   0000000000001008 T _start
+# CHECK-BOLTED-NOT: 0000000000001008 T _start
+_start:
+    bl _init
+    adrp x0, #:got:_array_end
+    ldr x0, [x0, #:gotpage_lo15:_array_end]
+    adrp x0, #:got:_init
+    ldr x0, [x0, #:gotpage_lo15:_init]
+    ret
+


