[llvm] [BOLT] Abort on out-of-section symbols in GOT (PR #100801)

Vladislav Khmelevsky via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 26 13:42:44 PDT 2024


https://github.com/yota9 updated https://github.com/llvm/llvm-project/pull/100801

>From 07fc4f3ef7ba370f91666d2aa79a5c6d05b7b7ab Mon Sep 17 00:00:00 2001
From: Vladislav Khmelevsky <och95 at yandex.ru>
Date: Fri, 26 Jul 2024 22:15:43 +0400
Subject: [PATCH] [BOLT] Abort on out-of-section symbols in GOT

This patch aborts BOLT exeuction if it finds out-of-section (section
end) symbol in GOT table. In order to handle such situation properly in
future we would need to have arch-dependent way to analise relocations
or its sequences, e.g. for ARM it would probably be ADRP + LDR
analyzation in order to get GOT entry address. Currently it is also
challenging because GOT-related relocation symbols are replaced to
__BOLT_got_zero. Anyway it seems to be quite rare case which seems to be
only? related to static binaries. For the most part it seems that it
should be handled on linker stage, since static binary should not have
GOT table at all. LLD linker with relaxations enabled would replace
instruction addresses from GOT directly to target symbols which
elliminates the problem.

Anyway in order to achive detection of such cases this patch fixes few
things in BOLT:
1. For the end symbols we're now using section provided by ELF binary,
previously it would be tied with a wrong section found by symbol
address.
2. The end symbols would have limited registration, we would only would
add them in name->data GlobalSymbols map, since using address->data
BinaryDataMap map would likely be impossible due to address duality of
such symbols.
3. The outdated BD->getSection (currently returning refence, not
pointer) check in postProcessSymbolTable is replaced by getSize check in
order to allow zero-sized top level symbols if they are located in
zero-sized section. For the most part such things could only be found in
tests, but I don't see a reason not to handle such cases.
4. Updated section-end-sym test and removed x86_64 requirment since there
is no reason for this (tested on aarch64 linux)

The test was provided by peterwaller-arm (thank you) in #100096 and
slightly modified by me.
---
 bolt/include/bolt/Core/BinaryContext.h        |  3 +-
 bolt/lib/Core/BinaryContext.cpp               | 22 ++++++++---
 bolt/lib/Rewrite/RewriteInstance.cpp          | 37 +++++++++++++++----
 .../got_end_of_section_symbol.lld_script      |  6 +++
 bolt/test/AArch64/got_end_of_section_symbol.s | 28 ++++++++++++++
 bolt/test/X86/section-end-sym.s               |  4 +-
 6 files changed, 84 insertions(+), 16 deletions(-)
 create mode 100644 bolt/test/AArch64/Inputs/got_end_of_section_symbol.lld_script
 create mode 100644 bolt/test/AArch64/got_end_of_section_symbol.s

diff --git a/bolt/include/bolt/Core/BinaryContext.h b/bolt/include/bolt/Core/BinaryContext.h
index 5fb32a1ea6784..d2c1709b7eaff 100644
--- a/bolt/include/bolt/Core/BinaryContext.h
+++ b/bolt/include/bolt/Core/BinaryContext.h
@@ -911,7 +911,8 @@ class BinaryContext {
   /// of \p Flags.
   MCSymbol *registerNameAtAddress(StringRef Name, uint64_t Address,
                                   uint64_t Size, uint16_t Alignment,
-                                  unsigned Flags = 0);
+                                  unsigned Flags = 0,
+                                  BinarySection *Section = NULL);
 
   /// Return BinaryData registered at a given \p Address or nullptr if no
   /// global symbol was registered at the location.
diff --git a/bolt/lib/Core/BinaryContext.cpp b/bolt/lib/Core/BinaryContext.cpp
index e86075e69c05d..fbde42e822283 100644
--- a/bolt/lib/Core/BinaryContext.cpp
+++ b/bolt/lib/Core/BinaryContext.cpp
@@ -1056,18 +1056,28 @@ void BinaryContext::adjustCodePadding() {
 MCSymbol *BinaryContext::registerNameAtAddress(StringRef Name, uint64_t Address,
                                                uint64_t Size,
                                                uint16_t Alignment,
-                                               unsigned Flags) {
+                                               unsigned Flags,
+                                               BinarySection *Section) {
   // Register the name with MCContext.
   MCSymbol *Symbol = Ctx->getOrCreateSymbol(Name);
+  BinaryData *BD;
+
+  // Register out of section symbols only in GlobalSymbols map
+  if (Section && Section->getEndAddress() == Address) {
+    BD = new BinaryData(*Symbol, Address, Size, Alignment ? Alignment : 1,
+                        *Section, Flags);
+    GlobalSymbols[Name] = BD;
+    return Symbol;
+  }
 
   auto GAI = BinaryDataMap.find(Address);
-  BinaryData *BD;
   if (GAI == BinaryDataMap.end()) {
     ErrorOr<BinarySection &> SectionOrErr = getSectionForAddress(Address);
-    BinarySection &Section =
-        SectionOrErr ? SectionOrErr.get() : absoluteSection();
+    BinarySection &SectionRef = Section        ? *Section
+                                : SectionOrErr ? SectionOrErr.get()
+                                               : absoluteSection();
     BD = new BinaryData(*Symbol, Address, Size, Alignment ? Alignment : 1,
-                        Section, Flags);
+                        SectionRef, Flags);
     GAI = BinaryDataMap.emplace(Address, BD).first;
     GlobalSymbols[Name] = BD;
     updateObjectNesting(GAI);
@@ -1402,7 +1412,7 @@ void BinaryContext::postProcessSymbolTable() {
     if ((BD->getName().starts_with("SYMBOLat") ||
          BD->getName().starts_with("DATAat")) &&
         !BD->getParent() && !BD->getSize() && !BD->isAbsolute() &&
-        BD->getSection()) {
+        BD->getSection().getSize()) {
       this->errs() << "BOLT-WARNING: zero-sized top level symbol: " << *BD
                    << "\n";
       Valid = false;
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 33ebae3b6e6de..243c3f77edad1 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -955,13 +955,13 @@ void RewriteInstance::discoverFileObjects() {
     uint64_t SymbolSize = ELFSymbolRef(Symbol).getSize();
     uint64_t SymbolAlignment = Symbol.getAlignment();
 
-    auto registerName = [&](uint64_t FinalSize) {
+    auto registerName = [&](uint64_t FinalSize, BinarySection *Section = NULL) {
       // Register names even if it's not a function, e.g. for an entry point.
       BC->registerNameAtAddress(UniqueName, SymbolAddress, FinalSize,
-                                SymbolAlignment, SymbolFlags);
+                                SymbolAlignment, SymbolFlags, Section);
       if (!AlternativeName.empty())
         BC->registerNameAtAddress(AlternativeName, SymbolAddress, FinalSize,
-                                  SymbolAlignment, SymbolFlags);
+                                  SymbolAlignment, SymbolFlags, Section);
     };
 
     section_iterator Section =
@@ -986,12 +986,25 @@ void RewriteInstance::discoverFileObjects() {
                       << " for function\n");
 
     if (SymbolAddress == Section->getAddress() + Section->getSize()) {
+      ErrorOr<BinarySection &> SectionOrError =
+          BC->getSectionForAddress(Section->getAddress());
+
+      // Skip symbols from invalid sections
+      if (!SectionOrError) {
+        BC->errs() << "BOLT-WARNING: " << UniqueName << " (0x"
+                   << Twine::utohexstr(SymbolAddress)
+                   << ") does not have any section\n";
+        continue;
+      }
+
       assert(SymbolSize == 0 &&
              "unexpect non-zero sized symbol at end of section");
-      LLVM_DEBUG(
-          dbgs()
-          << "BOLT-DEBUG: rejecting as symbol points to end of its section\n");
-      registerName(SymbolSize);
+      LLVM_DEBUG({
+        dbgs() << "BOLT-DEBUG: rejecting as symbol " << UniqueName
+               << " points to end of " << SectionOrError->getName()
+               << " section\n";
+      });
+      registerName(SymbolSize, &SectionOrError.get());
       continue;
     }
 
@@ -2560,6 +2573,16 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
     return;
   }
 
+  if (Relocation::isGOT(RType) && !Relocation::isTLS(RType)) {
+    BinaryData *BD = BC->getBinaryDataByName(SymbolName);
+    if (BD && BD->getAddress() == BD->getSection().getEndAddress()) {
+      BC->errs() << "BOLT-ERROR: GOT table contains currently unsupported "
+                    "section end symbol "
+                 << BD->getName() << "\n";
+      exit(1);
+    }
+  }
+
   const uint64_t Address = SymbolAddress + Addend;
 
   LLVM_DEBUG({
diff --git a/bolt/test/AArch64/Inputs/got_end_of_section_symbol.lld_script b/bolt/test/AArch64/Inputs/got_end_of_section_symbol.lld_script
new file mode 100644
index 0000000000000..2ad4169bbcc60
--- /dev/null
+++ b/bolt/test/AArch64/Inputs/got_end_of_section_symbol.lld_script
@@ -0,0 +1,6 @@
+SECTIONS {
+  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
+  .data : { *(.data)  *(.array) }
+  .text : { *(.text) }
+  .got  : { *(.got) *(.igot) }
+}
diff --git a/bolt/test/AArch64/got_end_of_section_symbol.s b/bolt/test/AArch64/got_end_of_section_symbol.s
new file mode 100644
index 0000000000000..c203214fe3fbe
--- /dev/null
+++ b/bolt/test/AArch64/got_end_of_section_symbol.s
@@ -0,0 +1,28 @@
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown \
+# RUN:   %s -o %t.o
+# RUN: %clang %cflags  -nostartfiles -nodefaultlibs -static -Wl,--no-relax \
+# RUN:   -Wl,-q -Wl,-T %S/Inputs/got_end_of_section_symbol.lld_script  \
+# RUN:   %t.o -o %t.exe
+# RUN: not llvm-bolt %t.exe -o %t.bolt 2>&1 | FileCheck %s
+
+# CHECK: BOLT-ERROR: GOT table contains currently unsupported section end
+# CHECK-SAME: symbol array_end
+
+.section .array, "a", @progbits
+.globl array_start
+.globl array_end
+array_start:
+  .word 0
+array_end:
+
+.section .text
+.globl _start
+.type exitOk, %function
+_start:
+  adrp x1, #:got:array_start
+  ldr x1, [x1, #:got_lo12:array_start]
+  adrp x0, #:got:array_end
+  ldr x0, [x0, #:got_lo12:array_end]
+  adrp x2, #:got:_start
+  ldr x2, [x2, #:got_lo12:_start]
+  ret
diff --git a/bolt/test/X86/section-end-sym.s b/bolt/test/X86/section-end-sym.s
index 545cf37263da5..29ff6e05118aa 100644
--- a/bolt/test/X86/section-end-sym.s
+++ b/bolt/test/X86/section-end-sym.s
@@ -1,7 +1,7 @@
 ## Check that BOLT doesn't consider end-of-section symbols (e.g., _etext) as
 ## functions.
 
-# REQUIRES: x86_64-linux, asserts
+# REQUIRES: system-linux, asserts
 
 # RUN: llvm-mc -filetype=obj -triple x86_64-unknown-linux %s -o %t.o
 # RUN: ld.lld %t.o -o %t.exe -q
@@ -9,7 +9,7 @@
 # RUN:   | FileCheck %s
 
 # CHECK: considering symbol etext for function
-# CHECK-NEXT: rejecting as symbol points to end of its section
+# CHECK-NEXT: rejecting as symbol etext points to end of .text section
 # CHECK-NOT: Binary Function "etext{{.*}}" after building cfg
 
 



More information about the llvm-commits mailing list