[Lldb-commits] [lldb] [lldb][Mach-O] Handle shared cache binaries correctly (PR #117832)
Jason Molenda via lldb-commits
lldb-commits at lists.llvm.org
Tue Nov 26 18:06:42 PST 2024
https://github.com/jasonmolenda created https://github.com/llvm/llvm-project/pull/117832
The Mach-O load commands have an LC_SYMTAB / struct symtab_command which gives the offsets of the symbol table (nlist records) and string table for this binary. In a Mach-O binary on disk, these are file offsets. If a Mach-O binary is loaded in memory with all segments consecutive, the `symoff` and `stroff` are the offsets from the TEXT segment (aka the Mach-O header) virtual address to the virtual address of the start of these tables.
However, if a Mach-O binary is part of the shared cache, then the segments will be separated -- they will have different slide values. It is possible for the LINKEDIT segment to be more than 4GB away from the TEXT segment in the virtual address space, so these 32-bit offsets cannot express the offset from the TEXT segment to these tables.
Create separate uint64_t variables to track the offset to the symbol table and string table, instead of reusing the 32-bit ones in the symtab_command structure.
rdar://140432279
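To illustrate the truncation described above, here is a minimal standalone sketch. It is not LLDB code: the helper names `SlideOffset32`, `SlideOffset64` and `WrapsIn32Bits` are hypothetical, and the values are made up for the example. It only assumes that `symoff`/`stroff` are `uint32_t` fields and that the slide is a 64-bit quantity, as stated in the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not LLDB APIs.

// Models adding a 64-bit slide into a 32-bit field such as `symoff`:
// the sum is computed in 64 bits and then truncated modulo 2^32.
uint32_t SlideOffset32(uint32_t file_offset, uint64_t slide) {
  return static_cast<uint32_t>(file_offset + slide);
}

// Models tracking the slid offset in a separate 64-bit variable instead:
// no bits are lost.
uint64_t SlideOffset64(uint32_t file_offset, uint64_t slide) {
  return static_cast<uint64_t>(file_offset) + slide;
}

// True when the 32-bit field can no longer represent the slid offset.
bool WrapsIn32Bits(uint32_t file_offset, uint64_t slide) {
  return SlideOffset64(file_offset, slide) !=
         static_cast<uint64_t>(SlideOffset32(file_offset, slide));
}

// Small self-check with made-up values: a slide of exactly 4GB.
void CheckWrapExample() {
  const uint32_t symoff = 0x1000;          // assumed example file offset
  const uint64_t slide = 0x100000000ULL;   // 4GB, larger than UINT32_MAX
  assert(SlideOffset32(symoff, slide) == 0x1000u);         // slide is lost
  assert(SlideOffset64(symoff, slide) == 0x100001000ULL);  // slide is kept
}
```

With a small slide both helpers agree; once the slide reaches 4GB the 32-bit result silently drops the high bits, which is why the offset is better kept in a 64-bit variable.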
From 00a429c14d159ebc42ac7c3a7e98a91851ece236 Mon Sep 17 00:00:00 2001
From: Jason Molenda <jmolenda at apple.com>
Date: Tue, 26 Nov 2024 17:56:06 -0800
Subject: [PATCH] [lldb][Mach-O] Handle shared cache binaries correctly
The Mach-O load commands have an LC_SYMTAB / struct symtab_command
which represents the offset of the symbol table (nlist records) and
string table for this binary. In a mach-o binary on disk, these are
file offsets. If a mach-o binary is loaded in memory with all
segments consecutive, the `symoff` and `stroff` are the offsets from
the TEXT segment (aka the mach-o header) virtual address to the
virtual address of the start of these tables.
However, if a Mach-O binary is a part of the shared cache, then the
segments will be separated -- they will have different slide values.
And it is possible for the LINKEDIT segment to be greater than 4GB
away from the TEXT segment in the virtual address space, so these
32-bit offsets cannot express the offset from TEXT segment to these
tables.
Create separate uint64_t variables to track the offset to the
symbol table and string table, instead of reusing the 32-bit ones
in the symtab_command structure.
rdar://140432279
---
.../ObjectFile/Mach-O/ObjectFileMachO.cpp | 26 ++++++++++++++-----
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 079fd905037d45..5f047d84d53e73 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -2244,6 +2244,18 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
// code.
typedef AddressDataArray<lldb::addr_t, bool, 100> FunctionStarts;
+ // The virtual address offset from TEXT to the symbol/string tables
+ // in the LINKEDIT section. The LC_SYMTAB symtab_command `symoff` and
+ // `stroff` are uint32_t's that give the file offset in the binary.
+ // If the binary is laid down in memory with all segments consecutive,
+ // then these are the offsets from the mach-o header aka TEXT segment
+ // to the tables' virtual addresses.
+ // But if the binary is loaded in virtual address space with different
+ // slides for the segments (e.g. a shared cache), the LINKEDIT may be
+ // more than 4GB away from TEXT, and a 32-bit offset is not sufficient.
+ offset_t symbol_table_offset_from_TEXT = 0;
+ offset_t string_table_offset_from_TEXT = 0;
+
// Record the address of every function/data that we add to the symtab.
// We add symbols to the table in the order of most information (nlist
// records) to least (function starts), and avoid duplicating symbols
@@ -2282,6 +2294,8 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
if (m_data.GetU32(&offset, &symtab_load_command.symoff, 4) ==
nullptr) // fill in symoff, nsyms, stroff, strsize fields
return;
+ string_table_offset_from_TEXT = symtab_load_command.stroff;
+ symbol_table_offset_from_TEXT = symtab_load_command.symoff;
break;
case LC_DYLD_INFO:
@@ -2403,9 +2417,9 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
const addr_t linkedit_file_offset = linkedit_section_sp->GetFileOffset();
const addr_t symoff_addr = linkedit_load_addr +
- symtab_load_command.symoff -
+ symbol_table_offset_from_TEXT -
linkedit_file_offset;
- strtab_addr = linkedit_load_addr + symtab_load_command.stroff -
+ strtab_addr = linkedit_load_addr + string_table_offset_from_TEXT -
linkedit_file_offset;
// Always load dyld - the dynamic linker - from memory if we didn't
@@ -2473,17 +2487,17 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
lldb::addr_t linkedit_offset = linkedit_section_sp->GetFileOffset();
lldb::offset_t linkedit_slide =
linkedit_offset - m_linkedit_original_offset;
- symtab_load_command.symoff += linkedit_slide;
- symtab_load_command.stroff += linkedit_slide;
+ symbol_table_offset_from_TEXT += linkedit_slide;
+ string_table_offset_from_TEXT += linkedit_slide;
dyld_info.export_off += linkedit_slide;
dysymtab.indirectsymoff += linkedit_slide;
function_starts_load_command.dataoff += linkedit_slide;
exports_trie_load_command.dataoff += linkedit_slide;
}
- nlist_data.SetData(m_data, symtab_load_command.symoff,
+ nlist_data.SetData(m_data, symbol_table_offset_from_TEXT,
nlist_data_byte_size);
- strtab_data.SetData(m_data, symtab_load_command.stroff,
+ strtab_data.SetData(m_data, string_table_offset_from_TEXT,
strtab_data_byte_size);
// We shouldn't have exports data from both the LC_DYLD_INFO command