[Lldb-commits] [lldb] r132582 - /lldb/trunk/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

Greg Clayton gclayton at apple.com
Mon Jun 6 13:42:27 PDT 2011


On Jun 5, 2011, at 9:35 AM, Peter Collingbourne wrote:

> On Fri, Jun 03, 2011 at 02:08:58PM -0700, Greg Clayton wrote:
>> Another clarification I meant to initially convey, so one more time:
>> 
>> Will the contents of SHT_SYMTAB always include all of the symbols found in SHT_DYNSYM plus additional symbols (those that aren't required by the dynamic loader)? Or do the symbols for the dynamic loader go only into the SHT_DYNSYM sections, with any other symbols going into the SHT_SYMTAB sections? 
>> 
>> Shouldn't we just parse the SHT_SYMTAB sections and fall back to the SHT_DYNSYM section(s) if there are no SHT_SYMTAB sections?
> 
> Hi Greg,
> 
> Generally .dynsym is a subset of .symtab (if present), and this does
> seem to be an implicit rule in the ELF specification [1].  However,
> in practice this rule is not always adhered to (I have at least one
> (shared) object file on my system in which both .dynsym and .symtab
> are present but .dynsym is not a subset of .symtab).  So I think
> the safest thing to do is to parse both symbol tables, under the
> "be lenient in what you accept" principle.
> 
> Is it a problem to have duplicate symbols in Symtab?  If we do this,
> should we be filtering out duplicates somehow?


In general, yes, we should filter out duplicates to keep the object file information as concise as possible.
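
As a rough illustration of "parse both tables, then filter duplicates", here is a minimal Python sketch. The `ElfSym` record, the `merge_symbol_tables` helper, and the sample addresses are all made up for this example, and treating two entries as duplicates only when name, value, size, and type all match is an assumption, not something the ELF plug-in is known to do.

```python
from typing import NamedTuple


class ElfSym(NamedTuple):
    """Hypothetical, simplified view of one ELF symbol table entry."""
    name: str
    value: int
    size: int
    sym_type: str


def merge_symbol_tables(symtab, dynsym):
    """Return the .symtab entries followed by any .dynsym entries not already seen."""
    seen = set()
    merged = []
    for sym in list(symtab) + list(dynsym):
        if sym in seen:
            continue  # identical entry already present, skip the duplicate
        seen.add(sym)
        merged.append(sym)
    return merged


# Made-up sample data: "main" appears in both tables, "only_in_dynsym" in one.
symtab = [
    ElfSym("main", 0x400500, 42, "FUNC"),
    ElfSym("helper", 0x400530, 16, "FUNC"),
]
dynsym = [
    ElfSym("main", 0x400500, 42, "FUNC"),
    ElfSym("only_in_dynsym", 0x400600, 8, "FUNC"),
]
merged = merge_symbol_tables(symtab, dynsym)
```

Because .dynsym is not guaranteed to be a subset of .symtab, the sketch keeps every distinct entry from both tables instead of preferring one table outright.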

Some examples in the mach-o object file plug-in include merging the linker symbol for "main" with the STAB entries for it into a single symbol. A mach-o symbol table entry has no length field, so the STAB data for a function takes two N_FUN entries: the first holds the address of "main" and the second holds the function size. The STABs also include two entries (N_BNSYM and N_ENSYM) that delineate the start and end of the function, so each function in the STABs occupies four symbol table entries.
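
The merging described above could be sketched roughly as follows in Python. This is not the plug-in's actual code: the `Nlist` and `Symbol` records and the `collapse_functions` helper are hypothetical, and the sample values mirror entries #3-#6, #14, and #15 of the a.out dump in this message.

```python
from typing import NamedTuple

# n_type values as they appear in the a.out dump.
N_BNSYM, N_FUN, N_ENSYM = 0x2E, 0x24, 0x4E
STAB_FUNCTION_PATTERN = [N_BNSYM, N_FUN, N_FUN, N_ENSYM]


class Nlist(NamedTuple):
    """Hypothetical, simplified nlist entry."""
    index: int
    n_type: int
    n_value: int
    name: str


class Symbol(NamedTuple):
    user_id: int
    name: str
    address: int
    size: int


def strip_underscore(name):
    """Drop the single leading underscore that mach-o adds to C names."""
    return name[1:] if name.startswith("_") else name


def collapse_functions(nlists):
    """Merge each BNSYM/FUN/FUN/ENSYM group, and any later linker symbol
    with the same name and address, into one symbol."""
    symbols = []
    seen = set()  # (name, address) pairs already emitted from STABs
    i = 0
    while i < len(nlists):
        group = nlists[i:i + 4]
        if [e.n_type for e in group] == STAB_FUNCTION_PATTERN:
            _begin, named, sized, _end = group
            # First N_FUN carries name and address, second carries the size.
            sym = Symbol(named.index, strip_underscore(named.name),
                         named.n_value, sized.n_value)
            symbols.append(sym)
            seen.add((sym.name, sym.address))
            i += 4
            continue
        entry = nlists[i]
        key = (strip_underscore(entry.name), entry.n_value)
        if key not in seen:  # skip linker symbols already covered by STABs
            symbols.append(Symbol(entry.index, key[0], entry.n_value, 0))
        i += 1
    return symbols


raw = [
    Nlist(3, N_BNSYM, 0x1BB0, ""),
    Nlist(4, N_FUN, 0x1BB0, "_main"),
    Nlist(5, N_FUN, 0x2C6, ""),
    Nlist(6, N_ENSYM, 0x2C6, ""),
    Nlist(14, 0x0F, 0x1BB0, "_main"),
    Nlist(15, 0x0F, 0x1B70, "start"),
]
collapsed = collapse_functions(raw)
```

The sketch only drops a linker symbol when it appears after the matching STAB group, which is the order seen in the dump; a real implementation would need to handle either order.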

Dumping the raw mach-o nlist entries for a simple a.out program:


----------------------------------------------------------------------
Symbol table for: 'a.out' (i386)
----------------------------------------------------------------------
Index    n_strx   n_type             n_sect n_desc n_value
======== -------- ------------------ ------ ------ ----------------
[     0] 00000002 64 (N_SO         ) 00     0000   0000000000000000 '/Volumes/work/gclayton/Documents/src/args/'
[     1] 0000002d 64 (N_SO         ) 00     0000   0000000000000000 'main.c'
[     2] 00000034 66 (N_OSO        ) 03     0001   000000004dcb3e87 '/Volumes/work/gclayton/Documents/src/args/main.o'
[     3] 00000001 2e (N_BNSYM      ) 01     0000   0000000000001bb0
[     4] 00000065 24 (N_FUN        ) 01     0000   0000000000001bb0 '_main'
[     5] 00000001 24 (N_FUN        ) 00     0000   00000000000002c6
[     6] 00000001 4e (N_ENSYM      ) 01     0000   00000000000002c6
[     7] 00000001 64 (N_SO         ) 01     0000   0000000000000000
[     8] 0000006b 0e (     SECT    ) 06     0000   0000000000002000 '_pvars'
[     9] 00000072 0f (     SECT EXT) 09     0000   0000000000002038 '_NXArgc'
[    10] 0000007a 0f (     SECT EXT) 09     0000   000000000000203c '_NXArgv'
[    11] 00000082 0f (     SECT EXT) 09     0000   0000000000002044 '___progname'
[    12] 0000008e 03 (     ABS  EXT) 01     0010   0000000000001000 '__mh_execute_header'
[    13] 000000a2 0f (     SECT EXT) 09     0000   0000000000002040 '_environ'
[    14] 000000ab 0f (     SECT EXT) 01     0000   0000000000001bb0 '_main'
[    15] 000000b1 0f (     SECT EXT) 01     0000   0000000000001b70 'start'
[    16] 000000b7 01 (     UNDF EXT) 00     0100   0000000000000000 '___stack_chk_fail'
[    17] 000000c9 01 (     UNDF EXT) 00     0100   0000000000000000 '___stack_chk_guard'
[    18] 000000dc 01 (     UNDF EXT) 00     0100   0000000000000000 '_exit'
[    19] 000000e2 01 (     UNDF EXT) 00     0100   0000000000000000 '_getcwd'
[    20] 000000ea 01 (     UNDF EXT) 00     0100   0000000000000000 '_perror'
[    21] 000000f2 01 (     UNDF EXT) 00     0100   0000000000000000 '_printf'
[    22] 000000fa 01 (     UNDF EXT) 00     0100   0000000000000000 '_puts'
[    23] 00000100 01 (     UNDF EXT) 00     0100   0000000000000000 'dyld_stub_binder'

Loading this into lldb we see:


% lldb a.out 
Current executable set to 'a.out' (i386).
(lldb) target modules dump symtab a.out 
Symtab, file = /Volumes/work/gclayton/Documents/src/args/a.out, num_symbols = 18:
               Debug symbol
               |Synthetic symbol
               ||Externally Visible
               |||
Index   UserID DSX Type         File Address/Value Size               Flags      Name
------- ------ --- ------------ ------------------ ------------------ ---------- ----------------------------------
[    0]      0 D   SourceFile   0x0000000000000000 Sibling -> [    3] 0x00640000 /Volumes/work/gclayton/Documents/src/args/main.c
[    1]      2 D   ObjectFile   0x000000004dcb3e87 0x0000000000000000 0x00660001 /Volumes/work/gclayton/Documents/src/args/main.o
[    2]      4 D   Code         0x0000000000001bb0 0x00000000000002c6 0x000f0000 main
[    3]      8     Data         0x0000000000002000 0x0000000000000000 0x000e0000 pvars
[    4]      9   X Data         0x0000000000002038 0x0000000000000000 0x000f0000 NXArgc
[    5]     10   X Data         0x000000000000203c 0x0000000000000000 0x000f0000 NXArgv
[    6]     11   X Data         0x0000000000002044 0x0000000000000000 0x000f0000 __progname
[    7]     12   X Absolute     0x0000000000001000 0x0000000000000000 0x00030010 _mh_execute_header
[    8]     13   X Data         0x0000000000002040 0x0000000000000000 0x000f0000 environ
[    9]     15   X Code         0x0000000000001b70 0x0000000000000000 0x000f0000 start
[   10]     16     Trampoline   0x0000000000001e76 0x0000000000000006 0x00010100 __stack_chk_fail
[   11]     17   X Extern       0x0000000000000000 0x0000000000000000 0x00010100 __stack_chk_guard
[   12]     18     Trampoline   0x0000000000001e7c 0x0000000000000006 0x00010100 exit
[   13]     19     Trampoline   0x0000000000001e82 0x0000000000000006 0x00010100 getcwd
[   14]     20     Trampoline   0x0000000000001e88 0x0000000000000006 0x00010100 perror
[   15]     21     Trampoline   0x0000000000001e8e 0x0000000000000006 0x00010100 printf
[   16]     22     Trampoline   0x0000000000001e94 0x0000000000000006 0x00010100 puts
[   17]     23   X Extern       0x0000000000000000 0x0000000000000000 0x00010100 dyld_stub_binder


Note that the "UserID" of a symbol preserves the original symbol table index in case any data in the object file refers to symbols by that index. For Linux, you would want to encode the "UserID" as some combination of the symbol table index plus the section header index, in case any data in the ELF file later needs to dig up information on one of the ELF symbols. Alternatively, you can make the UserID a monotonically increasing index, in which case you would need to remember the number of symbols in each section header that was a symbol table so that a UserID can be mapped back accordingly.
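
Both UserID schemes mentioned above can be sketched in a few lines of Python. Everything here is illustrative: the helper names, the 32-bit split, and the section indexes 5 and 28 are assumptions made for the example, and the first scheme assumes the ID type is at least 64 bits wide.

```python
from bisect import bisect_right


def encode_user_id(section_index, symbol_index):
    """Scheme 1: pack the section header index and the symbol index into one ID."""
    return (section_index << 32) | symbol_index


def decode_user_id(user_id):
    """Recover (section_index, symbol_index) from a packed ID."""
    return user_id >> 32, user_id & 0xFFFFFFFF


class MonotonicIds:
    """Scheme 2: IDs increase monotonically across all symbol tables."""

    def __init__(self, tables):
        # tables: list of (section_index, num_symbols) in parse order
        self._sections = []
        self._starts = []  # first ID assigned to each table
        next_id = 0
        for section_index, count in tables:
            self._sections.append(section_index)
            self._starts.append(next_id)
            next_id += count
        self.total = next_id

    def to_user_id(self, section_index, symbol_index):
        slot = self._sections.index(section_index)
        return self._starts[slot] + symbol_index

    def from_user_id(self, user_id):
        # Find the last table whose first ID is <= user_id.
        slot = bisect_right(self._starts, user_id) - 1
        return self._sections[slot], user_id - self._starts[slot]


# Hypothetical layout: section 5 holds 100 symbols, section 28 holds 250.
ids = MonotonicIds([(5, 100), (28, 250)])
```

The packed form needs no side table but produces sparse IDs; the monotonic form keeps IDs dense at the cost of remembering the per-table counts.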

So the main point is that it would be great if we can keep the LLDB symbol tables as simple as they need to be by removing duplicates and merging symbols when possible. The mach-o example above merged the "_main" symbols #3-#6 and #14 in the mach-o symbol table into a single LLDB symbol table entry, #2. It also repurposed the undefined external symbols that have stubs (#16, #18-#22) into valid trampoline symbols: they have been modified to point to the mach-o PLT trampoline code. This takes symbols that are in the symbol table only for the dynamic linker and turns them into real symbols, so that all of the code is covered by a symbol. Some debuggers throw away the undefined symbols and then make their own new trampoline entries for the PLT code, but in LLDB we reuse them, since the data in mach-o actually refers to these undefined symbols via the original symbol table index ("UserID" in LLDB, or "Index" in the mach-o dump).
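
A rough Python sketch of the trampoline repurposing follows. It assumes the mapping from stub slot to symbol index is already available (in mach-o this would normally come from the indirect symbol table; how the plug-in obtains it is not shown in this message). The `Symbol` record and `make_trampolines` helper are hypothetical, and the stub base 0x1e76, stub size 6, and index list are inferred from the lldb output above, not read from the file.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    user_id: int
    name: str
    kind: str
    address: int
    size: int


def make_trampolines(symbols, stub_base, stub_size, indirect_indexes):
    """Rewrite the undefined symbols named by indirect_indexes so that each
    one covers its stub, keeping the original user_id intact."""
    by_id = {s.user_id: s for s in symbols}
    for slot, user_id in enumerate(indirect_indexes):
        by_id[user_id] = by_id[user_id]._replace(
            kind="Trampoline",
            address=stub_base + slot * stub_size,
            size=stub_size,
        )
    # Preserve the original ordering of the symbol list.
    return [by_id[s.user_id] for s in symbols]


names = ["__stack_chk_fail", "__stack_chk_guard", "exit", "getcwd",
         "perror", "printf", "puts", "dyld_stub_binder"]
undefined = [Symbol(16 + i, n, "Extern", 0, 0) for i, n in enumerate(names)]

# Assumed stub layout: one 6-byte stub per listed symbol, starting at 0x1e76.
result = make_trampolines(undefined, 0x1E76, 6, [16, 18, 19, 20, 21, 22])
```

Symbols without a stub (#17 and #23 here) are left untouched, which matches the two "Extern" rows in the lldb dump.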

So think about which ELF symbols the ELF object file plug-in needs in order to do name and address lookups, and feel free to make up new symbols for things like the PLT entries. Any time an address query falls within the virtual file address range of an object file and that address doesn't return a valid symbol, we need to look at why, and make sure there isn't a symbol we should be producing that explains some details about that address.
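
One way to check for such unexplained addresses is to look for gaps between symbols, sketched below in Python. The `find_symbol` and `find_gaps` helpers and all of the addresses are made up for illustration; this is not LLDB's lookup code.

```python
from bisect import bisect_right


def find_symbol(symbols, address):
    """symbols: list of (start, size, name) sorted by start address.
    Return the name of the symbol containing address, or None."""
    starts = [s[0] for s in symbols]
    i = bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, size, name = symbols[i]
    return name if start <= address < start + size else None


def find_gaps(symbols, range_start, range_end):
    """Return (begin, end) address ranges that no symbol covers."""
    gaps = []
    cursor = range_start
    for start, size, _name in symbols:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, start + size)
    if cursor < range_end:
        gaps.append((cursor, range_end))
    return gaps


# Hypothetical code section spanning 0x1000-0x1210 with one uncovered hole.
symbols = [
    (0x1000, 0x40, "start"),
    (0x1040, 0x100, "main"),
    (0x1200, 0x10, "helper"),
]
gaps = find_gaps(symbols, 0x1000, 0x1210)
```

Each reported gap is a candidate for a synthesized symbol, for example a PLT entry that the raw symbol tables do not describe.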







