[Lldb-commits] [PATCH] Strip ELF symbol versions from symbol names
jingham at apple.com
Thu Feb 26 10:36:53 PST 2015
Note, Mac OS X used to play this sort of trick: it would have a set of "equivalent" symbols and would pick which one got used based on some mysterious knowledge of its own. We dealt with this, at least in the breakpoints, by implementing the DynamicLoader::FindEquivalentSymbols method. It looks like that no longer gets used, because Mac OS X moved to indirect and resolver symbols, which let the debugger figure out directly which symbol was actually going to be called. But you might be able to revive it and call it from the breakpoint machinery to make sure that you don't miss any calls to this family of functions. If we're not going to use it for this, we should probably get rid of it...
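A minimal C++ sketch of how an "equivalent symbols" lookup could cover ELF symbol versions for breakpoints: every symbol whose name, with its version suffix stripped, matches the requested name is collected, so a breakpoint could be placed on each. SymbolEntry, BaseName and FindEquivalentAddresses are hypothetical helpers for illustration only, not LLDB's DynamicLoader API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol table entry (not lldb_private::Symbol).
struct SymbolEntry {
  std::string name;       // full name as stored in the object file
  std::uint64_t address;  // load address of the symbol
};

// Strips an ELF version suffix: "memcpy@@GLIBC_2.14" and "memcpy@GLIBC_2.2.5"
// both become "memcpy"; names without '@' are returned unchanged.
std::string BaseName(const std::string &name) {
  return name.substr(0, name.find('@'));
}

// Collects the address of every symbol that is "equivalent" to `requested`,
// i.e. every version of the same function, so that a breakpoint could be set
// on each of them and no call into the family is missed.
std::vector<std::uint64_t>
FindEquivalentAddresses(const std::vector<SymbolEntry> &symtab,
                        const std::string &requested) {
  const std::string wanted = BaseName(requested);
  std::vector<std::uint64_t> addresses;
  for (const SymbolEntry &sym : symtab)
    if (BaseName(sym.name) == wanted)
      addresses.push_back(sym.address);
  return addresses;
}
```

With a table holding "memcpy@@GLIBC_2.14" and "memcpy@GLIBC_2.2.5", a request for "memcpy" yields both addresses.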
Jim
> On Feb 26, 2015, at 10:18 AM, Greg Clayton <clayborg at gmail.com> wrote:
>
>
>> On Feb 26, 2015, at 9:02 AM, Pavel Labath <labath at google.com> wrote:
>>
>> So here is me thinking out loud about this issue...
>>
>> What are the current use cases for the Symbols and SymTabs in lldb?
>>
>> - symbolification (aka looking up a symbol by address): In this case we would probably want to output "memcpy@@GLIBC_2.14" because that _is_ the name of the symbol in the object file and it also provides the most information.
>
> Agreed. I don't like losing information like the real symbol name.
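A minimal sketch of the symbolification direction, assuming a hypothetical table that maps each symbol's start address to its full name: the name comes back untouched, version suffix included, so no information is lost. AddressMap and Symbolicate are illustrative names; symbol sizes are ignored for brevity, so an address past the end of the last symbol would still be attributed to it.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Hypothetical index: symbol start address -> full symbol name.
using AddressMap = std::map<std::uint64_t, std::string>;

// Returns the full (possibly versioned) name of the symbol with the greatest
// start address <= addr, e.g. "memcpy@@GLIBC_2.14".
std::string Symbolicate(const AddressMap &symbols, std::uint64_t addr) {
  auto it = symbols.upper_bound(addr);  // first symbol starting after addr
  if (it == symbols.begin())
    return "<unknown>";                 // addr precedes every known symbol
  return std::prev(it)->second;         // name is returned unmodified
}
```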
>
>> - symbol resolution (aka looking up a symbol by name): in which situations do we need to do this? Currently, I am aware of only one: user provided expressions in the "expr" command. Are there any other use cases?
>
> Mostly the expression parser. Users might also try to disassemble a function by name and provide "memcpy@@GLIBC_2.14", and it would be nice if this worked (which wouldn't happen if we stripped this down to just "memcpy").
>
>> - the ELF versioning spec says that when we do not have any additional information, we should pick the default (latest) version. This is the one with @@ in its name. When the user types "expr memcpy(a,b,c)", we do not have any information, so the string "memcpy" should resolve to the same address as "memcpy@@GLIBC_2.14". We could try to be clever and figure out what version is used in the rest of the code, but that may prove to be quite difficult. Furthermore, we almost certainly want the expression `char foo[]="bar"; do_something_with(foo)` (which compiles to something involving memcpy) to use the default symbol version, since the user is probably not even aware that there is a call to memcpy involved (I certainly wasn't).
>
> Agreed.
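A sketch of the default-version rule applied to a bare name, assuming a hypothetical flat symbol list (Sym and ResolveDefault are illustrative names): "memcpy" resolves to an unversioned definition or to the "memcpy@@VERSION" entry, never to a non-default "memcpy@VERSION" one.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flat symbol list entry, used only for this illustration.
struct Sym {
  std::string name;
  std::uint64_t address;
};

// Resolves a bare name such as "memcpy" with no other information available:
// an unversioned definition or the default version ("memcpy@@VERSION") is
// accepted, while non-default versions ("memcpy@VERSION") are skipped.
std::optional<std::uint64_t> ResolveDefault(const std::vector<Sym> &symtab,
                                            const std::string &bare) {
  const std::string default_prefix = bare + "@@";
  for (const Sym &s : symtab) {
    const bool unversioned = s.name == bare;
    const bool is_default =
        s.name.compare(0, default_prefix.size(), default_prefix) == 0;
    if (unversioned || is_default)
      return s.address;
  }
  return std::nullopt;  // nothing but non-default versions (or nothing at all)
}
```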
>
>> - we would like to keep the non-default symbol versions (e.g. "memcpy@GLIBC_2.2.5"), so that we can do symbolification, but we don't want "memcpy" to resolve to these symbols unless the user explicitly specifies "memcpy@GLIBC_2.2.5" (which right now he can't, as the expr command will bark out a syntax error). It might be possible to call the function by embedding the right asm commands in the expr expression, but I do not care about this right now.
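One way to express these matching rules, as a sketch with hypothetical helpers (VersionedName, ParseVersionedName and MatchesLookup are not LLDB code): split a versioned name into base name, version and a default flag, then let a bare request match only the default version (or an unversioned symbol of that name), while an explicit versioned request must match exactly.

```cpp
#include <string>

// Hypothetical decomposition of an ELF versioned symbol name.
struct VersionedName {
  std::string base;         // "memcpy"
  std::string version;      // "GLIBC_2.2.5"; empty if unversioned
  bool is_default = false;  // true for "@@" (used when no version is requested)
};

VersionedName ParseVersionedName(const std::string &name) {
  VersionedName v;
  const auto at = name.find('@');
  if (at == std::string::npos) {
    v.base = name;  // unversioned symbol
    return v;
  }
  v.base = name.substr(0, at);
  v.is_default = name.compare(at, 2, "@@") == 0;
  v.version = name.substr(at + (v.is_default ? 2 : 1));
  return v;
}

// An exact match always wins (covers "memcpy@GLIBC_2.2.5" typed explicitly
// and unversioned symbols); a bare request also matches the default version.
bool MatchesLookup(const std::string &symbol_name, const std::string &requested) {
  if (symbol_name == requested)
    return true;
  const VersionedName sym = ParseVersionedName(symbol_name);
  const bool requested_is_bare = requested.find('@') == std::string::npos;
  return requested_is_bare && sym.base == requested && sym.is_default;
}
```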
>>
>> So how do we achieve this? For C symbols we can store the full symbol name in the mangled field and the bare name in the demangled one. However, this does not work for C++ symbols, as they already use both fields.
>
> Actually, demangling will fail when there is "@@" in the name, so we will currently end up with a mangled name of "_ZSt10adopt_lock@@GLIBCXX_3.4.11" and nothing in the demangled name.
>
>> Furthermore, demangling of versioned C++ symbols currently fails completely, as the demangler does not understand the version specifications.
>
> For good reason, this isn't part of the mangled name specification, it is just junk added by the linker to make things happen. Our linker does similar stuff that always confuses us.
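A sketch of working around the demangler failure by splitting off the linker-added suffix before demangling and re-attaching it afterwards, which yields a demangled name such as "std::adopt_lock@@GLIBCXX_3.4.11". It assumes the Itanium C++ ABI demangler abi::__cxa_demangle from <cxxabi.h>, which the GCC and Clang runtimes provide but standard C++ does not; DemangleVersioned is a hypothetical helper, and LLDB's own demangling path may differ.

```cpp
#include <cxxabi.h>  // Itanium C++ ABI demangler (GCC/Clang runtimes)
#include <cstdlib>
#include <string>

// Demangles a possibly versioned C++ symbol:
// "_ZSt10adopt_lock@@GLIBCXX_3.4.11" -> "std::adopt_lock@@GLIBCXX_3.4.11".
// Returns an empty string if the part before '@' does not demangle.
std::string DemangleVersioned(const std::string &symbol) {
  const auto at = symbol.find('@');
  const std::string base = symbol.substr(0, at);
  const std::string suffix =
      at == std::string::npos ? std::string() : symbol.substr(at);

  int status = 0;
  char *demangled = abi::__cxa_demangle(base.c_str(), nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return std::string();
  std::string result = std::string(demangled) + suffix;
  std::free(demangled);  // __cxa_demangle allocates the buffer with malloc
  return result;
}
```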
>
>> For the "symbolification" use case it would be best to have "_ZSt10adopt_lock@@GLIBCXX_3.4.11" as the mangled name and "std::adopt_lock@@GLIBCXX_3.4.11" as the demangled. However, for symbol resolution, we want both "_ZSt10adopt_lock" and "_ZSt10adopt_lock@@GLIBCXX_3.4.11" to resolve correctly. I can think of three ways to achieve this:
>>
>> - teach the Symbol class to do intelligent string matching, so that it can resolve both versioned and unversioned names. Not optimal, since it would complicate the general Symbol class due to an ELF peculiarity.
>
> Not necessarily, see solution below.
>> - insert two Symbol instances into the Symtab. Symbol resolution would be easy, but if we want to guarantee that we always return the versioned symbol during symbolification, we would need to do something clever there, which is again not nice.
>
> No we don't need to do this.
>
>> - allow symbols to have multiple names - again not optimal since it complicates the Symbol class, but at least the version handling could be contained in the ELF specific code - the Symbol wouldn't know about the versions, it would only know it has these 2 (or whatever) names.
>
> We don't need to do this.
>
>> As you can see, I am not exactly thrilled by any of these options. What do you think about it?
>
> We just need to teach Symtab::InitNameIndexes() to build its name lookup tables correctly. So we can take "memcpy@@GLIBC_2.14" for a symbol whose index is 123 and make it add entries for name lookups like this:
>
> memcpy@@GLIBC_2.14 -> 123
> memcpy -> 123
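A minimal sketch of that indexing scheme, using a hypothetical std::unordered_multimap in place of LLDB's real name index (NameIndex and BuildNameIndex are illustrative names): each symbol is indexed under its full name, and a symbol with a default "@@" suffix is additionally indexed under its bare name. Non-default "@" versions get only their full-name entry, and the symbols themselves are left unchanged.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name index: lookup name -> index of the symbol in the table.
using NameIndex = std::unordered_multimap<std::string, std::size_t>;

NameIndex BuildNameIndex(const std::vector<std::string> &symbol_names) {
  NameIndex index;
  for (std::size_t i = 0; i < symbol_names.size(); ++i) {
    const std::string &name = symbol_names[i];
    index.emplace(name, i);                   // "memcpy@@GLIBC_2.14" -> i
    const auto pos = name.find("@@");
    if (pos != std::string::npos)
      index.emplace(name.substr(0, pos), i);  // "memcpy" -> i
  }
  return index;
}
```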
>
> The symbol table also has a pointer to the object file, so we might be able to ask the object file (ObjectFileELF) if it wants to chop up the name for lookup:
>
> So we have:
>
> ConstString symbol_name = symbol->GetName(); // symbol_name == "memcpy@@GLIBC_2.14"
> ConstString lookup_name;
> if (m_objfile->RemoveLinkerSuffix (symbol_name, lookup_name))
> {
>     // lookup_name == "memcpy"
>     ...
> }
>
> This allows object-file-specific name extraction (ELF would look for "@@", and Mach-O might look for '$' in the middle of names like "_listen$UNIX2003").
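A sketch of such a per-format hook, with hypothetical class names and std::string standing in for ConstString (this is not the actual ObjectFile interface): the ELF variant strips at "@@", the Mach-O variant strips at a '$' that follows the first character.

```cpp
#include <string>

// Hypothetical stand-ins for object file plug-ins; not LLDB's API.
class ObjectFileSketch {
public:
  virtual ~ObjectFileSketch() = default;
  // Returns true and sets `lookup_name` if `symbol_name` has a linker suffix.
  virtual bool RemoveLinkerSuffix(const std::string &symbol_name,
                                  std::string &lookup_name) const = 0;
};

class ELFSketch : public ObjectFileSketch {
public:
  // "memcpy@@GLIBC_2.14" -> "memcpy"
  bool RemoveLinkerSuffix(const std::string &symbol_name,
                          std::string &lookup_name) const override {
    const auto pos = symbol_name.find("@@");
    if (pos == std::string::npos || pos == 0)
      return false;
    lookup_name = symbol_name.substr(0, pos);
    return true;
  }
};

class MachOSketch : public ObjectFileSketch {
public:
  // "_listen$UNIX2003" -> "_listen"; a leading '$' is not treated as a suffix.
  bool RemoveLinkerSuffix(const std::string &symbol_name,
                          std::string &lookup_name) const override {
    const auto pos = symbol_name.find('$', 1);
    if (pos == std::string::npos)
      return false;
    lookup_name = symbol_name.substr(0, pos);
    return true;
  }
};
```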
>
> So the key is:
>
> 1 - symbols will continue to contain one set of names (mangled/demangled)
> 2 - we only change the name lookup tables for symbols, so if we look up "memcpy" we might actually get two matches:
> memcpy@@GLIBC_2.14
> memcpy@@GLIBC_3.56
> The expression parser will probably need to be modified to deal with getting multiple matches and to resolve the correct match through the lldb_private::DynamicLoader plug-in.
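A sketch of how a caller might narrow multiple matches, under the assumption that default "@@" versions are preferred and any remaining tie is handed to a caller-supplied callback standing in for the dynamic loader plug-in. PickMatch and the callback are hypothetical; the real resolution policy would live in the plug-in.

```cpp
#include <optional>
#include <string>
#include <vector>

// Narrows the names returned for a bare-name lookup. Default ("@@") versions
// win; if several defaults remain, `ask_loader` (a stand-in for the dynamic
// loader plug-in) decides, and may return std::nullopt if it cannot.
template <typename TieBreaker>
std::optional<std::string> PickMatch(const std::vector<std::string> &matches,
                                     TieBreaker &&ask_loader) {
  std::vector<std::string> defaults;
  for (const std::string &m : matches)
    if (m.find("@@") != std::string::npos)
      defaults.push_back(m);
  if (defaults.empty())  // no versioned candidates: accept a unique match only
    return matches.size() == 1 ? std::optional<std::string>(matches.front())
                               : std::nullopt;
  if (defaults.size() == 1)
    return defaults.front();
  return ask_loader(defaults);
}
```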
>
> We might also want to mark a symbol as containing a version suffix with a single bit or bool that doesn't increase the storage for an lldb_private::Symbol. Since we might want "_ZSt10adopt_lock@@GLIBCXX_3.4.11" to have a mangled name of "_ZSt10adopt_lock@@GLIBCXX_3.4.11" and a demangled name of "std::adopt_lock@@GLIBCXX_3.4.11", we might already want to scan for "@@" when parsing the ELF file. This makes extracting the lookup name a bit easier:
>
> if (symbol->HasLinkerSuffix())
> {
>     ConstString symbol_name = symbol->GetName(); // symbol_name == "memcpy@@GLIBC_2.14"
>     ConstString lookup_name;
>     if (m_objfile->RemoveLinkerSuffix (symbol_name, lookup_name))
>     {
>         // lookup_name == "memcpy"
>         ...
>     }
> }
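A sketch of the flag idea with a hypothetical SymbolSketch record (not lldb_private::Symbol): the flag is a 1-bit field placed next to other small bit-fields, so on common ABIs it occupies spare bits rather than new storage, although bit-field layout is implementation-defined. "@@" is scanned for once, when the symbol is created, so later lookups only test the bit.

```cpp
#include <cstdint>
#include <string>

// Hypothetical symbol record; field names other than the flag are invented.
struct SymbolSketch {
  std::string name;                     // full name, e.g. "memcpy@@GLIBC_2.14"
  std::uint32_t type : 8;
  std::uint32_t is_external : 1;
  std::uint32_t has_linker_suffix : 1;  // set once while parsing the ELF file
  bool HasLinkerSuffix() const { return has_linker_suffix != 0; }
};

// Builds a symbol and records whether its name carries a "@@" version suffix.
SymbolSketch MakeSymbol(const std::string &name, std::uint32_t type,
                        bool external) {
  SymbolSketch sym{};
  sym.name = name;
  sym.type = type;
  sym.is_external = external ? 1 : 0;
  sym.has_linker_suffix = name.find("@@") != std::string::npos ? 1 : 0;
  return sym;
}
```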
>
> Let me know what you think.
>
> Greg
> _______________________________________________
> lldb-commits mailing list
> lldb-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits