[Lldb-commits] [PATCH] Strip ELF symbol versions from symbol names

Thu Feb 26 10:03:10 PST 2015

> On Feb 26, 2015, at 9:02 AM, Pavel Labath <labath at google.com> wrote:
> 
> So here is me thinking out loud about this issue...
> 
> What are the current use cases for the Symbols and SymTabs in lldb?
> 
> - symbolification (aka looking up a symbol by address): In this case we would probably want to output "memcpy@@GLIBC_2.14" because that _is_ the name of the symbol in the object file and it also provides the most information.
> 
> - symbol resolution (aka looking up a symbol by name): in which situations do we need to do this? Currently, I am aware of only one: user provided expressions in the "expr" command. Are there any other use cases?

breakpoint set -n

I don't know how these symbol variants work in ELF, but if there's a chance that code could call more than one of these variants, depending on how the library was built or whatever, then "break set -n" had better resolve to all the relevant symbols.  If only one will ever get called by some magic, then it's fine to just pick that one.
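
To make "resolve to all the relevant symbols" a bit more concrete, here is a rough Python sketch. It is not lldb code: the helper names (`base_name`, `breakpoint_addresses`), the table layout and the addresses are all made up for illustration, and it only assumes that versioned names look like `name@VERSION` or `name@@VERSION`.

```python
def base_name(symbol_name):
    """Return the part of an ELF symbol name before any '@' version suffix."""
    return symbol_name.split("@", 1)[0]


def breakpoint_addresses(symtab, name):
    """Collect the address of every symbol whose bare name matches `name`.

    `symtab` is a hypothetical list of (symbol_name, address) pairs.
    """
    return sorted(addr for sym, addr in symtab if base_name(sym) == name)


# Made-up table: two versions of memcpy plus an unrelated symbol.
symtab = [
    ("memcpy@@GLIBC_2.14", 0x2000),
    ("memcpy@GLIBC_2.2.5", 0x1000),
    ("memmove@@GLIBC_2.2.5", 0x3000),
]
addresses = breakpoint_addresses(symtab, "memcpy")
```

With this approach a breakpoint on "memcpy" would land on both versions, which is the safe choice if code in the process can end up calling either one.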

disassemble -n

>  - the ELF versioning spec says that when we do not have any additional information, we should pick the default (latest) version. This is the one with @@ in its name. When the user types "expr memcpy(a,b,c)", we do not have any information, so the string "memcpy" should resolve to the same address as "memcpy@@GLIBC_2.14". We could try to be clever and figure out what version is used in the rest of the code, but that may prove to be quite difficult. Furthermore, we almost certainly want the expression `char foo[]="bar"; do_something_with(foo)` (which compiles to something involving memcpy) to use the default symbol version, since the user is probably not even aware that there is a call to memcpy involved (I certainly wasn't).

The latter will fall out from whatever lookup mechanism you come up with, since internally lldb's expression parser tells the JIT what symbol to use.
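
For what it's worth, the lookup rule described above (an exact versioned name wins, a bare name falls back to the "@@" default only) could be sketched like this in Python. This is only an illustration under my own assumptions: `split_version`, `resolve` and the dict-based table are hypothetical and do not correspond to any existing lldb API.

```python
def split_version(symbol_name):
    """Split 'name@@VER' or 'name@VER' into (name, version, is_default)."""
    name, sep, rest = symbol_name.partition("@")
    if not sep:
        # No version suffix at all.
        return symbol_name, None, False
    is_default = rest.startswith("@")
    return name, rest.lstrip("@"), is_default


def resolve(symtab, query):
    """Resolve a user-typed name to an address, or None if nothing matches.

    `symtab` is a hypothetical dict mapping full symbol names to addresses.
    An exact match wins; a bare name only matches a default ('@@') version.
    """
    if query in symtab:
        return symtab[query]
    for sym, addr in symtab.items():
        name, _version, is_default = split_version(sym)
        if is_default and name == query:
            return addr
    return None


# Made-up addresses for illustration.
symtab = {
    "memcpy@@GLIBC_2.14": 0x2000,
    "memcpy@GLIBC_2.2.5": 0x1000,
}
default_addr = resolve(symtab, "memcpy")
old_addr = resolve(symtab, "memcpy@GLIBC_2.2.5")
```

Note that in this sketch a bare name never resolves to a non-default version, so a symbol that only exists as "name@VER" would need to be asked for explicitly.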

>  - we would like to keep the non-default symbol versions (e.g. "memcpy@GLIBC_2.2.5"), so that we can do symbolification, but we don't want "memcpy" to resolve to these symbols unless the user explicitly specifies "memcpy@GLIBC_2.2.5" (which right now he can't, as the expr command will bark out a syntax error). It might be possible to call the function by embedding the right asm commands in the expr expression, but I do not care about this right now.
> 
> So how do we achieve this? For C symbols we can store the full symbol name in the mangled field and the bare name in the demangled one. However, this does not work for C++ symbols, as they already use both fields. Furthermore, demangling of versioned C++ symbols currently fails completely, as the demangler does not understand the version specifications. For the "symbolification" use case it would be best to have "_ZSt10adopt_lock@@GLIBCXX_3.4.11" as the mangled name and "std::adopt_lock@@GLIBCXX_3.4.11" as the demangled one. However, for symbol resolution, we want both "_ZSt10adopt_lock" and "_ZSt10adopt_lock@@GLIBCXX_3.4.11" to resolve correctly. I can think of three ways to achieve this:
> 
> - teach the Symbol class to do intelligent string matching, so that it can resolve both versioned and unversioned names. Not optimal since it would complicate the general Symbol class due to an ELF peculiarity.
> - insert two Symbol instances into the Symtab. Symbol resolution would be easy, but if we want to guarantee that we always return the versioned symbol during symbolification, we would need to do something clever there, which is again not nice.
> - allow symbols to have multiple names - again not optimal since it complicates the Symbol class, but at least the version handling could be contained in the ELF specific code - the Symbol wouldn't know about the versions, it would only know it has these 2 (or whatever) names.
> 
> As you can see, I am not exactly thrilled by any of these options. What do you think about it?

How does the linker actually fix up the libraries using these symbols to point to the right one?  In Mac OS X the equivalent task is achieved by having a symbol of type "resolver" that is actually called memcpy, and when the linker needs to call that function, it knows that it should call the resolver function, and that will return the address of the correct implementation.  So in lldb, we don't have to try to guess what the linker is going to do, we can just call this function to get the target symbol.  This can't change over the running of the program so we cache the target address of the resolver symbol.

In your case, maybe a better model is an indirect symbol - i.e. saying symbol A is an alias for symbol B.  We already have those to support MachO indirect symbols, so you could probably just use that type, though you might have to invent a new one.  That is your option 2.  Since it maps to an extant linker trick, that seems to me the best way to do it.
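
Roughly what I have in mind, as a toy Python model rather than real lldb code. The `Symbol` and `Symtab` classes here are simplified stand-ins I made up, `add_elf_symbol` is a hypothetical ELF-side helper, and the addresses are invented:

```python
class Symbol:
    def __init__(self, name, address=None, alias_of=None):
        self.name = name
        self.address = address
        self.alias_of = alias_of  # name of the target symbol, for aliases


class Symtab:
    def __init__(self):
        self.by_name = {}
        self.by_address = {}

    def add(self, symbol):
        self.by_name[symbol.name] = symbol
        # Only real symbols are indexed by address, so a lookup by
        # address never returns an alias.
        if symbol.alias_of is None:
            self.by_address[symbol.address] = symbol

    def find_by_name(self, name):
        symbol = self.by_name.get(name)
        # Follow alias links until a real symbol is reached.
        while symbol is not None and symbol.alias_of is not None:
            symbol = self.by_name.get(symbol.alias_of)
        return symbol

    def find_by_address(self, address):
        return self.by_address.get(address)


def add_elf_symbol(symtab, versioned_name, address):
    """Add a versioned symbol, plus a bare-name alias for '@@' defaults."""
    symtab.add(Symbol(versioned_name, address=address))
    bare, sep, rest = versioned_name.partition("@")
    if sep and rest.startswith("@"):
        symtab.add(Symbol(bare, alias_of=versioned_name))


symtab = Symtab()
add_elf_symbol(symtab, "memcpy@@GLIBC_2.14", 0x2000)
add_elf_symbol(symtab, "memcpy@GLIBC_2.2.5", 0x1000)
```

Here looking up "memcpy" by name lands on "memcpy@@GLIBC_2.14", while looking up 0x2000 by address still reports the versioned name, and all of the version handling stays in the ELF-specific helper.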

Jim

> 
> 
> http://reviews.llvm.org/D7884