[Lldb-commits] [PATCH] Strip ELF symbol versions from symbol names

Thu Feb 26 09:02:26 PST 2015

So here is me thinking out loud about this issue...

What are the current use cases for the Symbols and SymTabs in lldb?

- symbolification (aka looking up a symbol by address): In this case we would probably want to output "memcpy@@GLIBC_2.14" because that _is_ the name of the symbol in the object file and it also provides the most information.

- symbol resolution (aka looking up a symbol by name): in which situations do we need to do this? Currently, I am aware of only one: user-provided expressions in the "expr" command. Are there any other use cases?
  - the ELF versioning spec says that when we do not have any additional information, we should pick the default (latest) version. This is the one with @@ in its name. When the user types "expr memcpy(a,b,c)", we do not have any information, so the string "memcpy" should resolve to the same address as "memcpy@@GLIBC_2.14". We could try to be clever and figure out what version is used in the rest of the code, but that may prove to be quite difficult. Furthermore, we almost certainly want the expression `char foo[]="bar"; do_something_with(foo)` (which compiles to something involving memcpy) to use the default symbol version, since the user is probably not even aware that there is a call to memcpy involved (I certainly wasn't).
  - we would like to keep the non-default symbol versions (e.g. "memcpy@GLIBC_2.2.5"), so that we can do symbolification, but we don't want "memcpy" to resolve to these symbols unless the user explicitly specifies "memcpy@GLIBC_2.2.5" (which right now they can't, as the expr command will bark out a syntax error). It might be possible to call the function by embedding the right asm commands in the expr expression, but I do not care about this right now.
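
To make the resolution rule above concrete, here is a rough sketch of how a name could be split at the version marker and how a bare query could be matched against it. The names VersionedName, SplitVersionedName and QueryMatches are made up for illustration; none of this is existing lldb code, and it assumes that '@' only ever appears in a symbol name as the version separator.

```cpp
#include <string>

// Hypothetical helper type for illustration only; not part of lldb.
struct VersionedName {
    std::string base;     // e.g. "memcpy"
    std::string version;  // e.g. "GLIBC_2.14"; empty if the name is unversioned
    bool is_default;      // true when the name used "@@" (default version)
};

// Split "name@VERSION" or "name@@VERSION" into its parts.
// Assumption: '@' never occurs in the base name itself.
VersionedName SplitVersionedName(const std::string &name) {
    VersionedName out{name, "", false};
    const std::string::size_type at = name.find('@');
    if (at == std::string::npos)
        return out;  // unversioned symbol
    out.base = name.substr(0, at);
    std::string::size_type version_start = at + 1;
    if (version_start < name.size() && name[version_start] == '@') {
        out.is_default = true;  // "@@" marks the default version
        ++version_start;
    }
    out.version = name.substr(version_start);
    return out;
}

// A query that spells out a version must match the symbol name exactly.
// A bare query matches the unversioned symbol or the default ("@@") version,
// but never a non-default ("@") version.
bool QueryMatches(const std::string &query, const std::string &symbol_name) {
    if (query == symbol_name)
        return true;
    if (query.find('@') != std::string::npos)
        return false;
    const VersionedName sym = SplitVersionedName(symbol_name);
    return sym.is_default && sym.base == query;
}
```

With this, the query "memcpy" matches "memcpy@@GLIBC_2.14" but not "memcpy@GLIBC_2.2.5", while the full versioned name is still available for symbolification.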

So how do we achieve this? For C symbols we can store the full symbol name in the mangled field and the bare name in the demangled one. However, this does not work for C++ symbols, as they already use both fields. Furthermore, demangling of versioned C++ symbols currently fails completely, as the demangler does not understand the version specifications. For the "symbolification" use case it would be best to have "_ZSt10adopt_lock@@GLIBCXX_3.4.11" as the mangled name and "std::adopt_lock@@GLIBCXX_3.4.11" as the demangled one. However, for symbol resolution, we want both "_ZSt10adopt_lock" and "_ZSt10adopt_lock@@GLIBCXX_3.4.11" to resolve correctly. I can think of three ways to achieve this:

- teach the Symbol class to do intelligent string matching, so that it can resolve both versioned and unversioned names. Not optimal since it would complicate the general Symbol class due to an ELF peculiarity.
- insert two Symbol instances into the Symtab. Symbol resolution would be easy, but if we want to guarantee that we always return the versioned symbol during symbolification, we would need to do something clever there, which is again not nice.
- allow symbols to have multiple names - again not optimal since it complicates the Symbol class, but at least the version handling could be contained in the ELF specific code - the Symbol wouldn't know about the versions, it would only know it has these 2 (or whatever) names.
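
To illustrate the third option, here is a toy sketch of a symbol that carries several names, with the version handling confined to one ELF-specific function. ToySymbol, ToySymtab and MakeElfSymbol are hypothetical stand-ins, not lldb's Symbol and Symtab classes, and the real classes would need more care (mangled/demangled pairs, duplicate names, address ranges).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol that knows it has several names,
// but knows nothing about ELF versions.
struct ToySymbol {
    std::uint64_t address;
    std::string primary_name;          // reported during symbolification
    std::vector<std::string> aliases;  // additional names accepted for lookup
};

class ToySymtab {
public:
    void Add(ToySymbol sym) {
        const std::size_t idx = symbols_.size();
        by_name_[sym.primary_name] = idx;
        for (const std::string &alias : sym.aliases)
            by_name_[alias] = idx;
        by_address_[sym.address] = idx;
        symbols_.push_back(std::move(sym));
    }

    // Returned pointers are only valid until the next call to Add().
    const ToySymbol *FindByName(const std::string &name) const {
        const auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : &symbols_[it->second];
    }

    const ToySymbol *FindByAddress(std::uint64_t address) const {
        const auto it = by_address_.find(address);
        return it == by_address_.end() ? nullptr : &symbols_[it->second];
    }

private:
    std::vector<ToySymbol> symbols_;
    std::unordered_map<std::string, std::size_t> by_name_;
    std::map<std::uint64_t, std::size_t> by_address_;
};

// The only ELF-aware piece: a default-version ("@@") symbol also gets its
// bare name as an alias; a non-default ("@") symbol keeps only its full name.
ToySymbol MakeElfSymbol(std::uint64_t address, const std::string &raw_name) {
    ToySymbol sym{address, raw_name, {}};
    const std::string::size_type at = raw_name.find("@@");
    if (at != std::string::npos)
        sym.aliases.push_back(raw_name.substr(0, at));
    return sym;
}
```

In this sketch, looking up "_ZSt10adopt_lock" and "_ZSt10adopt_lock@@GLIBCXX_3.4.11" finds the same symbol, and a lookup by address always reports the full versioned name.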

As you can see, I am not exactly thrilled by any of these options. What do you think about it?

http://reviews.llvm.org/D7884
