[lldb-dev] What should SymbolFile::FindFunctions(..., eFunctionNameTypeFull, ...) do ?

Greg Clayton via lldb-dev lldb-dev at lists.llvm.org
Thu May 3 08:53:25 PDT 2018

> On May 3, 2018, at 7:38 AM, Pavel Labath via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> Hello all,
> /This may or may not be related to the "Issue with expression parser
> finding the correct types" thread from last week./
> As you probably know, I am working on adding DWARF v5 accelerator tables to
> lldb. To make that happen, I've been trying to understand how the two
> existing indexing methods (apple tables and manually built indexes) work.
> While doing that, I was puzzled at the behavior of the FindFunctions method
> for the case of name_type_mask = eFunctionNameTypeFull. The function seems
> to do very different things depending on the index used (and both of them
> seems to be broken in different ways).
> - for the apple case (SymbolFileDWARF.cpp:2496), the function will take the
> name given, and look it up in the apple_names table (which contains base
> names, mangled names, but *not* fully qualified demangled names). Then it
> will simply return this list as a result (I am assuming we pass an empty
> parent_decl_ctx, so no filtering done there). This seems weird, because
> then a search for a name like "foo" will return any function whose base
> name is "foo", including ones deeply nested inside other namespaces,
> classes or other functions.

DWARF currently only has mandatory function names that are the base function names with no decoration or qualification, and may or may not have the mangled names. I didn't wan't to require every mangled name to be demangled and placed in a collection of fully qualified demangled names as no one really searches using those names because if you get one space wrong, or mess anything up the lookups don't work. So the idea was:
1 - allow mangled name lookups for the compiler since JIT will lookup via the mangled names.
2 - allow lookup on basename and filter the results as needed

So the current Apple approach is the way it is intended to work. That isn't to say that this is what the right thing to do going forward is. More on that below.

> - for the manual case (SymbolFileDWARF.cpp:2626), the function will look
> into a special "full name" index, which contains mangled and
> fully-qualified (with parameters and all) demangled names of all functions.
> This would seem reasonable if it was not followed by a hac^H^H^Hworkaround,
> which will, in case the previous search finds no match, look in the
> *basename* index, and then accept any function whose demangled name begins
> with the string "(anonymous namespace)" (which means it will include also
> functions with mismatched arguments types or namespace qualifiers).

So I would have the ::Index method skip creating the fully qualified names and populating an index based off of that because:
1 - it is expensive to create the qualified names and no one looks them up that way.
2 - the qualified names would need to ensure that they exactly match what the demangler would do if a mangled name were actually there

> So, what should be the correct behavior here? Both of these seem so wrong
> (and different) that I am surprised that we did not run into issues here
> before. Is it possible there is some additional filtering happening at a
> different level that I am not aware of?

So I believe the best way to proceed is the way the apple tables do things. Expect that lookups will happen on base names and filtering will happen elsewhere. This keeps the support needed for indexing to an acceptable minimum for all debug info formats and will still allow people to look things up.

The way lookups happen right now can also be a relative match. Lets say you have a:

namespace a {
namespace b {
  int foo();

If someone types "b b::foo", this currently works because the place that does the matching knows to split the string into a search for functions that have "foo", then filter the results out that don't have a "b::foo" in the matching fully qualified name. If the user typed "a::b::foo ()" then matching on qualified name would fail due to the extra space. If the function was const and we searched for "a::b::foo()const" that would fail due to the missing space before const. So my reasoning was to avoid having to normalize names because they don't match exactly. Mangled names have a predefined format that is specified exactly. Any other qualified names are not normalized so they become unwieldy to use for matches IMHO.

> PS: I tried adding assert(!name.contains("::")) into this function to see
> how often we are calling this function with a "FQN" which is not simply a
> basename. (This is interesting because for the apple case, this function
> would always return 0 results for such queries.) Only 5 tests failed, and
> in all of these, the asserting calls were originating from
> IRExecutionUnit.cpp, which was trying to link the jitted expression with
> the running program. What this code is doing is it takes a mangled name,
> and then tries searching for functions with several names derived from it
> (demangled names, the "const" version of the mangled name, etc.). So it
> seems, that in this case, the value returned by FindFunctions for the
> non-demangled names doesn't really matter, since IRExecutionUnit will just
> retry the search with the mangled name, which will return the same answer
> in both cases.

I agree that for the expression parser what you noticed with the assert is ok. The real issue is when people don't fully qualify names they type (such as with "b::foo" mentioned above), and that means function breakpoints by name are the other main searching factor here.

A few interesting points in the evolution of this approach:

We used to have a client that was setting a breakpoint on a symbol function name "foo". This worked fine until he made his source file C++ and tried to use the symbol only, without debug info, so now the symbol name was "foo()" because that is what it demangled to. His breakpoint no longer worked. So we made the symbol table parsing code demangle symbols and find the basename of the functions so the symbol tables can implement the same searching available in the DWARF.

So the current reasoning is:
1 - allow basename searches only so things are easy on the SymbolFile subclasses
2 - allow fully mangled name searches
3 - don't do fully qualified indexes because these are fraught with syntax issues regarding spacing and equivalency issues ("foo ()" == "foo()" etc).

Things that could be improved
1 - all queries for finding functions/types/globals should take a SymbolContext that specifies the context that the lookup should be originating from. 
2 - Don't pass a "CompilerDeclContext *parent_decl_context" since this requires that the "TypeSystem *" inside the CompilerDeclContext matches that being used by the Module/ObjectFile/SymbolFile. Pass in a tokenized version that can represent all languages. So you would say "I am looking for 'foo" in AbstractDeclContext that is in the translation unit.

AbstractDeclContext would be something like:

struct AbstractDeclContext {
  enum KindFlag {
    WildCard        = 1u << 0,
    TranslationUnit = 1u << 1,
    Namespace       = 1u << 2,
    Class           = 1u << 3,
    Struct          = 1u << 4,
    Function        = 1u << 5,
    Anything        = ~WildCard, // anything except wildcard
  struct Context {
     uint32_t kind_mask;
     ConstString name;
  LanguageType m_language;
  std::vector<Context> m_context;

Before making a search you would create an AbstractDeclContext:

For a type at the top level we would make:

AbstractDeclContext context(eLanguageTypeC, {
  {AbstractDeclContext::TranslationUnit, ""}

We don't need to specify the translation unit name unless we explicitly need the result to come only from a specific source file, in which case we could specify the name:

AbstractDeclContext context(eLanguageTypeC, {
  {AbstractDeclContext::TranslationUnit, "/tmp/main.c"}

Searching for "basic_string" inside the "std::__1" namespaces we would search for "basic_string" with the context:

AbstractDeclContext context(eLanguageTypeC, {
  {AbstractDeclContext::Namespace, "std"}, 
  {AbstractDeclContext::Namespace, "__1"}

For a user typed lookup like "a::b::foo" we would look for "foo" and specify:

AbstractDeclContext context(eLanguageTypeC, {
  {AbstractDeclContext::WildCard, "a"}, 
  {AbstractDeclContext::Anything, "b"}});

because we don't know if "a" or "b" is a namespace or class/struct, function. For "a" we specify WildCard, because we don't care what the parent context of "a" is. It isn't required to be at the root level.

3 - An alternative to #2 is to have all FindFunction, FindTypes and FindGlobalVariables functions take a lambda that gets called for any matching results as they are found. The lambda that handles the matches would have a boolean that would say when to stop searching for more matches if we found what we are looking for. Why? Currently we have a "uint32_t max_matches" on some of the ::Find functions. This is bad because if we do a general lookup for "foo", we might ask for 1 result, but if we don't properly specify the context, then we might get one and only one incorrect results back. Some searches like "reference_type" are typedefs that are found inside of every STL class. This could result in all debug info being parsed for all STL classes if we do a broad search.

So things we need to think about:
- do we make SymbolFile interfaces simpler and make the filtering logic at a high level, or do we increase the complexity of these searches so they return fewer results and have each SymbolFile::FindXXXX() function more complex.
- do we agnostify the CompilerDeclContext so it works on any type system, or require the find function that searches multiple modules have to create or lookup a valid CompilerDeclContext prior to calling into each SymbolFile::FindXXX() call?

Greg Clayton

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20180503/40a68b5f/attachment-0001.html>

More information about the lldb-dev mailing list