[lldb-dev] What should SymbolFile::FindFunctions(..., eFunctionNameTypeFull, ...) do ?

Fri May 4 09:31:32 PDT 2018

> On May 4, 2018, at 4:58 AM, Pavel Labath <labath at google.com> wrote:
> 
> Thank you for the detailed response. My replies are below.
> 
> 
> On Thu, 3 May 2018 at 16:53, Greg Clayton <clayborg at gmail.com> wrote:
>> On May 3, 2018, at 7:38 AM, Pavel Labath via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
>> - for the manual case (SymbolFileDWARF.cpp:2626), the function will look
>> into a special "full name" index, which contains mangled and
>> fully-qualified (with parameters and all) demangled names of all
> functions.
>> This would seem reasonable if it was not followed by a
> hac^H^H^Hworkaround,
>> which will, in case the previous search finds no match, look in the
>> *basename* index, and then accept any function whose demangled name begins
>> with the string "(anonymous namespace)" (which means it will include also
>> functions with mismatched arguments types or namespace qualifiers).
> 
> 
>> So I would have the ::Index method skip creating the fully qualified
> names and populating an index based off of that because:
>> 1 - it is expensive to create the qualified names and no one looks them
> up that way.
>> 2 - the qualified names would need to ensure that they exactly match what
> the demangler would do if a mangled name were actually there
> Agreed. I'd like to get rid of that as well.
> 
> 
> 
> 
>> So, what should be the correct behavior here? Both of these seem so wrong
>> (and different) that I am surprised that we did not run into issues here
>> before. Is it possible there is some additional filtering happening at a
>> different level that I am not aware of?
> 
> 
>> So I believe the best way to proceed is the way the apple tables do
> things. Expect that lookups will happen on base names and filtering will
> happen elsewhere. This keeps the support needed for indexing to an
> acceptable minimum for all debug info formats and will still allow people
> to look things up.
> 
> Great, I was hoping you would propose that. :) Getting rid of the demangled
> names should save us some memory and processing time, and will align the
> manual-index behavior with the apple tables. Then any kind of post-filtering
> we need to do can be done the same way regardless of how we obtained the
> unfiltered list. I'll create a patch for that.
> 

Sounds good!

> 
> 
>> PS: I tried adding assert(!name.contains("::")) into this function to see
>> how often we are calling this function with a "FQN" which is not simply a
>> basename. (This is interesting because for the apple case, this function
>> would always return 0 results for such queries.) Only 5 tests failed, and
>> in all of these, the asserting calls were originating from
>> IRExecutionUnit.cpp, which was trying to link the jitted expression with
>> the running program. What this code is doing is it takes a mangled name,
>> and then tries searching for functions with several names derived from it
>> (demangled names, the "const" version of the mangled name, etc.). So it
>> seems, that in this case, the value returned by FindFunctions for the
>> non-demangled names doesn't really matter, since IRExecutionUnit will just
>> retry the search with the mangled name, which will return the same answer
>> in both cases.
> 
> 
>> I agree that for the expression parser what you noticed with the assert
> is ok. The real issue is when people don't fully qualify names they type
> (such as with "b::foo" mentioned above), and that means function
> breakpoints by name are the other main searching factor here.
> 
> Is it really OK? If our indexes will never contain the demangled names,
> then the IRExecutionUnit lookups using the demangled names will always
> fail. (Right now they will only succeed for manually indexed dwarf, but
> this will change if I stop putting these names in the full index) Shouldn't
> we fix the IRExecutionUnit to not attempt these lookups in the first place?

Yikes. I wasn't aware this was happening. What is the flow here? Does it try to look up using a mangled name first and, if that fails, demangle the name and look that up? Are there cases where we don't have mangled names in the debug info, yet we are able to construct a fully qualified name and look that up?

It would be great if we don't need these fully qualified name lookups, but if we do, we could fix this by looking up using the basename, then filtering the results of the lookup to only those that match in the IR code.
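
A minimal sketch of that "look up by basename, then filter" idea follows. It is not LLDB code: FunctionEntry, BasenameIndex and FindByBasenameThenFilter are hypothetical names made up for illustration, and the filter here is a plain string comparison on the qualified name, which is only an assumption about what "match" would mean for the IR case.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an index entry; not an LLDB type.
struct FunctionEntry {
  std::string basename;       // e.g. "foo"
  std::string qualified_name; // e.g. "a::foo(int)"
};

// Hypothetical basename-only index, similar in spirit to a name
// accelerator table: one key can map to many functions.
using BasenameIndex = std::unordered_multimap<std::string, FunctionEntry>;

// Step 1: cheap lookup by basename in the index.
// Step 2: filter outside the index, keeping only the entries whose
// qualified name is the one the caller actually wanted.
std::vector<FunctionEntry>
FindByBasenameThenFilter(const BasenameIndex &index,
                         const std::string &basename,
                         const std::string &wanted_qualified_name) {
  std::vector<FunctionEntry> matches;
  auto range = index.equal_range(basename);
  for (auto it = range.first; it != range.second; ++it) {
    if (it->second.qualified_name == wanted_qualified_name)
      matches.push_back(it->second);
  }
  return matches;
}
```

The point of the split is that the index only ever has to know about basenames, while the (possibly format-specific) matching rule lives in one place above it.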

> 
> 
> 
>> So things we need to think about:
>> - do we make SymbolFile interfaces simpler and do the filtering logic
> at a higher level, or do we increase the complexity of these searches so they
> return fewer results and make each SymbolFile::FindXXXX() function more
> complex?
>> - do we agnostify the CompilerDeclContext so it works on any type system,
> or require that the find function that searches multiple modules create
> or look up a valid CompilerDeclContext prior to calling into each
> SymbolFile::FindXXX() call?
> 
> These are very good questions, but I am afraid I don't know enough about
> this part of the codebase to say what would be best. I think that a
> declarative method of specifying the context would be better than a
> callback-based one, because then the search can be optimized better (e.g. for
> DWARF 5 indexes, knowing the offset of the compile unit we are searching
> can make the searches much faster), but I'm not sure about the details. It
> looks like I will be digging in this part of the code for a while now, so I
> may get a better idea of how it is used...

Sounds good. I know this area very well and am happy to help explain and get you up to speed. We put a lot of thought into the Apple accelerator tables and they have been working very well for us, so I will be happy to see these related DWARF 5 accelerator tables being added. 

BTW: it would be great if we modified llvm-dsymutil to generate these tables and to also handle ELF. llvm-dsymutil has a "--update" option that takes an existing DWARF file and adds the apple accelerator tables to the existing DWARF. Having a tool that can add the DWARF 5 accelerator tables would be great for files built by older compilers that don't generate these tables themselves. Another idea would be to generate these accelerator tables on the side and cache them somewhere. LLDB could determine that a DWARF file doesn't have them, generate them on the fly, and cache them. We already basically generate these indexes in memory when calling ::Index(...), so why not just generate them into standalone side files, then cache and use them? This would speed up subsequent debug sessions in new processes since we can just load the cached file. Then we only have to support one indexing format. Does the LLVM code built into LLDB have everything it needs to generate this info? Thoughts?
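
To make the "generate on the fly and cache" idea concrete, here is a rough sketch. Everything in it is hypothetical: NameIndex, SaveIndex, LoadIndex and GetOrBuildIndex are illustrative names, the one-pair-per-line text format is just a placeholder for a real accelerator table format, and it assumes names contain no whitespace. It also skips checking whether the cache is stale relative to the debug file, which a real implementation would need.

```cpp
#include <cstdint>
#include <fstream>
#include <functional>
#include <map>
#include <string>

// Hypothetical in-memory index: function basename -> offset of its
// debug info entry. Not an LLDB type.
using NameIndex = std::multimap<std::string, std::uint64_t>;

// Write one "name offset" pair per line (assumes no whitespace in names).
void SaveIndex(const NameIndex &index, const std::string &cache_file) {
  std::ofstream out(cache_file, std::ios::trunc);
  for (const auto &entry : index)
    out << entry.first << ' ' << entry.second << '\n';
}

// Returns false if the cache file is missing or cannot be opened.
bool LoadIndex(const std::string &cache_file, NameIndex &index) {
  std::ifstream in(cache_file);
  if (!in)
    return false;
  std::string name;
  std::uint64_t offset = 0;
  while (in >> name >> offset)
    index.emplace(name, offset);
  return true;
}

// Load the cached index if present; otherwise run the (expensive)
// build step once and write the result next to it for later sessions.
NameIndex GetOrBuildIndex(const std::string &cache_file,
                          const std::function<NameIndex()> &build) {
  NameIndex index;
  if (LoadIndex(cache_file, index))
    return index;
  index = build();
  SaveIndex(index, cache_file);
  return index;
}
```

With something along these lines, the first debug session pays the indexing cost and later sessions only pay for reading the side file.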

Greg Clayton