[lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

Fri Jan 26 11:26:29 PST 2018

It's not just reduction of the number of demanglers we have to support, however.  

Greg and I both got excited by this proposal because we've had to maintain these name choppers.  

Having a structured representation of mangled names will be much more appropriate for the tasks lldb has to do with them, for instance matching incomplete names typed by a human, e.g. Class::method against a method which is actually Namespace::Class::method.  At present, we end up losing all the semantic information the demangler had when it parsed the mangled name, then trying to recreate it by hand to pick out the pieces of interest.
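
To make the matching problem concrete, here is a minimal sketch of the string-level check (C++17).  MatchesQualifiedSuffix is a hypothetical helper, not an lldb function, and it assumes the full name is a plain qualified name with no template arguments or parameter list, which is exactly the assumption that breaks down on real demangled names:

```cpp
#include <string_view>

// Hypothetical helper, not part of lldb.  Returns true when `typed` names the
// same entity as `full` with zero or more leading scopes omitted, so
// "Class::method" matches "Namespace::Class::method".
// Assumption: `full` contains no template arguments or parameter list.
bool MatchesQualifiedSuffix(std::string_view full, std::string_view typed) {
  if (typed.empty() || typed.size() > full.size())
    return false;
  const auto start = full.size() - typed.size();
  if (full.substr(start) != typed)
    return false;
  // The match has to begin at a scope boundary; otherwise "lass::method"
  // would be accepted as well.
  return start == 0 || (start >= 2 && full.substr(start - 2, 2) == "::");
}
```

With a structured name the boundary check would come for free, since the scopes would already be separate nodes rather than text to re-scan.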

Greg did an experiment early on in lldb of having a node tree representation of mangled names, but it was too slow when you have to use it on every symbol in a module.  That's an important thing to remember for the debugger's use of the demangler.  Since we need to quickly find Namespace::Class::method when somebody types Class::method, we have to build lookup tables up front for those pieces, and we don't always have debug information from which to grab the base name.  So whatever demangler we use has to be fast enough to survive being passed all the C++ symbols in the libraries loaded by a normal program.
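
As a rough illustration of the kind of up-front table I mean, here is a sketch (C++17) that indexes qualified names by base name.  BaseNameIndex is a made-up class for this mail, not lldb's actual symbol table code, and it assumes each input is an already-demangled qualified name with no template arguments or parameter list:

```cpp
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical index, not an lldb class.  Maps a base name ("method") to
// every fully qualified name that ends in it.
class BaseNameIndex {
public:
  // Assumption: `qualified` has no template arguments or parameter list, so
  // the last "::" really is the separator before the base name.
  void Add(std::string_view qualified) {
    const auto sep = qualified.rfind("::");
    const std::string_view base =
        sep == std::string_view::npos ? qualified : qualified.substr(sep + 2);
    table_[std::string(base)].emplace_back(qualified);
  }

  // Returns every indexed name whose base name is `base`; empty if none.
  const std::vector<std::string>& Lookup(const std::string& base) const {
    static const std::vector<std::string> kEmpty;
    const auto it = table_.find(base);
    return it == table_.end() ? kEmpty : it->second;
  }

private:
  std::unordered_map<std::string, std::vector<std::string>> table_;
};
```

Add() runs once per symbol in every loaded module, which is why the cost of getting from the mangled name to the base name matters so much.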

Another bonus of this work: we have the problem that a 700 character demangled name is just not useful in a backtrace.  If you have 20 frames of this one after another, the display is really just noise.  We do some truncation of names, but figuring out how to truncate a name while preserving the parts that are actually useful to people is hard to do well if you don't understand the semantics of the name.  Erik Eckstein added a "display mode" to the Swift demangler which only renders the most salient parts of the name.  The Swift demangler does parse into a node tree, so this was doable.  That made a big difference in the readability of backtraces in Swift.  This is plumbed through the generic parts of lldb (Symbol::GetDisplayName & Mangled::GetDisplayDemangledName), but for C++ GetDisplayDemangledName just calls GetDemangledName.  It would be great to implement some reasonable version of this for C++ names as well.
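
Here is a toy sketch (C++17) of why a node tree makes a display mode easy.  NameNode, FunctionName, RenderMode and the "<...>" / "(...)" elision style are all invented for illustration; this is not the Swift demangler's node type, the libcxxabi AST, or anything in lldb:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical name node: an identifier plus optional template arguments.
struct NameNode {
  std::string identifier;
  std::vector<NameNode> template_args;
};

// Hypothetical function name: scopes outermost first (the last entry is the
// base name), followed by the parameter types.
struct FunctionName {
  std::vector<NameNode> scopes;
  std::vector<NameNode> params;
};

enum class RenderMode { Full, Display };

std::string Render(const NameNode& node, RenderMode mode);

// Renders each node and joins the results with `separator`.
std::string Join(const std::vector<NameNode>& nodes, const char* separator,
                 RenderMode mode) {
  std::string out;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (i != 0) out += separator;
    out += Render(nodes[i], mode);
  }
  return out;
}

// Display mode elides template arguments instead of expanding them.
std::string Render(const NameNode& node, RenderMode mode) {
  if (node.template_args.empty()) return node.identifier;
  if (mode == RenderMode::Display) return node.identifier + "<...>";
  return node.identifier + "<" + Join(node.template_args, ", ", mode) + ">";
}

// Display mode keeps every scope but elides the parameter list.
std::string Render(const FunctionName& fn, RenderMode mode) {
  const std::string name = Join(fn.scopes, "::", mode);
  if (mode == RenderMode::Display)
    return name + (fn.params.empty() ? "()" : "(...)");
  return name + "(" + Join(fn.params, ", ", mode) + ")";
}
```

The point is only that the truncation decision is made per node, with knowledge of what each piece is, rather than by cutting a flat string at some character count.  Which pieces count as salient for C++ is a separate question.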

Jim

> On Jan 26, 2018, at 9:01 AM, Davide Italiano via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> 
> On Fri, Jan 26, 2018 at 8:38 AM, Erik Pilkington via lldb-dev
> <lldb-dev at lists.llvm.org> wrote:
>> 
>> 
>> On 2018-01-25 1:58 PM, Greg Clayton wrote:
>>>> 
>>>> On Jan 25, 2018, at 10:25 AM, Erik Pilkington <erik.pilkington at gmail.com>
>>>> wrote:
>>>> 
>>>> Hi,
>>>> I'm not at all familiar with LLDB, but I've been doing some work on the
>>>> demangler in libcxxabi. It's still a work in progress and I haven't yet
>>>> copied the changes over to ItaniumDemangle, which AFAIK is what lldb uses.
>>>> The demangler in libcxxabi now demangles the symbol you attached in 3.31
>>>> seconds, instead of 223.54 on my machine. I posted an RFC on my work here
>>>> (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but
>>>> basically the new demangler just produces an AST then traverses it to print
>>>> the demangled name.
>>> 
>>> Great to hear the huge speedup in demangling! LLDB actually has two
>>> demanglers: a fast one that can demangle 99% of names, and we fall back to
>>> ItaniumDemangle which can do all names but is really slow. It would be fun
>>> to compare your new demangler with the fast one and see if we can get rid of
>>> the fast demangler now.
>>>> 
>>>> 
>>>> I think a good way of making this even faster is to have LLDB consume the
>>>> AST the demangler produces directly. The AST is a better representation of
>>>> the information that LLDB wants, and finishing the demangle and then fishing
>>>> out that information from the output string is unfortunate. From the AST, it
>>>> would be really straightforward to just individually print all the
>>>> components of the name that LLDB wants.
>>> 
>>> This would help us to grab the important bits out of the mangled name as
>>> well. We chop up a demangled name to find the base name (string for
>>> std::string) and the containing context (std:: for std::string), and we
>>> check whether the function is a method (look for a trailing "const"
>>> modifier on the function) or a top level function, since the mangling
>>> doesn't fully specify what is a namespace and what is a class: in
>>> "foo::bar::baz()" we don't know if "foo" or "bar" are classes or
>>> namespaces. So the AST would be great as long as it is fast.
>>> 
>>>> Most of the time it takes to demangle these "symbols from hell" is during
>>>> the printing, after the AST has been parsed, because the demangler has to
>>>> flatten out all the potentially nested back references. Just parsing to an
>>>> AST should be about proportional to the strlen of the mangled name. Since
>>>> (AFAIK) LLDB doesn't use some sections of the demangled name often (such as
>>>> parameters), from the AST LLDB could lazily decide not to even bother fully
>>>> demangling some sections of the name, then if it ever needs them it could
>>>> parse a new AST and get them from there. I think this would largely fix the
>>>> issue, as most of the time these crazy expansions don't occur in the name
>>>> itself, but in the parameters or return type. Even when they do appear in
>>>> the name, it would be possible to do some simple name classification (ie,
>>>> does this symbol refer to a function) or pull out the basename quickly
>>>> without expanding anything at all.
>>>> 
>>>> Any thoughts? I'm really not at all familiar with LLDB, so I could have
>>>> this all wrong!
>>> 
>>> AST sounds great. We can put this into the class we use to chop up C++
>>> names as that is really our goal.
>>> 
>>> So it would be great to do a speed comparison between our fast demangler
>>> in LLDB (in FastDemangle.cpp/.h) and your updated libcxxabi version. If
>>> yours is faster, we can remove FastDemangle and then update
>>> llvm::itaniumDemangle() to use your new code.
>>> 
>>> ASTs would be great for the C++ name parser.
>>> 
>>> Let us know what you are thinking,
>> 
>> 
>> Hi Greg,
>> 
>> I'm almost finished with my work on the demangler, hopefully I'll be done
>> within a few weeks. Once that's all finished I'll look into exporting the
>> AST and comparing it to FastDemangle. I was thinking about adding a version
>> of llvm::itaniumDemangle() that returns an opaque handle to the AST and
>> defining some functions on the LLVM side that take that handle and return
>> some extra information. I'd be happy to help out with the LLDB side of
>> things too, although it might be better if someone more experienced with
>> LLDB did this.
>> 
> 
> That's great to hear. Not having 3 different demanglers scattered
> between lldb and llvm will be a big win for everybody.
> 
> --
> Davide