[lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

Fri Jan 26 09:01:03 PST 2018

On Fri, Jan 26, 2018 at 8:38 AM, Erik Pilkington via lldb-dev
<lldb-dev at lists.llvm.org> wrote:
>
>
> On 2018-01-25 1:58 PM, Greg Clayton wrote:
>>>
>>> On Jan 25, 2018, at 10:25 AM, Erik Pilkington <erik.pilkington at gmail.com>
>>> wrote:
>>>
>>> Hi,
>>> I'm not at all familiar with LLDB, but I've been doing some work on the
>>> demangler in libcxxabi. It's still a work in progress and I haven't yet
>>> copied the changes over to ItaniumDemangle, which AFAIK is what lldb uses.
>>> The demangler in libcxxabi now demangles the symbol you attached in 3.31
>>> seconds, instead of 223.54 seconds, on my machine. I posted an RFC on my work here
>>> (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but
>>> basically the new demangler just produces an AST then traverses it to print
>>> the demangled name.
>>
>> Great to hear the huge speedup in demangling! LLDB actually has two
>> demanglers: a fast one that can demangle 99% of names, and we fall back to
>> ItaniumDemangle which can do all names but is really slow. It would be fun
>> to compare your new demangler with the fast one and see if we can get rid of
>> the fast demangler now.
>>>
>>>
>>> I think a good way of making this even faster is to have LLDB consume the
>>> AST the demangler produces directly. The AST is a better representation of
>>> the information that LLDB wants, and finishing the demangle and then fishing
>>> out that information from the output string is unfortunate. From the AST, it
>>> would be really straightforward to just individually print all the
>>> components of the name that LLDB wants.
>>
>> This would help us to grab the important bits out of the mangled name as
>> well. We chop up a demangled name to find the base name ("string" for
>> std::string) and the containing context ("std::" for std::string), and we
>> check whether the function is a method (by looking for a trailing "const"
>> modifier on the function) or a top-level function. We have to guess because
>> the mangling doesn't fully specify what is a namespace and what is a class:
>> in "foo::bar::baz()" we don't know if "foo" or "bar" are classes or
>> namespaces. So the AST would be great as long as it is fast.
>>
>>> Most of the time it takes to demangle these "symbols from hell" is during
>>> the printing, after the AST has been parsed, because the demangler has to
>>> flatten out all the potentially nested back references. Just parsing to an
>>> AST should be about proportional to the strlen of the mangled name. Since
>>> (AFAIK) LLDB doesn't use some sections of the demangled name often (such as
>>> parameters), from the AST LLDB could lazily decide not to even bother fully
>>> demangling some sections of the name, then if it ever needs them it could
>>> parse a new AST and get them from there. I think this would largely fix the
>>> issue, as most of the time these crazy expansions don't occur in the name
>>> itself, but in the parameters or return type. Even when they do appear in
>>> the name, it would be possible to do some simple name classification (i.e.,
>>> does this symbol refer to a function?) or pull out the basename quickly
>>> without expanding anything at all.
>>>
>>> Any thoughts? I'm really not at all familiar with LLDB, so I could have
>>> this all wrong!
>>
>> AST sounds great. We can put this into the class we use to chop up C++
>> names as that is really our goal.
>>
>> So it would be great to do a speed comparison between our fast demangler
>> in LLDB (in FastDemangle.cpp/.h) and your updated libcxxabi version. If
>> yours is faster, remove FastDemangle and then update the
>> llvm::ItaniumDemangle() to use your new code.
>>
>> ASTs would be great for the C++ name parser.
>>
>> Let us know what you are thinking,
>
>
> Hi Greg,
>
> I'm almost finished with my work on the demangler, hopefully I'll be done
> within a few weeks. Once that's all finished I'll look into exporting the
> AST and comparing it to FastDemangle. I was thinking about adding a version
> of llvm::itaniumDemangle() that returns an opaque handle to the AST and
> defining some functions on the LLVM side that take that handle and return
> some extra information. I'd be happy to help out with the LLDB side of
> things too, although it might be better if someone more experienced with
> LLDB did this.
>

That's great to hear. Not having 3 different demanglers scattered
between lldb and llvm will be a big win for everybody.

--
Davide
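
The contrast discussed in this thread (chopping up a fully demangled string
versus asking a tree for just the piece you need) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not LLDB's or
libcxxabi's actual code: demangle() wraps the standard abi::__cxa_demangle
entry point from <cxxabi.h>, while baseNameFromString, looksLikeConstMethod,
Node, print, baseName and buildNested are hypothetical names made up for this
sketch. The string helpers are deliberately naive and would mishandle
templates, operator() and "(anonymous namespace)".

```cpp
#include <cxxabi.h>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <vector>

// (1) String approach: demangle everything, then fish pieces out of the text.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char *mangled) {
  int status = 0;
  char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out) ? out : "";
  std::free(out); // __cxa_demangle allocates with malloc
  return result;
}

// Hypothetical, naive helper: text between the last "::" and the first '('.
std::string baseNameFromString(const std::string &demangled) {
  std::string name = demangled.substr(0, demangled.find('('));
  std::size_t sep = name.rfind("::");
  return sep == std::string::npos ? name : name.substr(sep + 2);
}

// Hypothetical heuristic: a trailing " const" suggests a method.
bool looksLikeConstMethod(const std::string &demangled) {
  const std::string suffix = " const";
  return demangled.size() >= suffix.size() &&
         demangled.compare(demangled.size() - suffix.size(), suffix.size(),
                           suffix) == 0;
}

// (2) Tree approach: a toy AST (not the libcxxabi node type) whose template
// arguments may point at shared nodes, standing in for back references.
struct Node {
  std::string name;
  std::vector<const Node *> args;
};

// Full printing expands every shared node each time it is referenced.
void print(const Node &n, std::string &out) {
  out += n.name;
  if (n.args.empty())
    return;
  out += '<';
  for (std::size_t i = 0; i < n.args.size(); ++i) {
    if (i)
      out += ", ";
    print(*n.args[i], out);
  }
  out += '>';
}

// Lazy query: the base name is available without expanding any argument.
const std::string &baseName(const Node &n) { return n.name; }

// Builds depth + 1 nodes where each level is P<previous, previous>, so the
// tree stays linear in size while its printed form doubles at every level.
std::vector<std::unique_ptr<Node>> buildNested(int depth) {
  std::vector<std::unique_ptr<Node>> nodes;
  nodes.push_back(std::make_unique<Node>(Node{"T", {}}));
  for (int i = 0; i < depth; ++i) {
    const Node *prev = nodes.back().get();
    nodes.push_back(std::make_unique<Node>(Node{"P", {prev, prev}}));
  }
  return nodes;
}
```

With buildNested(20) the tree holds 21 nodes, yet printing the last one
produces a string several megabytes long, while baseName() on the same node
returns immediately. That is the asymmetry described above: the cost sits in
flattening the back references, not in building the tree.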