[lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols
Erik Pilkington via lldb-dev
lldb-dev at lists.llvm.org
Mon Mar 19 18:50:54 PDT 2018
I've put a WIP patch up here: https://reviews.llvm.org/D44668
Sorry for the delay!
Erik
On 2018-01-26 3:56 PM, Greg Clayton wrote:
>
>> On Jan 26, 2018, at 8:38 AM, Erik Pilkington
>> <erik.pilkington at gmail.com> wrote:
>>
>>
>>
>> On 2018-01-25 1:58 PM, Greg Clayton wrote:
>>>> On Jan 25, 2018, at 10:25 AM, Erik Pilkington
>>>> <erik.pilkington at gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I'm not at all familiar with LLDB, but I've been doing some work on
>>>> the demangler in libcxxabi. It's still a work in progress and I
>>>> haven't yet copied the changes over to ItaniumDemangle, which AFAIK
>>>> is what lldb uses. The demangler in libcxxabi now demangles the
>>>> symbol you attached in 3.31 seconds, instead of 223.54 on my
>>>> machine. I posted a RFC on my work here
>>>> (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html),
>>>> but basically the new demangler just produces an AST then traverses
>>>> it to print the demangled name.
>>> Great to hear the huge speedup in demangling! LLDB actually has two
>>> demanglers: a fast one that can demangle 99% of names, and we fall
>>> back to ItaniumDemangle which can do all names but is really slow.
>>> It would be fun to compare your new demangler with the fast one and
>>> see if we can get rid of the fast demangler now.
>>>>
>>>> I think a good way of making this even faster is to have LLDB
>>>> consume the AST the demangler produces directly. The AST is a
>>>> better representation of the information that LLDB wants, and
>>>> finishing the demangle and then fishing out that information from
>>>> the output string is unfortunate. From the AST, it would be really
>>>> straightforward to just individually print all the components of
>>>> the name that LLDB wants.
>>> This would help us to grab the important bits out of the mangled
>>> name as well. We chop up a demangled name to find the base name
>>> ("string" for std::string) and the containing context ("std::" for
>>> std::string), and we try to tell whether the function is a method
>>> (we look for a trailing "const" modifier on the function) or a
>>> top-level function. The mangling doesn't fully specify what is a
>>> namespace and what is a class: in "foo::bar::baz()" we don't know
>>> whether "foo" or "bar" are classes or namespaces. So the AST would
>>> be great as long as it is fast.
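The chopping described above can be sketched in a few lines of C++. This is only an illustration, not LLDB's actual name parser: ChoppedName and ChopDemangledName are hypothetical names, and the sketch assumes a simple demangled name with no templates, no operator() and no function-pointer parameters.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: the pieces a debugger wants from a demangled name.
struct ChoppedName {
  std::string context;   // e.g. "foo::bar" for "foo::bar::baz()"
  std::string basename;  // e.g. "baz"
  bool has_trailing_const = false;  // true means it must be a method
};

// Naive string-based chopping (assumption: no templates or nested parens).
ChoppedName ChopDemangledName(const std::string &demangled) {
  assert(!demangled.empty() && "expects a non-empty demangled name");
  ChoppedName out;

  // Everything before the first '(' is the qualified name.
  const std::string::size_type paren = demangled.find('(');
  const std::string qualified = demangled.substr(0, paren);

  // A "const" after the closing ')' can only appear on a member function.
  if (paren != std::string::npos) {
    const std::string::size_type close = demangled.rfind(')');
    if (close != std::string::npos)
      out.has_trailing_const =
          demangled.find("const", close + 1) != std::string::npos;
  }

  // Split the qualified name at the last "::".
  const std::string::size_type sep = qualified.rfind("::");
  if (sep == std::string::npos) {
    out.basename = qualified;
  } else {
    out.context = qualified.substr(0, sep);
    out.basename = qualified.substr(sep + 2);
  }
  return out;
}
```

Note that "foo::bar()" yields context "foo" and basename "bar", but nothing in the string says whether "foo" is a class or a namespace, which is exactly the ambiguity an AST-based approach would have to live with too.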
>>>
>>>> Most of the time it takes to demangle these "symbols from hell" is
>>>> during the printing, after the AST has been parsed, because the
>>>> demangler has to flatten out all the potentially nested back
>>>> references. Just parsing to an AST should be about proportional to
>>>> the strlen of the mangled name. Since (AFAIK) LLDB doesn't use some
>>>> sections of the demangled name often (such as parameters), from the
>>>> AST LLDB could lazily decide not to even bother fully demangling
>>>> some sections of the name, then if it ever needs them it could
>>>> parse a new AST and get them from there. I think this would largely
>>>> fix the issue, as most of the time these crazy expansions don't
>>>> occur in the name itself, but in the parameters or return type.
>>>> Even when they do appear in the name, it would be possible to do
>>>> some simple name classification (ie, does this symbol refer to a
>>>> function) or pull out the basename quickly without expanding
>>>> anything at all.
>>>>
>>>> Any thoughts? I'm really not at all familiar with LLDB, so I could
>>>> have this all wrong!
>>> AST sounds great. We can put this into the class we use to chop up
>>> C++ names as that is really our goal.
>>>
>>> So it would be great to do a speed comparison between our fast
>>> demangler in LLDB (in FastDemangle.cpp/.h) and your updated
>>> libcxxabi version. If yours is faster, remove FastDemangle and then
>>> update the llvm::ItaniumDemangle() to use your new code.
>>>
>>> ASTs would be great for the C++ name parser.
>>>
>>> Let us know what you are thinking,
>>
>> Hi Greg,
>>
>> I'm almost finished with my work on the demangler; hopefully I'll be
>> done within a few weeks. Once that's all finished I'll look into
>> exporting the AST and comparing it to FastDemangle. I was thinking
>> about adding a version of llvm::itaniumDemangle() that returns an opaque
>> handle to the AST and defining some functions on the LLVM side that
>> take that handle and return some extra information. I'd be happy to
>> help out with the LLDB side of things too, although it might be
>> better if someone more experienced with LLDB did this.
>>
>
> Can't wait! The only reason we switched away from the libcxxabi
> demangler in the first place was the poor performance. GDB's demangler
> was 3x faster. Our FastDemangler got us back to the speed of the GDB
> demangler. But it will be great to get back to one fast demangler.
>
> It would be great if there was some way to implement the demangled
> name size cutoff in the demangler, where if the demangled name goes
> over some max size we can just stop demangling. No one needs to see a
> 72MB string, nor would anyone ever type in that name.
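A size cutoff of this kind is easy to picture on a toy AST. The sketch below is not the libcxxabi or LLVM demangler: Node, PrintCapped and MakeDoublingChain are hypothetical names. It assumes that back references are modelled as shared child pointers, so that a short input can expand to an enormous string, and it stops printing as soon as the output passes a limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy AST node: leaves carry text, interior nodes list their children.
// Two parents may point at the same child, like a mangling back reference.
struct Node {
  std::string text;
  std::vector<const Node *> children;
};

// Appends the expansion of n to out. Returns false (and stops early) as soon
// as out grows past max_size, so the cost is bounded by the cap.
bool PrintCapped(const Node &n, std::string &out, std::size_t max_size) {
  out += n.text;
  if (out.size() > max_size)
    return false;
  for (const Node *child : n.children) {
    assert(child != nullptr);
    if (!PrintCapped(*child, out, max_size))
      return false;
  }
  return true;
}

// Builds depth + 1 nodes where each level references the previous level
// twice, so the full expansion of the last node has 2^depth characters.
std::vector<Node> MakeDoublingChain(std::size_t depth) {
  std::vector<Node> nodes(depth + 1);  // sized once, so pointers stay valid
  nodes[0].text = "x";
  for (std::size_t i = 1; i <= depth; ++i)
    nodes[i].children = {&nodes[i - 1], &nodes[i - 1]};
  return nodes;
}
```

With a chain of depth 30 the full expansion would be 2^30 characters, but PrintCapped with a limit of 1024 gives up after appending the 1025th character. The AST itself stays small (31 nodes), which is the point made earlier in the thread: parsing is cheap, flattening is what blows up.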
>
> If you can get the new demangler features (AST + demangling) into
> llvm::itaniumDemangle I will be happy to do the LLDB side of the work.
>
>> I'll ping this thread when I'm finished with the demangler, then we
>> can hopefully work out what a good API for LLDB would be.
>
> It would be great to put all the functionality into LLVM and test the
> functionality in llvm tests. Then I will port over to LLDB as needed.
> As Jim said, we want to know the function basename, and whether a
> function is a C++ method, just a top-level function, or possibly
> either: we often can't tell just from the mangling whether foo::bar()
> is a method or a function, since we don't know if "foo" is a
> namespace, but if we have "foo::bar() const", then we know it is a
> method.
>
> Look forward to seeing what you come up with!
>
> Greg
>
>>
>> Thanks,
>> Erik
>>
>>> Greg
>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>> On 2018-01-24 6:48 PM, Greg Clayton via lldb-dev wrote:
>>>>> I have an issue where I am debugging a C++ binary that is around
>>>>> 250MB in size. It contains some mangled names that are crazy:
>>>>>
>>>>> _ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
>>>>>
>>>>> This de-mangles to something that is 72MB in size and takes 280
>>>>> seconds (try running "time c++filt -n" on the above string).
>>>>>
>>>>> There are probably many symbols like this in this binary.
>>>>> Currently lldb will de-mangle all names in the symbol table so
>>>>> that we can chop up the names so we know function base names and
>>>>> we might be able to classify a base name as a method or function
>>>>> for breakpoint categorization.
>>>>>
>>>>> My question is: how do we work around such issues in LLDB? A few
>>>>> solutions I can think of:
>>>>> 1 - time each name demangle and if it takes too long somehow stop
>>>>> de-mangling similar symbols or symbols over a certain length?
>>>>> 2 - allow a setting that says "don't de-mangle names that start
>>>>> with..." and the setting has a list of prefixes.
>>>>> 3 - have a setting that turns off de-mangling symbols over a
>>>>> certain length all of the time with a default of something like
>>>>> 256 or 512
>>>>> 4 - modify our FastDemangler to abort if the de-mangled string
>>>>> goes over a certain limit to avoid bad cases like this...
>>>>>
>>>>> #1 would still mean we get a huge delay (like 280 seconds) when
>>>>> starting to debug this binary, but might prevent multiple symbols
>>>>> from adding to that delay...
>>>>>
>>>>> #2 would require debugging once and then knowing which
>>>>> symbols took a while to de-mangle. If we time each de-mangle, we
>>>>> can warn that there are large mangled names and print the mangled
>>>>> name so the user might know?
>>>>>
>>>>> #3 would disable de-mangling of long names at the risk of not
>>>>> de-mangling names that are close to the limit
>>>>>
>>>>> #4 requires that our FastDemangle code can decode the mangled
>>>>> string. The fast de-mangler currently aborts on tricky
>>>>> de-mangling and we fall back onto cxa_demangle from the C++
>>>>> library, which does not have a cutoff on length...
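Option #3 is the simplest of the four to picture in code. The following is a minimal sketch, not LLDB's implementation: DemangleWithLengthCutoff is a hypothetical helper, the default of 512 is taken from the numbers suggested above, and it assumes a GCC or Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: refuse to demangle names longer than max_mangled_len.
// Returns true and fills `out` on success; returns false if the name was
// skipped or could not be demangled, in which case the caller keeps showing
// the mangled name.
bool DemangleWithLengthCutoff(const std::string &mangled, std::string &out,
                              std::size_t max_mangled_len = 512) {
  assert(max_mangled_len > 0);
  if (mangled.size() > max_mangled_len)
    return false;  // too long: do not even start demangling

  int status = 0;
  // With a null output buffer, __cxa_demangle mallocs the result for us.
  char *buf = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || buf == nullptr) {
    std::free(buf);
    return false;
  }
  out.assign(buf);
  std::free(buf);
  return true;
}
```

The check is on the mangled length, so it costs nothing for pathological names, but as noted for #3 it also skips long names that would have demangled quickly.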
>>>>>
>>>>> Can anyone else think of any other solutions?
>>>>>
>>>>> Greg Clayton