[lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

Mon Mar 19 18:50:54 PDT 2018

I've put a WIP patch up here: https://reviews.llvm.org/D44668
Sorry for the delay!
Erik

On 2018-01-26 3:56 PM, Greg Clayton wrote:
>
>> On Jan 26, 2018, at 8:38 AM, Erik Pilkington 
>> <erik.pilkington at gmail.com> wrote:
>>
>>
>>
>> On 2018-01-25 1:58 PM, Greg Clayton wrote:
>>>> On Jan 25, 2018, at 10:25 AM, Erik Pilkington 
>>>> <erik.pilkington at gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I'm not at all familiar with LLDB, but I've been doing some work on 
>>>> the demangler in libcxxabi. It's still a work in progress and I 
>>>> haven't yet copied the changes over to ItaniumDemangle, which AFAIK 
>>>> is what lldb uses. The demangler in libcxxabi now demangles the 
>>>> symbol you attached in 3.31 seconds, instead of 223.54 seconds, on 
>>>> my machine. I posted an RFC on my work here 
>>>> (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), 
>>>> but basically the new demangler just produces an AST then traverses 
>>>> it to print the demangled name.
>>> Great to hear the huge speedup in demangling! LLDB actually has two 
>>> demanglers: a fast one that can demangle 99% of names, and we fall 
>>> back to ItaniumDemangle which can do all names but is really slow. 
>>> It would be fun to compare your new demangler with the fast one and 
>>> see if we can get rid of the fast demangler now.
>>>>
>>>> I think a good way of making this even faster is to have LLDB 
>>>> consume the AST the demangler produces directly. The AST is a 
>>>> better representation of the information that LLDB wants, and 
>>>> finishing the demangle and then fishing out that information from 
>>>> the output string is unfortunate. From the AST, it would be really 
>>>> straightforward to just individually print all the components of 
>>>> the name that LLDB wants.
>>> This would help us to grab the important bits out of the mangled 
>>> name as well. We chop up a demangled name to find the base name 
>>> ("string" for std::string) and the containing context ("std::" for 
>>> std::string), and we check whether we can tell if the function is a 
>>> method (by looking for a trailing "const" modifier on the function) 
>>> or a top-level function. The mangling doesn't fully specify what is 
>>> a namespace and what is a class: in "foo::bar::baz()" we don't know 
>>> whether "foo" and "bar" are classes or namespaces. So the AST would 
>>> be great as long as it is fast.
>>>
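For illustration, the name chopping described above could look roughly like the sketch below. This is not LLDB's actual parser: NameParts and splitDemangledName are hypothetical names, and the logic deliberately ignores return types, operators such as operator< and operator(), and anonymous namespaces.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the pieces a debugger wants from a demangled name.
struct NameParts {
  std::string context;             // e.g. "foo::bar" for "foo::bar::baz()"
  std::string basename;            // e.g. "baz"
  bool has_trailing_const = false; // a trailing "const" implies a method
};

// Splits a demangled function name at the last top-level "::" that appears
// before the argument list. Text inside template brackets is skipped.
NameParts splitDemangledName(const std::string &demangled) {
  NameParts parts;
  std::size_t name_end = demangled.size();    // index of the '(' if present
  std::size_t last_scope = std::string::npos; // index of the last top-level "::"
  int angle_depth = 0;

  for (std::size_t i = 0; i < demangled.size(); ++i) {
    const char c = demangled[i];
    if (c == '<') {
      ++angle_depth;
    } else if (c == '>') {
      --angle_depth;
    } else if (angle_depth == 0 && c == '(') {
      name_end = i;
      break;
    } else if (angle_depth == 0 && c == ':' && i + 1 < demangled.size() &&
               demangled[i + 1] == ':') {
      last_scope = i;
      ++i; // skip the second ':'
    }
  }

  if (last_scope == std::string::npos) {
    parts.basename = demangled.substr(0, name_end);
  } else {
    parts.context = demangled.substr(0, last_scope);
    parts.basename =
        demangled.substr(last_scope + 2, name_end - last_scope - 2);
  }

  // Only treat "const" as a method qualifier if an argument list was seen.
  const std::string suffix = " const";
  parts.has_trailing_const =
      name_end != demangled.size() && demangled.size() >= suffix.size() &&
      demangled.compare(demangled.size() - suffix.size(), suffix.size(),
                        suffix) == 0;
  return parts;
}
```

Note that the whole demangled string has to exist before this can run, which is exactly the cost an AST-based approach would avoid.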
>>>> Most of the time it takes to demangle these "symbols from hell" is 
>>>> during the printing, after the AST has been parsed, because the 
>>>> demangler has to flatten out all the potentially nested back 
>>>> references. Just parsing to an AST should be about proportional to 
>>>> the strlen of the mangled name. Since (AFAIK) LLDB doesn't use some 
>>>> sections of the demangled name often (such as parameters), from the 
>>>> AST LLDB could lazily decide not to even bother fully demangling 
>>>> some sections of the name, then if it ever needs them it could 
>>>> parse a new AST and get them from there. I think this would largely 
>>>> fix the issue, as most of the time these crazy expansions don't 
>>>> occur in the name itself, but in the parameters or return type. 
>>>> Even when they do appear in the name, it would be possible to do 
>>>> some simple name classification (i.e., does this symbol refer to a 
>>>> function) or pull out the basename quickly without expanding 
>>>> anything at all.
>>>>
>>>> Any thoughts? I'm really not at all familiar with LLDB, so I could 
>>>> have this all wrong!
>>> AST sounds great. We can put this into the class we use to chop up 
>>> C++ names as that is really our goal.
>>>
>>> So it would be great to do a speed comparison between our fast 
>>> demangler in LLDB (in FastDemangle.cpp/.h) and your updated 
>>> libcxxabi version. If yours is faster, remove FastDemangle and then 
>>> update the llvm::ItaniumDemangle() to use your new code.
>>>
>>> ASTs would be great for the C++ name parser.
>>>
>>> Let us know what you are thinking,
>>
>> Hi Greg,
>>
>> I'm almost finished with my work on the demangler, hopefully I'll be 
>> done within a few weeks. Once that's all finished I'll look into 
>> exporting the AST and comparing it to FastDemangle. I was thinking 
>> about adding a version of llvm::itaniumDemangle() that returns an opaque 
>> handle to the AST and defining some functions on the LLVM side that 
>> take that handle and return some extra information. I'd be happy to 
>> help out with the LLDB side of things too, although it might be 
>> better if someone more experienced with LLDB did this.
>>
>
> Can't wait! The only reason we switched away from the libcxxabi 
> demangler in the first place was the poor performance. GDB's demangler 
> was 3x faster. Our FastDemangler got us back to the speed of the GDB 
> demangler. But it will be great to get back to one fast demangler.
>
> It would be great if there was some way to implement the demangled 
> name size cutoff in the demangler, where if the demangled name goes 
> over some max size we can just stop demangling. No one needs to see a 
> 72MB string, nor would anyone ever type in that name.
>
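As a rough sketch of the size cutoff suggested above: the output buffer itself can refuse to grow past a limit, so the printer stops as soon as the limit is hit. BoundedOutput and expand are hypothetical names, not LLVM or LLDB APIs, and expand is only a toy stand-in for back-reference expansion in which each level prints the previous level twice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: accumulates demangler output and refuses to grow
// past a fixed limit.
class BoundedOutput {
public:
  explicit BoundedOutput(std::size_t max_size) : max_size_(max_size) {}

  // Appends text and returns true, or records the overflow and returns
  // false if the result would exceed the limit.
  bool append(const std::string &text) {
    if (overflowed_ || text.size() > max_size_ - buffer_.size()) {
      overflowed_ = true;
      return false;
    }
    buffer_ += text;
    return true;
  }

  bool overflowed() const { return overflowed_; }
  const std::string &str() const { return buffer_; }

private:
  std::size_t max_size_;
  std::string buffer_;
  bool overflowed_ = false;
};

// Toy stand-in for printing nested back references: output size doubles
// with every level. The && short-circuits, so work stops at the first
// failed append instead of walking the whole expansion.
bool expand(BoundedOutput &out, unsigned depth) {
  if (depth == 0)
    return out.append("T");
  return expand(out, depth - 1) && expand(out, depth - 1);
}
```

With a limit of 8 bytes, expand(out, 3) fits exactly, while expand(out, 20) gives up after 8 bytes rather than producing about a million.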
> If you can get the new demangler features (AST + demangling) into 
> llvm::itaniumDemangle I will be happy to do the LLDB side of the work.
>
>> I'll ping this thread when I'm finished with the demangler, then we 
>> can hopefully work out what a good API for LLDB would be.
>
> It would be great to put all the functionality into LLVM and test the 
> functionality in llvm tests. Then I will port over to LLDB as needed. 
> As Jim said, we want to know the function basename, and whether a 
> function is a C++ method, a top-level function, or possibly either. 
> We often can't tell from the mangling alone if foo::bar() is a method 
> or a function, since we don't know if "foo" is a namespace, but if we 
> have "foo::bar() const", then we know it is a method.
>
> Look forward to seeing what you come up with!
>
> Greg
>
>>
>> Thanks,
>> Erik
>>
>>> Greg
>>>
>>>> Thanks,
>>>> Erik
>>>>
>>>>
>>>> On 2018-01-24 6:48 PM, Greg Clayton via lldb-dev wrote:
>>>>> I have an issue where I am debugging a C++ binary that is around 
>>>>> 250MB in size. It contains some mangled names that are crazy:
>>>>>
>>>>> _ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
>>>>>
>>>>> This de-mangles to something that is 72MB in size and takes 280 
>>>>> seconds (try running "time c++filt -n" on the above string).
>>>>>
>>>>> There are probably many symbols like this in this binary. 
>>>>> Currently lldb will de-mangle all names in the symbol table so 
>>>>> that we can chop up the names so we know function base names and 
>>>>> we might be able to classify a base name as a method or function 
>>>>> for breakpoint categorization.
>>>>>
>>>>> My question is: how do we work around such issues in LLDB? A few 
>>>>> solutions I can think of:
>>>>> 1 - time each name demangle and if it takes too long somehow stop 
>>>>> de-mangling similar symbols or symbols over a certain length?
>>>>> 2 - allow a setting that says "don't de-mangle names that start 
>>>>> with..." and the setting has a list of prefixes.
>>>>> 3 - have a setting that turns off de-mangling symbols over a 
>>>>> certain length all of the time with a default of something like 
>>>>> 256 or 512
>>>>> 4 - modify our FastDemangler to abort if the de-mangled string 
>>>>> goes over a certain limit to avoid bad cases like this...
>>>>>
>>>>> #1 would still mean we get a huge delay (like 280 seconds) when 
>>>>> starting to debug this binary, but might prevent multiple symbols 
>>>>> from adding to that delay...
>>>>>
>>>>> #2 would require debugging once and then knowing which 
>>>>> symbols took a while to de-mangle. If we time each de-mangle, we 
>>>>> can warn that there are large mangled names and print the mangled 
>>>>> name so the user might know?
>>>>>
>>>>> #3 would disable de-mangling of long names at the risk of not 
>>>>> de-mangling names that are close to the limit
>>>>>
>>>>> #4 requires that our FastDemangle code can decode the mangled 
>>>>> string. The fast de-mangler currently aborts on tricky 
>>>>> de-mangling and we fall back onto cxa_demangle from the C++ 
>>>>> library, which does not have a cutoff on length...
>>>>>
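Options #1 and #3 above can be combined in a small wrapper, sketched here under stated assumptions. DemanglePolicy, DemangleOutcome, demangleWithPolicy and placeholderDemangle are all hypothetical names; placeholderDemangle stands in for whichever real demangler is used, and the 1000 ms warning threshold is an arbitrary illustrative value (the 512 default comes from option #3).

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical settings: a length cutoff (option #3) and a threshold for
// flagging slow demangles (option #1).
struct DemanglePolicy {
  std::size_t max_mangled_length = 512;
  std::chrono::milliseconds slow_threshold{1000};
};

struct DemangleOutcome {
  std::string name;     // demangled name, or the mangled name if skipped
  bool skipped = false; // true if the name was over the length cutoff
  bool slow = false;    // true if demangling reached the slow threshold
};

// Placeholder for a real demangler call; it only wraps the input so the
// sketch stays self-contained.
std::string placeholderDemangle(const std::string &mangled) {
  return "demangled(" + mangled + ")";
}

DemangleOutcome demangleWithPolicy(const std::string &mangled,
                                   const DemanglePolicy &policy) {
  DemangleOutcome outcome;
  if (mangled.size() > policy.max_mangled_length) {
    // Option #3: leave over-long names mangled.
    outcome.name = mangled;
    outcome.skipped = true;
    return outcome;
  }
  // Option #1: time the demangle so the caller can warn about slow names.
  const auto start = std::chrono::steady_clock::now();
  outcome.name = placeholderDemangle(mangled);
  const auto elapsed = std::chrono::steady_clock::now() - start;
  outcome.slow = elapsed >= policy.slow_threshold;
  return outcome;
}
```

The length check is cheap but blunt: as noted for #3, a name just over the limit stays mangled even if it would have demangled quickly, and timing alone cannot avoid paying for the first slow name.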
>>>>> Can anyone else think of any other solutions?
>>>>>
>>>>> Greg Clayton
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> lldb-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>
