[lldb-dev] LLDB Demangling

Stefan Gränitz via lldb-dev lldb-dev at lists.llvm.org
Thu Jul 26 02:30:14 PDT 2018


One more short question, as you seem to be much more familiar with the
code than me. Maybe you have an idea?

In the unit test I prepared, I didn't manage to cover this line in
Symtab::InitNameIndexes():
https://github.com/llvm-mirror/lldb/blob/master/source/Symbol/Symtab.cpp#L348

With both demangle implementations, the old one and IPD, annotations
cause demangling to fail/return empty name. I assume this branch has its
right to exist and I would just keep it, but it would also be great to
find an explanation and a repro to cover it.

Am 25.07.18 um 20:15 schrieb Stefan Gränitz:
> Hi Pavel, thanks for your feedback!
>
>> I would propose to create a new class (RichMangingInfo?), which would
>> wrap both ItaniumPartialDemangler and the existing string-chopping
>> implementation, and provide a unified interface on top of them.
> Right, good point. Adding an abstraction like this certainly makes sense.
>
>> we might be able
>> to get further speedup by having DemangleWithRichNameIndexInfo *not*
>> fill out the demangled string (as, I understand, building the actual
>> string is the most expensive operation), and just fetching the bits of
>> info we need from the IPD.
> Yes, if it turns out we really don't need the demangled names in a table
> and instead just render them on-demand, I guess we'd gain additional
> speedup plus reduction in memory usage. I will have a closer look on this.
>
>> right now the Symtab class builds a table containing all
>> demangled names. This is mostly unnecessary, and
> What exact table are you referring to here? The m_name_to_index map does
> store both, demangled and mangled strings. Not sure to what extend the
> demangled ones are necessary. It's a little hard to judge, but I can try
> and check. Or do you mean a different one?
>
> There is one more follow-up task here: Now that FastDemangle isn't used
> anymore in Mangled::GetDemangledName(), its last remaining usage is
> SubsPrimitiveParmItanium() in CPlusPlusLanguage.cpp. I will try to
> figure out if we still need it or if we can switch it to the IPD too.
> Then we could consider to get rid of FastMangle altogether.
>
> Any objections here?
>
>
> Am 25.07.18 um 13:11 schrieb Pavel Labath:
>> Hello Stefan,
>>
>> thank you for working on this. I believe this work will bring a huge
>> improvement to LLDB.
>>
>> Adding something like DemangleWithRichNameIndexInfo to the Mangled
>> class makes sense to me. However, I don't think that function should
>> be taking an ItaniumPartialDemangler as an argument. The reason for
>> that is that ItaniumPartialDemangler is, unsurprisingly, specific to
>> the itanium mangling scheme, and the Mangled class tries to hide the
>> specifics of different manglings.
>>
>> I would propose to create a new class (RichMangingInfo?), which would
>> wrap both ItaniumPartialDemangler and the existing string-chopping
>> implementation, and provide a unified interface on top of them.
>> Internally it would use IPD for the itanium scheme and the legacy
>> implementation for the MSVC mangling, but the user would not have to
>> care about that, as it can just ask questions like "is this a
>> constructor?" and get the correct answer either way. If ever we get
>> something similar to IPD for the MSVC mangling scheme, we can just
>> replace the legacy implementation with that one, and nobody should
>> know the difference.
>>
>> What do you think?
>>
>>
>> BTW, right now the Symtab class builds a table containing all
>> demangled names. This is mostly unnecessary, and in fact we have
>> recently removed a similar table for the demangled debug_info names.
>> The only present use for it that I am aware of is papering over a
>> clang "bug"*. If we are able to get rid of this, then we might be able
>> to get further speedup by having DemangleWithRichNameIndexInfo *not*
>> fill out the demangled string (as, I understand, building the actual
>> string is the most expensive operation), and just fetching the bits of
>> info we need from the IPD.
>>
>> (*) The "bug" is that sometimes clang will not generate the C1 (full
>> object) flavour of an constructor if it is identical to the C2 version
>> (even though the CU in question definitely constructs full objects of
>> the given class). This means that we are not able able to construct
>> some objects during expression evaluation as we cannot find the C1
>> version anywhere. The demangling makes this work accidentally, as were
>> are able to match up the constructors by demangled names, as they both
>> demangle to the same string. I have recently fixed clang to emit C1
>> always (at -O0), but the question remains on what to do with older
>> compilers.
>>
>>
>> On Tue, 24 Jul 2018 at 21:55, Stefan Gränitz <stefan.graenitz at gmail.com> wrote:
>>> Hello everyone
>>>
>>> I am relatively new to the LLDB sources and just submitted my first own
>>> patch for review in Phabricator. Based on this patch I would like to
>>> discuss a few details for further improvements on LLDB's demangling.
>>>
>>> First a short recap on the current state:
>>> * Name demangling used to be a lazy process, exclusively accessible via
>>> Mangled::GetDemangledName() - this is a valuable mechanism:
>>> https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Core/Mangled.cpp#L252
>>> * My patch wants to replace the existing combination of FastDemangle &
>>> itaniumDemangle() with LLVM's new ItaniumPartialDemangler (IPD)
>>> implementation and no fallbacks anymore. It slightly reduces complexity
>>> and slightly improves performance, but doesn't introduce conceptual
>>> changes: https://reviews.llvm.org/D49612
>>> * IPD provides rich information on names, e.g. isFuntion() or
>>> isCtorOrDtor(), but stores that in its own state rather than returning a
>>> queriable object:
>>> https://github.com/llvm-mirror/llvm/blob/a3de0cbb8f4d886a968d20a8c6a6e8aa01d28c2a/include/llvm/Demangle/Demangle.h#L36
>>> * IPD's rich info could help LLDB, where it currently parses mangled
>>> names on its own, on-top of demangling. Symtab::InitNameIndexes() seems
>>> to be the most prominent such place. LLDB builds an index with various
>>> categories from all its symbols here. This is performance-critical and
>>> it does not benefit from the laziness in GetDemangledName():
>>> https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Symbol/Symtab.cpp#L218
>>>
>>> My simple switch doesn't exploit IPD's rich demangling info yet and it
>>> uses a new IPD instance for each demangling request, which is considered
>>> quite costly as it uses a bump allocator internally. Over-all
>>> performance still didn't drop, but even seems to benefit.
>>>
>>> In order to fully exploit the remaining potential, I am thinking about
>>> the following changes:
>>>
>>> (1) In the Mangled class, add a separate entry-point for batch
>>> demangling, that allows to pass in an existing IPD:
>>> bool Mangled::DemangleWithRichNameIndexInfo(ItaniumPartialDemangler &IPD);
>>>
>>> (2) DemangleWithRichNameIndexInfo() will demangle explicitly, which is
>>> required to make sure we gather IPD's rich info. It's not lazy as
>>> GetDemangledName(), but it will store the demangled name and set the
>>> "MangledCounterpart" so that subsequent lazy requests will be fast.
>>>
>>> (3) DemangleWithRichNameIndexInfo() will be used by
>>> Symtab::InitNameIndexes(), which will have a single IPD instance that is
>>> reused for all symbols. Symtab::InitNameIndexes() is usually called
>>> before anything else, so it is basically "warming the cache" here.
>>>
>>> (4) Finally, with IPD's rich info, we can get rid of the additional
>>> string parsing in Symtab::InitNameIndexes(). I expect a considerable
>>> speedup here too.
>>>
>>> What do you think about the plan?
>>> Do you think it's a good idea to add DemangleWithRichNameIndexInfo()
>>> like this?
>>> Are you aware of more batch-processing places like
>>> Symtab::InitNameIndexes(), that I should consider as clients for
>>> DemangleWithRichNameIndexInfo()?
>>> Do you know potential side-effects I must be aware of?
>>> Would you consider the evidence on the performance benefits convincing,
>>> or do you think it needs bulletproof benchmarking numbers?
>>>
>>> When it comes to MSVC-mangled names:
>>> * It is certainly necessary to keep a legacy version of the current
>>> categorization mechanism for these. But in general, what do you think
>>> about their importance for LLDB? (Personally I would like to see LLDB on
>>> Windows, but I tried it only once and gave up quickly.)
>>> * I saw there is a new MicrosoftDemangler now in LLVM. Does anyone know
>>> more about it? Especially: Are there plans to provide rich demangling
>>> information similar to the IPD?
>>>
>>> So far I started writing a unit test for Symtab::InitNameIndexes(), so I
>>> won't accidentally break its indexing. I also experimented with a
>>> potential DemangleWithRichNameIndexInfo() and had a look on the numbers
>>> of the internal LLDB timers. This was, however, not exhaustive and real
>>> benchmarking is always hard.
>>>
>>> Thanks for all kinds of feedback.
>>>
>>> Best,
>>> Stefan
>>>
>>>

-- 
https://weliveindetail.github.io/blog/
https://cryptup.org/pub/stefan.graenitz@gmail.com




More information about the lldb-dev mailing list