[lldb-dev] LLDB Demangling

Tue Jul 24 13:55:04 PDT 2018

Hello everyone

I am relatively new to the LLDB sources and just submitted my first
patch for review in Phabricator. Based on this patch I would like to
discuss a few details for further improvements to LLDB's demangling.

First a short recap on the current state:
* Name demangling is a lazy process, accessible exclusively via
Mangled::GetDemangledName(). This is a valuable mechanism:
https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Core/Mangled.cpp#L252
* My patch replaces the existing combination of FastDemangle &
itaniumDemangle() with LLVM's new ItaniumPartialDemangler (IPD)
implementation, with no fallbacks anymore. It slightly reduces
complexity and slightly improves performance, but doesn't introduce
conceptual changes: https://reviews.llvm.org/D49612
* IPD provides rich information on names, e.g. isFunction() or
isCtorOrDtor(), but stores that in its own state rather than returning a
queryable object:
https://github.com/llvm-mirror/llvm/blob/a3de0cbb8f4d886a968d20a8c6a6e8aa01d28c2a/include/llvm/Demangle/Demangle.h#L36
* IPD's rich info could help LLDB, where it currently parses mangled
names on its own, on top of demangling. Symtab::InitNameIndexes() seems
to be the most prominent such place. LLDB builds an index with various
categories from all its symbols here. This is performance-critical and
it does not benefit from the laziness in GetDemangledName():
https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Symbol/Symtab.cpp#L218
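
To illustrate the lazy mechanism from the first bullet, here is a minimal
sketch. It is not LLDB's actual code: LazyMangled and BackendCalls() are
hypothetical names, and abi::__cxa_demangle (from <cxxabi.h>, available
with GCC and Clang) stands in for the real demangler backend.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <utility>

// Hypothetical stand-in for lldb's Mangled class; not the real implementation.
class LazyMangled {
public:
  explicit LazyMangled(std::string mangled) : m_mangled(std::move(mangled)) {}

  // Demangles on first use only; later calls return the cached string.
  // An empty result means the name could not be demangled.
  const std::string &GetDemangledName() {
    if (!m_attempted) {
      m_attempted = true;
      ++m_backend_calls;
      int status = 0;
      // With a null output buffer, __cxa_demangle returns malloc'ed memory
      // that the caller has to free.
      char *buf =
          abi::__cxa_demangle(m_mangled.c_str(), nullptr, nullptr, &status);
      if (status == 0 && buf != nullptr)
        m_demangled = buf;
      std::free(buf);
    }
    return m_demangled;
  }

  // Number of times the backend was actually invoked (for illustration).
  int BackendCalls() const { return m_backend_calls; }

private:
  std::string m_mangled;
  std::string m_demangled;
  bool m_attempted = false;
  int m_backend_calls = 0;
};
```

Nothing is demangled until the first GetDemangledName() call, and repeated
calls hit the cache. A batch consumer that touches every symbol gains
nothing from this laziness, which is the point of the last bullet.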

My simple switch doesn't exploit IPD's rich demangling info yet, and it
uses a new IPD instance for each demangling request, which is considered
quite costly as it uses a bump allocator internally. Overall performance
nevertheless did not drop and even seems to benefit.

In order to fully exploit the remaining potential, I am thinking about
the following changes:

(1) In the Mangled class, add a separate entry point for batch
demangling that allows passing in an existing IPD:
bool Mangled::DemangleWithRichNameIndexInfo(ItaniumPartialDemangler &IPD);

(2) DemangleWithRichNameIndexInfo() will demangle explicitly, which is
required to make sure we gather IPD's rich info. It is not lazy like
GetDemangledName(), but it will store the demangled name and set the
"MangledCounterpart" so that subsequent lazy requests will be fast.

(3) DemangleWithRichNameIndexInfo() will be used by
Symtab::InitNameIndexes(), which will have a single IPD instance that is
reused for all symbols. Symtab::InitNameIndexes() is usually called
before anything else, so it is basically "warming the cache" here.

(4) Finally, with IPD's rich info, we can get rid of the additional
string parsing in Symtab::InitNameIndexes(). I expect a considerable
speedup here too.
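
Steps (1) to (3) can be sketched roughly as follows. This is only an
illustration under assumptions: ReusableDemangler, Symbol, DemangleWith()
and WarmCache() are hypothetical names, and abi::__cxa_demangle with a
reused output buffer stands in for a shared IPD instance. The rich info
from step (4) is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <vector>

// Hypothetical stand-in for a reusable demangler such as IPD. It keeps one
// malloc'ed output buffer alive across calls instead of starting from
// scratch for every symbol.
class ReusableDemangler {
public:
  ReusableDemangler() = default;
  ReusableDemangler(const ReusableDemangler &) = delete;
  ReusableDemangler &operator=(const ReusableDemangler &) = delete;
  ~ReusableDemangler() { std::free(m_buf); }

  // Returns true on success and writes the demangled name into `out`.
  bool Demangle(const std::string &mangled, std::string &out) {
    int status = 0;
    // __cxa_demangle may realloc the buffer we pass in and updates m_len.
    char *result =
        abi::__cxa_demangle(mangled.c_str(), m_buf, &m_len, &status);
    if (result == nullptr || status != 0)
      return false; // we keep owning the previous buffer
    m_buf = result;
    out.assign(result);
    return true;
  }

private:
  char *m_buf = nullptr;
  std::size_t m_len = 0;
};

// Hypothetical stand-in for lldb's Mangled.
struct Symbol {
  std::string mangled;
  std::string demangled; // cache that later lazy lookups would read
  bool demangle_attempted = false;

  // Batch entry point: demangles eagerly with a caller-provided demangler.
  bool DemangleWith(ReusableDemangler &demangler) {
    demangle_attempted = true;
    return demangler.Demangle(mangled, demangled);
  }
};

// Hypothetical stand-in for the loop in Symtab::InitNameIndexes(): a single
// demangler instance is shared by all symbols. Returns the number of
// successfully demangled names.
std::size_t WarmCache(std::vector<Symbol> &symbols) {
  ReusableDemangler shared;
  std::size_t demangled_count = 0;
  for (Symbol &symbol : symbols)
    if (symbol.DemangleWith(shared))
      ++demangled_count;
  return demangled_count;
}
```

After WarmCache() has run, every symbol carries its demangled name, so a
later lazy lookup would only need to read the cached string.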

What do you think about the plan?
Do you think it's a good idea to add DemangleWithRichNameIndexInfo()
like this?
Are you aware of more batch-processing places like
Symtab::InitNameIndexes() that I should consider as clients for
DemangleWithRichNameIndexInfo()?
Do you know potential side-effects I must be aware of?
Would you consider the evidence on the performance benefits convincing,
or do you think it needs bulletproof benchmarking numbers?

When it comes to MSVC-mangled names:
* It is certainly necessary to keep a legacy version of the current
categorization mechanism for these. But in general, what do you think
about their importance for LLDB? (Personally I would like to see LLDB on
Windows, but I tried it only once and gave up quickly.)
* I saw there is a new MicrosoftDemangler now in LLVM. Does anyone know
more about it? Especially: Are there plans to provide rich demangling
information similar to the IPD?

So far I have started writing a unit test for Symtab::InitNameIndexes(),
so I won't accidentally break its indexing. I also experimented with a
potential DemangleWithRichNameIndexInfo() and had a look at the numbers
from the internal LLDB timers. This was, however, not exhaustive, and
real benchmarking is always hard.

Thanks for all kinds of feedback.

Best,
Stefan