[llvm-dev] RFC: Cleaning up the Itanium demangler

Wed Jun 21 17:42:12 PDT 2017

I'm very interested in your work because I've just started writing a
demangler for the Microsoft mangling scheme. What I found in the current
Itanium demangler matches what you found -- it looks like it allocates too
much memory during parsing and concatenates std::strings too often. I can
see there's (probably a lot of) room for improvement. The demangler's
performance is sometimes important for LLD, which is my main project, as
linkers often have to print out a lot of symbols if verbose output is
requested. For example, if you link Chrome with the -map option, the linker
has to demangle 300 MiB of strings in total, which currently takes more
than 20 seconds on my machine when run single-threaded.

The way I'm trying to implement the MS demangler is the same as yours, too.
I'm trying to build an AST that describes the type and then convert it to a
string. I guess that we can share the same AST type between Itanium and MS
so that we can use the same code for converting ASTs to strings.
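
To make the shared-AST idea concrete, here is a minimal sketch of what I
have in mind. All of the names below (Node, NameNode, PointerNode,
FunctionNode, toString, buildExample) are hypothetical and are not taken
from your patch or from my code; the tree is built by hand where a real
Itanium or MS parser would construct it. The point is only that every node
appends to a single output buffer, so nothing is ever inserted into the
middle of a string.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node types for illustration; not the names used by
// either demangler.
struct Node {
  virtual ~Node() = default;
  // Append this node's text to Out. Nodes only ever append, so the
  // output buffer is never spliced in the middle.
  virtual void print(std::string &Out) const = 0;
};

// A plain identifier or builtin type name.
struct NameNode : Node {
  std::string Name;
  explicit NameNode(std::string N) : Name(std::move(N)) {}
  void print(std::string &Out) const override { Out += Name; }
};

// Pointer to some other type. This ignores the harder cases (pointers
// to functions or arrays), which need text on both sides of the name.
struct PointerNode : Node {
  std::unique_ptr<Node> Pointee;
  explicit PointerNode(std::unique_ptr<Node> P) : Pointee(std::move(P)) {}
  void print(std::string &Out) const override {
    Pointee->print(Out);
    Out += '*';
  }
};

// A function name followed by its parameter types.
struct FunctionNode : Node {
  std::unique_ptr<Node> Name;
  std::vector<std::unique_ptr<Node>> Params;
  void print(std::string &Out) const override {
    Name->print(Out);
    Out += '(';
    for (std::size_t I = 0; I != Params.size(); ++I) {
      if (I != 0)
        Out += ", ";
      Params[I]->print(Out);
    }
    Out += ')';
  }
};

// Shared back end: any parser that produces Nodes can reuse this.
std::string toString(const Node &Root) {
  std::string Out;
  Root.print(Out);
  return Out;
}

// Hand-built stand-in for parser output describing foo(int, char*).
std::unique_ptr<Node> buildExample() {
  auto Fn = std::make_unique<FunctionNode>();
  Fn->Name = std::make_unique<NameNode>("foo");
  Fn->Params.push_back(std::make_unique<NameNode>("int"));
  Fn->Params.push_back(
      std::make_unique<PointerNode>(std::make_unique<NameNode>("char")));
  return std::unique_ptr<Node>(std::move(Fn));
}
```

With something like this, the Itanium and MS front ends would differ only
in how they build the tree; toString would be common to both. A real
implementation would probably want an arena allocator instead of
unique_ptr, but that is a separate question.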

It's unfortunate that my work is overlapping with yours. Looks like you are
ahead of me, so I'll take a look at your code to see if there's something I
can do for you.

On Wed, Jun 21, 2017 at 4:42 PM, Erik Pilkington via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hello all,
> The Itanium demangler in libcxxabi (and also, llvm/lib/Demangle) is really
> slow. This is largely because the textual representation of the symbol that
> is being demangled is held in a std::string, and manipulations done during
> parsing are done on that string. The demangler is always concatenating
> strings and inserting into the middle of strings, which is terrible. The
> fact that the parsing logic and the string manipulation/formatting logic are
> interleaved also makes the demangler pretty ugly. Another problem is that
> the demangler uses a lot of stack space, and has a bunch of stack overflows
> filed against it.
>
> I've been working on fixing this by parsing first into an AST structure,
> and then traversing that AST to produce a demangled string. This provides a
> significant performance improvement and also makes the demangler somewhat
> cleaner. Attached you should find a patch to this effect. This patch is
> still very much a work in progress, but currently passes the libcxxabi test
> suite and demangles all the symbols in LLVM identically to the current
> demangler. It also provides a significant performance improvement: it
> demangles the symbols in LLVM about 3.7 times faster than the current
> demangler. Also, separating the formatting code from the parser reduces
> stack usage (the activation frame for parse_type reduced from 416 to 144
> bytes on my machine). The stack usage is still pretty bad, but this helps
> with some of it.
>
> Does anyone have any early feedback on the patch? Does this seem like a
> good direction for the demangler?
>
> As far as future plans for this file, I have a few more refactorings and
> performance improvements that I'd like to get through. After that, it might
> be interesting to try to replace the FastDemangle.cpp demangler in LLDB
> with this, to restore the one true demangler in the source tree.
> FastDemangle.cpp is only partially completed, and calls out to
> ItaniumDemangle.cpp in llvm (which is a copy of cxa_demangle.cpp) if it
> fails to parse the symbol.
>
> Any thoughts here would be appreciated!
> Thanks,
> Erik