[llvm-dev] RFC: Cleaning up the Itanium demangler

Thu Jun 22 09:25:31 PDT 2017

On Wed, Jun 21, 2017 at 6:03 PM, Erik Pilkington <erik.pilkington at gmail.com>
wrote:

>
>
> On 6/21/17 5:42 PM, Rui Ueyama wrote:
>
> I'm very interested in your work because I've just started writing a
> demangler for the Microsoft mangling scheme. What I found in the current
> Itanium demangler is the same as you -- it looks like it allocates too much
> memory during parsing and concatenates std::strings too often. I can see
> there's (probably a lot of) room for improvement. Demangler performance is
> sometimes important for LLD, which is my main project, as linkers often
> have to print out a lot of symbols if a verbose output is requested. For
> example, if you link Chrome with the -map option, the linker has to
> demangle 300 MiB of strings in total, which currently takes more than 20
> seconds on my machine if single-threaded.
>
> The way I'm trying to implement an MS demangler is the same as yours, too.
> I'm trying to create an AST to describe a type and then convert it to a string.
> I guess that we can use the same AST type between Itanium and MS so that we
> can use the same code for converting ASTs to strings.
>
> Using the same AST is an interesting idea. The AST that I wrote isn't that
> complicated, and is pretty closely tied to the libcxxabi demangler, so I
> bet it would be easier to have separate representations, especially if you're
> intending to mimic the output of MS's demangler. I'm also not at all
> familiar with how MS mangles their C++, which might imply a slightly
> different representation.
>

I'm not going to try it immediately, but I think sharing the same AST
data structure makes sense. I'm not too concerned about mimicking all
the details of Microsoft's demangler, so a slight deviation is OK as
long as the difference is minor and reasonable. Mangled symbols are very
different between Itanium and Microsoft, but after all, the demangled form
is plain C++, which should be (almost) the same between the two.
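
As a rough sketch of the shared-AST idea (every class and function name
here is hypothetical; none of this is taken from the attached patch or from
an existing demangler): each parser would build the same node types, and a
single printer would append text to one output buffer instead of
concatenating or inserting into intermediate strings.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared AST. An Itanium parser and a Microsoft parser would
// both build these nodes; only the printing code knows how to format C++.
struct Node {
  virtual ~Node() = default;
  // Append this node's text to Out. Output is append-only, so nothing is
  // ever inserted into the middle of a string.
  virtual void print(std::string &Out) const = 0;
};

struct NameNode : Node {
  std::string Name;
  explicit NameNode(std::string N) : Name(std::move(N)) {}
  void print(std::string &Out) const override { Out += Name; }
};

struct PointerNode : Node {
  std::unique_ptr<Node> Pointee;
  explicit PointerNode(std::unique_ptr<Node> P) : Pointee(std::move(P)) {}
  void print(std::string &Out) const override {
    Pointee->print(Out);
    Out += '*';
  }
};

struct FunctionNode : Node {
  std::unique_ptr<Node> Name;
  std::vector<std::unique_ptr<Node>> Params;
  explicit FunctionNode(std::unique_ptr<Node> N) : Name(std::move(N)) {}
  void print(std::string &Out) const override {
    Name->print(Out);
    Out += '(';
    for (std::size_t I = 0; I < Params.size(); ++I) {
      if (I != 0)
        Out += ", ";
      Params[I]->print(Out);
    }
    Out += ')';
  }
};

// Single entry point for converting any AST to its demangled string.
std::string toString(const Node &N) {
  std::string Out;
  N.print(Out);
  return Out;
}

// Stand-in for a parser result: the tree for `foo(int, char*)`, which is
// what an Itanium parser could build from the symbol _Z3fooiPc.
std::unique_ptr<Node> buildExample() {
  auto Fn = std::make_unique<FunctionNode>(std::make_unique<NameNode>("foo"));
  Fn->Params.push_back(std::make_unique<NameNode>("int"));
  Fn->Params.push_back(
      std::make_unique<PointerNode>(std::make_unique<NameNode>("char")));
  return Fn;
}
```

Calling toString(*buildExample()) walks the tree once and yields
"foo(int, char*)". A Microsoft parser that produced the same tree would get
identical output without any formatting code of its own; any deviations
from Microsoft's exact output would then be confined to the print methods.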

> It's unfortunate that my work is overlapping with yours. Looks like you
> are ahead of me, so I'll take a look at your code to see if there's
> something I can do for you.
>
> On Wed, Jun 21, 2017 at 4:42 PM, Erik Pilkington via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hello all,
>> The Itanium demangler in libcxxabi (and also, llvm/lib/Demangle) is
>> really slow. This is largely because the textual representation of the
>> symbol that is being demangled is held in a std::string, and manipulations
>> done during parsing are done on that string. The demangler is always
>> concatenating strings and inserting into the middle of strings, which is
>> terrible. The fact that the parsing logic and the string
>> manipulation/formatting logic are interleaved also makes the demangler
>> pretty ugly. Another problem is that the demangler uses a lot of stack
>> space, and has a bunch of stack overflows filed against it.
>>
>> I've been working on fixing this by parsing first into an AST structure,
>> and then traversing that AST to produce a demangled string. This provides a
>> significant performance improvement and also makes the demangler somewhat
>> cleaner. Attached you should find a patch to this effect. This patch is
>> still very much a work in progress, but currently passes the libcxxabi test
>> suite and demangles all the symbols in LLVM identically to the current
>> demangler. It also provides a significant performance improvement: it
>> demangles the symbols in LLVM about 3.7 times faster than the current
>> demangler. Also, separating the formatting code from the parser reduces
>> stack usage (the activation frame for parse_type reduced from 416 to 144
>> bytes on my machine). The stack usage is still pretty bad, but this helps
>> with some of it.
>>
>> Does anyone have any early feedback on the patch? Does this seem like a
>> good direction for the demangler?
>>
>> As far as future plans for this file, I have a few more refactorings and
>> performance improvements that I'd like to get through. After that, it might
>> be interesting to try to replace the FastDemangle.cpp demangler in LLDB
>> with this, to restore the one true demangler in the source tree.
>> FastDemangle.cpp is only partially complete, and calls out to
>> ItaniumDemangle.cpp in llvm (which is a copy of cxa_demangle.cpp) if it
>> fails to parse the symbol.
>>
>> Any thoughts here would be appreciated!
>> Thanks,
>> Erik