[lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

Bob Campbell via lldb-dev lldb-dev at lists.llvm.org
Sat Mar 3 11:41:12 PST 2018


Greg, sorry I have been trying to work out my idea, and to be honest the demangler is complicated.

The core of my idea is to not fully expand the mangled name.

The key benefit in switching to the “Itanium” mangling scheme was the ability to reduce the amount
of totally redundant information.

I started with thinking about the shorter string:

_ZN1BIS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IiiES0_ES1_ES2_ES3_ES4_ES5_ES6_ES7_ES8_ES9_ESA_ESB_ESC_ESD_ESE_ESF_ESG_ESH_ESI_ESJ_ESK_ESL_E1fEv

which shows the exponential expansion problem.

This problem has been pointed out with good detail here: http://fitzgeraldnick.com/2017/02/22/cpp-demangle.html

My thinking is to show the user this redundancy in a more explict form something like:

B<$22={B<$21={B<$20={B<$19={B<$18={B<$17={B<$16={B<$15={B<$14={B<$13={B<$12={B<$11={B<$10={B<$9={B<$8={B<$7={B<$6={B<$5={B<$4={B<$3={B<$2={B<$1={B<int, int>}, $1>}, $2>}, $3>}, $4>}, $5>}, $6>}, $7>}, $8>}, $9>}, $10>}, $11>}, $12>}, $13>}, $14>}, $15>}, $16>}, $17>}, $18>}, $19>}, $20>}, $21>}, $22>::f()

Doing this by hand is time consuming and error prone, so I went looking for a demangler I could quickly modify,
anyway I don’t have access to the one I wrote at Metrowerks any more, and I ended up porting the one from LLVM
to swift (my port was a bit of a mess but I wanted more perl like string functions, and never really learned libc++
at that level).

While doing that I have seen Erik’s AST changes land, and they are really nice, it makes the demangler much
easier to understand. I switched to building an AST and would suggest a couple of improvements:

1 - introduce DEF/USE nodes into the AST tree

This allows while walking the AST tree to print indirections instead of fully expanding them.

2 - use back patching (via DEF/USE nodes) to avoid double parsing the string

In some cases “T” substitutions are referenced before their expansion is defined,
by being able to see the use node at the point the def is actually created allows
me to patch up the use with the actual expansion.

3 - modify the AST so that all nodes have an array of children (which could be empty)

This makes walking the AST simpler as we don’t have to customize each ASTNode subclass
to allow walking tree.

I did a few other things (add a StringNode so I don’t need a NodeOrStringNode), make a node which
holds an array of nodes… Tweeks in the way ParameterPackExpansion works…

Once 1 & 2 are done it is possible to experiment with heuristics for the expansion:

First is printing out all DEF nodes and USE nodes:

$23=$0=B<$T0.2=$22=$0<$T0.3=$21=$0<$T0.4=$20=$0<$T0.5=$19=$0<$T0.6=$18=$0<$T0.7=$17=$0<$T0.8=$16=$0<$T0.9=$15=$0<$T0.10=$14=$0<$T0.11=$13=$0<$T0.12=$12=$0<$T0.13=$11=$0<$T0.14=$10=$0<$T0.15=$9=$0<$T0.16=$8=$0<$T0.17=$7=$0<$T0.18=$6=$0<$T0.19=$5=$0<$T0.20=$4=$0<$T0.21=$3=$0<$T0.22=$2=$0<$T0.23=$1=$0<$T0.24=int, $T1.24=int>, $T1.23=$1>, $T1.22=$2>, $T1.21=$3>, $T1.20=$4>, $T1.19=$5>, $T1.18=$6>, $T1.17=$7>, $T1.16=$8>, $T1.15=$9>, $T1.14=$10>, $T1.13=$11>, $T1.12=$12>, $T1.11=$13>, $T1.10=$14>, $T1.9=$15>, $T1.8=$16>, $T1.7=$17>, $T1.6=$18>, $T1.5=$19>, $T1.4=$20>, $T1.3=$21>, $T1.2=$22>::f()

Clearly this is a mess but helped me a bit when debugging, and I learned:

S substitutions are global
T substitutions are scoped (so I added a generation count, currently prints are $T<id>.<gen>, perhaps I would change to $T<gen>.<id>)

First improvement is to only print a def that has at least one use:

$0={B}<$22={$0<$21={$0<$20={$0<$19={$0<$18={$0<$17={$0<$16={$0<$15={$0<$14={$0<$13={$0<$12={$0<$11={$0<$10={$0<$9={$0<$8={$0<$7={$0<$6={$0<$5={$0<$4={$0<$3={$0<$2={$0<$1={$0<int, int>}, $1>}, $2>}, $3>}, $4>}, $5>}, $6>}, $7>}, $8>}, $9>}, $10>}, $11>}, $12>}, $13>}, $14>}, $15>}, $16>}, $17>}, $18>}, $19>}, $20>}, $21>}, $22>::f()

This is better but I don’t like all of the “B”s being replaced with “$0”

Second improvement is to only print a def which has a use nested in it's definition (this is like what I showed above):

B<$22={B<$21={B<$20={B<$19={B<$18={B<$17={B<$16={B<$15={B<$14={B<$13={B<$12={B<$11={B<$10={B<$9={B<$8={B<$7={B<$6={B<$5={B<$4={B<$3={B<$2={B<$1={B<int, int>}, $1>}, $2>}, $3>}, $4>}, $5>}, $6>}, $7>}, $8>}, $9>}, $10>}, $11>}, $12>}, $13>}, $14>}, $15>}, $16>}, $17>}, $18>}, $19>}, $20>}, $21>}, $22>::f()

Third improvement is to pull the DEFs out and put them in a list <def, def…> symbol list like this:

<$1=B<int, int>, $2=B<$1, $1>, $3=B<$2, $2>, $4=B<$3, $3>, $5=B<$4, $4>, $6=B<$5, $5>, $7=B<$6, $6>, $8=B<$7, $7>, $9=B<$8, $8>, $10=B<$9, $9>, $11=B<$10, $10>, $12=B<$11, $11>, $13=B<$12, $12>, $14=B<$13, $13>, $15=B<$14, $14>, $16=B<$15, $15>, $17=B<$16, $16>, $18=B<$17, $17>, $19=B<$18, $18>, $20=B<$19, $19>, $21=B<$20, $20>, $22=B<$21, $21>> B<$22, $22>::f()

This is better as I can see the symbol is B<T, T>::f(), I find multiline easier to read in this case:

$1=B<int, int>
$2=B<$1, $1>
$3=B<$2, $2>
$4=B<$3, $3>
$5=B<$4, $4>
$6=B<$5, $5>
$7=B<$6, $6>
$8=B<$7, $7>
$9=B<$8, $8>
$10=B<$9, $9>
$11=B<$10, $10>
$12=B<$11, $11>
$13=B<$12, $12>
$14=B<$13, $13>
$15=B<$14, $14>
$16=B<$15, $15>
$17=B<$16, $16>
$18=B<$17, $17>
$19=B<$18, $18>
$20=B<$19, $19>
$21=B<$20, $20>
$22=B<$21, $21>
B<$22, $22>::f()

But this may have issues for example in stack unwinding.

This does however make really clear to the user the recursive nature of the type definition.

I can apply this to the original string, and as long as I don’t try to fully expand it, it runs fairly fast (even in unoptimized swift)

_ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_

Showing the DEFs inline:

auto $10=$2=$1=shk::detail::CallbackPublisher<$T0.2=shk::Throw($4=std::__exception_ptr::exception_ptr const&)::'lambda'($8=$7=$T0.2&&)>::Subscribe<$1::ConcatMap<$1::CallbackSubscriber<$70=std::vector<$7, std::allocator<$7> > $14=shk::GetAll<int, $T1.6=$2<$61=$19 $19 shk::ConcatMap<$52=$19 shk::Concat<$T0.9=$2<$19 $19 shk::Map<$41=$19 auto $18=shk::IfEmpty<$10>($8)::$19='lambda'($7)::operator()<$2<$19 $19 $21=shk::Take<int>($8)::$22='lambda'($7)::operator()<$2<$19 $19 shk::Filter<shk::ElementAt(unsigned long)::'lambda'($8)>($8)::'lambda'($7)::operator()<$2<$19 $19 $22<unsigned long>($8)::'lambda'($7)::operator()<$2<$19 shk::From<$1::InfiniteRangeContainer<int> >($8)::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)>($8)::'lambda'($7)::operator()<$41>($7) const::'lambda'($8)>, $2<$46=$19 $19 $18<$10>($8)::$21<$41>($7) const::'lambda0'($8)>&>($49=$2<$19 $19 shk::Map<$41=$19 auto $18=shk::IfEmpty<$10>($8)::$19='lambda'($7)::operator()<$2<$19 $19 $21=shk::Take<int>($8)::$22='lambda'($7)::operator()<$2<$19 $19 shk::Filter<shk::ElementAt(unsigned long)::'lambda'($8)>($8)::'lambda'($7)::operator()<$2<$19 $19 $22<unsigned long>($8)::'lambda'($7)::operator()<$2<$19 shk::From<$1::InfiniteRangeContainer<int> >($8)::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)>($8)::'lambda'($7)::operator()<$41>($7) const::'lambda'($8)>&&, $49=$2<$46=$19 $19 $18<$10>($8)::$21<$41>($7) const::'lambda0'($8)>&&&)::'lambda'($8)>($8)::$53='lambda'($7)::operator()<$2<$19 shk::Start<$57=$19 shk::Just<$46, $49>($52)::'lambda'(), $19 $57<$46, $49>($52)::'lambda0'()>($52)::'lambda'($8)> >($7) const::'lambda'($8)> >($66=$T1.6 const&, $69=shk::ElementCount, bool)::'lambda'($8), $66 $14<int, std::vector>($69, $70, bool)::'lambda'($4&&), $66 $14<int, std::vector>($69, $70, bool)::'lambda'()>, $53, $61>::ConcatMapValuesSubscriber>($8) const

Showing the DEFs first:

$T0.2=shk::Throw($4 const&)::'lambda'($8),
$T0.9=$2<$19 $19 shk::Map<$41>($8)::'lambda'($7)::operator()<$41>($7) const::'lambda'($8)>, $2<$46>&,
$T1.6=$2<$61>,
$1=shk::detail,
$2=$1::CallbackPublisher,
$4=std::__exception_ptr::exception_ptr,
$7=$T0.2,
$8=$7&&,
$10=$2<$T0.2>,
$14=shk::GetAll,
$18=shk::IfEmpty,
$19='lambda'($7),
$21=shk::Take,
$22='lambda'($7),
$41=$19 auto $18<$10>($8)::$19::operator()<$2<$19 $19 $21<int>($8)::$22::operator()<$2<$19 $19 shk::Filter<shk::ElementAt(unsigned long)::'lambda'($8)>($8)::'lambda'($7)::operator()<$2<$19 $19 $22<unsigned long>($8)::'lambda'($7)::operator()<$2<$19 shk::From<$1::InfiniteRangeContainer<int> >($8)::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8)> >($7) const::'lambda'($8),
$46=$19 $19 $18<$10>($8)::$21<$41>($7) const::'lambda0'($8),
$46=$19 $19 $18<$10>($8)::$21<$41>($7) const::'lambda0'($8),
$49=$2<$46>&,
$52=$19 shk::Concat<$T0.9>($49&&, $49&&)::'lambda'($8),
$53='lambda'($7),
$57=$19 shk::Just<$46, $49>($52)::'lambda'(),
$61=$19 $19 shk::ConcatMap<$52>($8)::$53::operator()<$2<$19 shk::Start<$57, $19 $57<$46, $49>($52)::'lambda0'()>($52)::'lambda'($8)> >($7) const::'lambda'($8),
$66=$T1.6,
$69=shk::ElementCount,
$70=std::vector<$7, std::allocator<$7> > $14<int, $T1.6>($66 const&, $69, bool)::'lambda'($8)
auto $10::Subscribe<$1::ConcatMap<$1::CallbackSubscriber<$70, $66 $14<int, std::vector>($69, $70, bool)::'lambda'($4&&), $66 $14<int, std::vector>($69, $70, bool)::'lambda'()>, $53, $61>::ConcatMapValuesSubscriber>($8) const

At this point I don’t know that the function actually was so I don’t know if this translation is even valid

My implementation seems to have a recursive cycle in it:

$T0.2=shk::Throw($4 const&)::'lambda'($8),
$8=$7&&,
$7=$T0.2,
$T0.2=shk::Throw($4 const&)::'lambda'($8),

I have not gotten to the bottom of this to understand if that cycle is really there or if there is a bug in
how I am constructing the AST in this case.

Looking at some of the test cases there are some things I would like to see better like this one:

_ZZN4NIds4NStr14TCStrAggregateINS0_13TCTCStrTraitsINS0_11TCStrTraitsIcNS0_17CDefaultStrParamsEEENS0_14TCStrImp_FixedIS5_Lx256EEEEEE21f_AddFromIteratorUTF8INS0_16CStrIteratorUTF8EEEvRxRKT_ENKSA_ISB_EUlmE0_clEm:


void $11=$1=NIds::NStr::TCStrAggregate<$1::TCTCStrTraits<$6=$1::TCStrTraits<char, $1::CDefaultStrParams>, $1::TCStrImp_Fixed<$6, 256ll> > >::f_AddFromIteratorUTF8<$T0.6=$12=$1::CStrIteratorUTF8>(long long&, $T0.6 const&)::$11<$12>::'lambda0'(unsigned long)::operator()(unsigned long) const


$T0.6=$12
$1=NIds::NStr
$6=$1::TCStrTraits<char, $1::CDefaultStrParams>
$11=$1::TCStrAggregate<$1::TCTCStrTraits<$6, $1::TCStrImp_Fixed<$6, 256ll> > >::f_AddFromIteratorUTF8
$12=$1::CStrIteratorUTF8
void $11<$T0.6>(long long&, $T0.6 const&)::$11<$12>::'lambda0'(unsigned long)::operator()(unsigned long) const

I would prefer to expand out the NIds::NStr so it was:

$T0.6=$12
$6=NIds::NStr::TCStrTraits<char, $1::CDefaultStrParams>
$11=NIds::NStr::TCStrAggregate<$1::TCTCStrTraits<$6, NIds::NStr::TCStrImp_Fixed<$6, 256ll> > >::f_AddFromIteratorUTF8
$12=NIds::NStr::CStrIteratorUTF8
void $11<$T0.6>(long long&, $T0.6 const&)::$11<$12>::'lambda0'(unsigned long)::operator()(unsigned long) const

Anyway I think that converting to a non-recursive expansion would allow converting the problem cases from O(quadratic) to O(n)

Some thoughts:

could the debugger keep track of symbol definitions it has seen, this may be useful in stack crawls so as it demangles the symbols,
it can share a substation table with the demangler so substitutions have the same name between stack frames.

could the compiler (if it does not already) record type aliases so the demangler could translate back to the users name?

typedef vector<int> myIntBefore

This is of course programatic as the user my have multiple types all mapping to vector<int>

Ideally the output should be the same as it is now for simple names.

If anyone wants they can have my swift implementation I can send it. It was only intended as a way to play around
with the output heuristic (and to be honest spent most type trying to pass the first 180 test cases).
I wsa going to put it on github but I need to create an account, and would then lose days cleaning up the code
which was only intended as a quick prototype, and not really wanting to publish until I passed all of the tests:-)

Bob




More information about the lldb-dev mailing list