[llvm-bugs] Issue 15022 in oss-fuzz: llvm/llvm-microsoft-demangle-fuzzer: Out-of-memory in llvm_llvm-microsoft-demangle-fuzzer

tha… via monorail via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jun 5 05:58:14 PDT 2019

```Comment #2 on issue 15022 by thakis at chromium.org:
llvm/llvm-microsoft-demangle-fuzzer: Out-of-memory in
llvm_llvm-microsoft-demangle-fuzzer
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15022#c2

The root cause here ultimately is that demangling output can be exponential
in the size of the input, even for valid manglings. Consider:

\$ cat test.cc
template <class T1, class T2, class T3, class T4, class T5,
class T6, class T7, class T8, class T9, class T10>
struct Fooob {};

using A0 = Fooob<int, int, int, int, int, int, int, int, int, int>;
using A1 = Fooob<A0, A0, A0, A0, A0, A0, A0, A0, A0, A0>;
using A2 = Fooob<A1, A1, A1, A1, A1, A1, A1, A1, A1, A1>;
using A3 = Fooob<A2, A2, A2, A2, A2, A2, A2, A2, A2, A2>;
using A4 = Fooob<A3, A3, A3, A3, A3, A3, A3, A3, A3, A3>;
using A5 = Fooob<A4, A4, A4, A4, A4, A4, A4, A4, A4, A4>;
using A6 = Fooob<A5, A5, A5, A5, A5, A5, A5, A5, A5, A5>;
using A7 = Fooob<A6, A6, A6, A6, A6, A6, A6, A6, A6, A6>;

void f(A7 a) {}

\$ out/gn/bin/clang-cl /c test.cc
\$ out/gn/bin/llvm-nm test.obj
00000000
T ?f@@YAXU?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at HHHHHHHHHH@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@@Z
00000000 a @feat.00
\$ /usr/bin/time -l
out/gn/bin/llvm-undname '?f@@YAXU?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at U?\$Fooob at HHHHHHHHHH@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@U1 at U1@U1 at U1@U1 at U1@U1 at U1@U1@@@@Z'
> tmp.txt
3.93 real         0.76 user         1.15 sys
1459576832  maximum resident set size
0  average shared memory size
0  average unshared data size
0  average unshared stack size
516405  page reclaims
0  page faults
0  swaps
0  block input operations
0  block output operations
0  messages sent
943  voluntary context switches
62  involuntary context switches
\$ ls -hl tmp.txt
-rw-r--r--  1 thakis  staff   625M Jun  5 08:36 tmp.txt

This grows 10x with each additional A. I can make the backref entries lazy
and that should save one level of memory use and for this example it'd
probably cut memory use almost 10x -- but the cc file only needs to grow a
single additional line to waste that again, and the mangled string will
only grow by a few bytes for this.

This also isn't llvm-undname specific, llvm-cxxfilt has the same issue:

\$ out/gn/bin/clang -c test.cc
\$ out/gn/bin/llvm-nm test.o
0000000000000000 T
__Z1f5FooobIS_IS_IS_IS_IS_IS_IS_IiiiiiiiiiiES0_S0_S0_S0_S0_S0_S0_S0_S0_ES1_S1_S1_S1_S1_S1_S1_S1_S1_ES2_S2_S2_S2_S2_S2_S2_S2_S2_ES3_S3_S3_S3_S3_S3_S3_S3_S3_ES4_S4_S4_S4_S4_S4_S4_S4_S4_ES5_S5_S5_S5_S5_S5_S5_S5_S5_ES6_S6_S6_S6_S6_S6_S6_S6_S6_E
\$ /usr/bin/time -l out/gn/bin/llvm-cxxfilt
_Z1f5FooobIS_IS_IS_IS_IS_IS_IS_IiiiiiiiiiiES0_S0_S0_S0_S0_S0_S0_S0_S0_ES1_S1_S1_S1_S1_S1_S1_S1_S1_ES2_S2_S2_S2_S2_S2_S2_S2_S2_ES3_S3_S3_S3_S3_S3_S3_S3_S3_ES4_S4_S4_S4_S4_S4_S4_S4_S4_ES5_S5_S5_S5_S5_S5_S5_S5_S5_ES6_S6_S6_S6_S6_S6_S6_S6_S6_E
> tmp.txt
5.02 real         2.85 user         0.78 sys
1159876608  maximum resident set size
0  average shared memory size
0  average unshared data size
0  average unshared stack size
316004  page reclaims
16  page faults
0  swaps
0  block input operations
0  block output operations
0  messages sent
671  voluntary context switches
189  involuntary context switches
\$ ls -hl tmp.txt
-rw-r--r--  1 thakis  staff   552M Jun  5 08:39 tmp.txt

As said, the root cause here is that the output can be exponential in the
input size. Nobody really wants to have exponentially-sized output from
their demangler -- the following is just hard to read:

demumble '__Z1f5FooobIS_IS_IiiiiiiiiiiES0_S0_S0_S0_S0_S0_S0_S0_S0_ES1_S1_S1_S1_S1_S1_S1_S1_S1_E'
f(Fooob<Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> >, Fooob<Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int>, Fooob<int, int, int, int, int, int, int, int, int, int>,
Fooob<int, int, int, int, int, int, int, int, int, int>, Fooob<int, int,
int, int, int, int, int, int, int, int>, Fooob<int, int, int, int, int,
int, int, int, int, int>, Fooob<int, int, int, int, int, int, int, int,
int, int> > >)

So maybe we should change the demangler output for backreferences to

\$
demumble '__Z1f5FooobIS_IS_IiiiiiiiiiiES0_S0_S0_S0_S0_S0_S0_S0_S0_ES1_S1_S1_S1_S1_S1_S1_S1_S1_E'
#define S0 Foob
#define S1 S0<int, int, int, int, int,  int, int, int, int, int>
#define S2 S0<S1, S1, S1, S1, S1,  S1, S1, S1, S1, S1>
f(S2)

which is a more direct translation of what's in the mangling and arguably
easier to read. It has the advantage of needing storage that's linear in
the input.

--