> lldb has its own because it has different constraints w.r.t. memory
> allocation and speed compared to the __cxa_* one. (I don't know much about
> the details there though). It falls back on the __cxa_* implementation for
> some cases where the "fast" one's implementation is incomplete (again,
> repeating what I remember... I don't know the details).

I didn't realize lldb had its own demangler.  It must not be very thorough,
because my lldb session was falling back to llvm's demangler quite a lot!

Without my change, disabling lldb's FastDemangler makes lldb ~10% slower.
With my change, disabling lldb's FastDemangler makes lldb ~1.25% slower.
(Measured in cycles, using perf stat on a run of lldb.  Interestingly, the
instruction count difference is much larger, implying lldb's demangler has
very poor IPC.)
