[llvm] r351481 - [demangler] Ignore leading underscores if present

John McCall via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 23 23:39:36 PST 2019


On 23 Jan 2019, at 15:48, Duncan Exon Smith wrote:
> +John and Nick
>
> I'm not sure this is the right thing to do.
>
> I believe there was an intentional decision way back when (before my 
> time) to require clients to strip the leading underscore.  c++filt on 
> my desktop does this (passing `-_` by default, which means symbols 
> without the leading underscore don't get demangled).
>
> Nick and John, do you have any recollection of why the demangler was 
> restrictive like this?  Any thoughts on whether it's problematic to 
> relax it?

Okay.  If you step back and consider the whole operation of taking a 
string, figuring out whether it's a C++ mangled name, and then trying to 
demangle it, there is a plausible reason why you might not want to 
accept `__Z` as a prefix: rejecting it permits an algorithm which, when 
followed by clients on leading-underscore targets, reliably avoids 
demangling unreserved C function names like `Z3zapEv`.  If clients can 
present you 
with either a stripped or unstripped name, then it's ambiguous whether 
`_Z3zapEv` is a stripped reserved name (hence okay to demangle) or an 
unstripped unreserved name (hence not okay to demangle).  If clients on 
leading-underscore targets can be trusted to first find and remove the 
leading underscore, and to just not call you if there's no leading 
underscore, this problem resolves itself.  Not allowing `__Z` as a 
prefix therefore encourages clients to implement the logic correctly so 
that you can get the corner cases right.
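
As a rough sketch of that client-side algorithm (the helper name 
`nameToDemangle` is hypothetical, not an LLVM API, and the `_Z` check 
is folded into the client here purely for illustration):

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper for a client on a leading-underscore target.
// `symbol` is the raw symbol name as it appears in the object file.
// Returns the name to hand to the demangler, or nullopt if the
// demangler should not be called at all.
std::optional<std::string_view> nameToDemangle(std::string_view symbol) {
  // No leading underscore: do not call the demangler for this name.
  if (symbol.empty() || symbol.front() != '_')
    return std::nullopt;

  // Remove exactly one underscore (the target's global prefix).
  std::string_view stripped = symbol.substr(1);

  // Only names that still start with the reserved `_Z` are C++ manglings.
  // The C function `Z3zapEv` has the symbol `_Z3zapEv`, which strips to
  // `Z3zapEv` and is rejected here.
  if (stripped.substr(0, 2) != "_Z")
    return std::nullopt;
  return stripped;
}
```

With this ordering, `__Z3zapEv` is passed on as `_Z3zapEv`, while the 
symbol `_Z3zapEv` (the C function `Z3zapEv`) is never demangled.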

On the other hand, you can make a strong argument that the only prefix 
which matters is the complete prefix on symbol names, so that the right 
way of thinking about it is that the C++ mangling prefix is `__Z` 
instead of `_Z` on leading-underscore targets; certainly this leads to a 
more reasonable conceptual model in the face of languages like Swift 
that eschew the underscore on all targets.  But actually following that 
logic would mean that the demangler would need to have target-specific 
behavior, and no tooling using the demangler is set up to propagate 
target information down (and what would that mean for `c++filt` anyway?).
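
To make "target-specific behavior" concrete, here is a minimal sketch 
of the complete-prefix model.  `TargetInfo`, `cxxManglingPrefix` and 
`looksLikeCxxSymbol` are hypothetical names invented for this example; 
nothing like them is plumbed through the existing demangler interface.

```cpp
#include <string>
#include <string_view>

// Hypothetical target description; only the global symbol prefix matters.
struct TargetInfo {
  char globalPrefix;  // '_' on leading-underscore targets, '\0' otherwise
};

// The complete C++ mangling prefix as it appears on real symbol names.
std::string cxxManglingPrefix(const TargetInfo &target) {
  std::string prefix;
  if (target.globalPrefix != '\0')
    prefix.push_back(target.globalPrefix);
  prefix += "_Z";
  return prefix;
}

// True if `symbol` (an unstripped symbol name) carries the C++ prefix
// for the given target.
bool looksLikeCxxSymbol(std::string_view symbol, const TargetInfo &target) {
  const std::string prefix = cxxManglingPrefix(target);
  return symbol.substr(0, prefix.size()) == prefix;
}
```

Under this model `_Z3zapEv` is a C++ symbol on a target with no global 
prefix but not on a leading-underscore target, which is exactly the 
target information a tool would have to supply.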

In practice it's incredibly frustrating that `c++filt` doesn't allow an 
actual symbol name on the command line, although arguably this is a 
usability problem with `c++filt` rather than a flaw in the lower-level 
interface.

But none of that really matters.  This patch changes a function that 
basically implements (the first half of) `__cxa_demangle`.  It is 
already inappropriate to call `__cxa_demangle` on a non-C++ symbol name 
in a generalized symbol-demangling use case, because `__cxa_demangle` 
is required to try to demangle anything that doesn't start with `_Z` as 
a type.  Since `__Z` is not and will never be the start of a valid type 
mangling, it should be harmless to change `__cxa_demangle` to also 
recognize a `__Z` prefix, and doing so lets clients more comfortably 
adopt that better conceptual model for the nature of the C++ prefix.

John.