[llvm] r351481 - [demangler] Ignore leading underscores if present

Duncan P. N. Exon Smith via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 24 09:39:03 PST 2019



> On Jan 23, 2019, at 23:39, John McCall <jmccall at apple.com> wrote:
> 
> On 23 Jan 2019, at 15:48, Duncan Exon Smith wrote:
> 
> > +John and Nick
> >
> > I'm not sure this is the right thing to do.
> >
> > I believe there was an intentional decision way back when (before my time) to require clients to strip the leading underscore. c++filt on my desktop does this (passing `-_` by default, which means symbols without the leading underscore don't get demangled).
> >
> > Nick and John, do you have any recollection of why the demangler was restrictive like this? Any thoughts on whether it's problematic to relax it?
> 
> Okay. If you step back and consider the whole operation of taking a string, figuring out whether it's a C++ mangled name, and then trying to demangle it, there is a plausible reason why you might not want to accept __Z as a prefix: it permits an algorithm which, when followed by clients on leading-underscore targets, reliably avoids demangling unreserved C function names like Z3zapEv. If clients can present you with either a stripped or unstripped name, then it's ambiguous whether _Z3zapEv is a stripped reserved name (hence okay to demangle) or an unstripped unreserved name (hence not okay to demangle). If clients on leading-underscore targets can be trusted to first find and remove the leading underscore, and to just not call you if there's no leading underscore, this problem resolves itself. Not allowing __Z as a prefix therefore encourages clients to implement the logic correctly so that you can get the corner cases right.
> 
> On the other hand, you can make a strong argument that the only prefix which matters is the complete prefix on symbol names, so that the right way of thinking about it is that the C++ mangling prefix is __Z instead of _Z on leading-underscore targets; certainly this leads to a more reasonable conceptual model in the face of languages like Swift that eschew the underscore on all targets. But actually following that logic would mean that the demangler would need to have target-specific behavior, and no tooling using the demangler is set up to propagate target information down (and what would that mean for c++filt anyway?).
> 
> In practice it's incredibly frustrating that c++filt doesn't allow an actual symbol name on the command line, although arguably this is a usability problem with c++filt rather than a flaw in the lower-level interface.
> 
> But none of that really matters. This patch is a change to a function that's basically implementing (the first half of) __cxa_demangle, and it's already inappropriate to call __cxa_demangle in a generalized symbol-demangling use case with a non-C++ symbol name because __cxa_demangle is required to try to demangle anything that doesn't start with _Z as a type. Since __Z is not and will never be the start of a valid type mangling, it should be harmless to change __cxa_demangle to also recognize a __Z prefix, and it does permit clients to more comfortably adopt that better conceptual model for the nature of the C++ prefix.
> 

That seems like solid reasoning to me; thanks John.  I'm fine with this commit in that case.
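
For illustration, the two models discussed above could be sketched roughly as follows. This is not the code from the patch; both helper names are hypothetical, and the sketch assumes a target (such as Darwin) where every symbol carries one leading underscore.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of LLVM. Models the strict client-side
// protocol on a leading-underscore target: strip exactly one underscore,
// and do not call the demangler at all if the result is not a _Z name.
inline std::optional<std::string_view>
stripForStrictDemangler(std::string_view Symbol) {
  // No leading underscore: not an ordinary symbol on such a target.
  if (Symbol.empty() || Symbol.front() != '_')
    return std::nullopt;
  Symbol.remove_prefix(1);
  // Only reserved names starting with _Z are candidate C++ manglings.
  if (Symbol.substr(0, 2) != "_Z")
    return std::nullopt;
  return Symbol;
}

// Hypothetical helper, not part of LLVM. Models the relaxed prefix check:
// accept either the stripped (_Z) or the unstripped (__Z) spelling.
inline bool hasItaniumPrefix(std::string_view Name) {
  return Name.substr(0, 2) == "_Z" || Name.substr(0, 3) == "__Z";
}

// Small self-check using the Z3zapEv example from the discussion.
inline void checkExamples() {
  // C++ zap() on such a target: symbol __Z3zapEv, stripped to _Z3zapEv.
  assert(stripForStrictDemangler("__Z3zapEv").value() == "_Z3zapEv");
  // C function named Z3zapEv: symbol _Z3zapEv, which must not be demangled.
  assert(!stripForStrictDemangler("_Z3zapEv").has_value());
  // The relaxed check alone cannot tell those two cases apart.
  assert(hasItaniumPrefix("_Z3zapEv") && hasItaniumPrefix("__Z3zapEv"));
}
```

The strict helper rejects the C function's symbol `_Z3zapEv`, while the relaxed check accepts it. That is the ambiguity John describes, and it only matters for clients doing generalized symbol demangling rather than calling something shaped like __cxa_demangle.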