[PATCH] D139864: [llvm-cxxfilt] Do not consider the prefix dot as part of the demangled symbol name.

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 25 01:19:51 PDT 2023


jhenderson added a comment.

In D139864#4650016 <https://reviews.llvm.org/D139864#4650016>, @DiggerLin wrote:

> In D139864#4649934 <https://reviews.llvm.org/D139864#4649934>, @jhenderson wrote:
>
>> Could you not just modify `nonMicrosoftDemangle` to handle the dot prefix? That would mean this works for all tools, including llvm-cxxfilt, rather than needing this and the separate D159539 <https://reviews.llvm.org/D159539> patch.
>
> as I mention before, If I modify in `nonMicrosoftDemangle' as
>
>   bool llvm::nonMicrosoftDemangle(const char *MangledName, std::string &Result) {
>     char *Demangled = nullptr;
>   
>     // Not consider the prefix dot as part of the demangled symbol name.
>     if (MangledName[0] == '.') {
>       ++MangledName;
>       Result = ".";
>     }
>   
>     if (isItaniumEncoding(MangledName))
>       Demangled = itaniumDemangle(MangledName, nullptr, nullptr, nullptr);
>     else if (isRustEncoding(MangledName))
>       Demangled = rustDemangle(MangledName);
>     else if (isDLangEncoding(MangledName))
>       Demangled = dlangDemangle(MangledName);
>   
>     if (!Demangled)
>       return false;
>   
>     Result += Demangled;
>     std::free(Demangled);
>     return true;
>
> The llvm-nm or llvm-cxxfilt for machO will demangle "_._Z3f.0v" as ".f.0()"
> but
> [zhijian at krypton src]$ /opt/at15.0/bin/c++filt _._Z3f.0v
> _._Z3f.0v

Honestly, I don't think I have an issue with this behaviour. `--strip-underscore` should literally just drop the leading underscore before demangling the rest of the string, and therefore the same should happen for tools where the underscore stripping behaviour is implicit. Whilst I agree there's no use-case for extra leading underscores (a Mach-O thing) and leading dots (XCOFF), the interaction of the option just seems natural to me. However, I don't really care about it, so if it's easier to do it the other way (i.e. only supporting an extra leading undescore, or a dot), I'm not going to fight about it.

Aside: the example you provided with GNU c++filt is not applicable, because you didn't specify the `--strip-underscore` option.



================
Comment at: llvm/lib/Demangle/Demangle.cpp:27
   if (starts_with(MangledName, '_') &&
-      nonMicrosoftDemangle(MangledName.substr(1), Result))
+      nonMicrosoftDemangle(MangledName.substr(1), Result, false))
     return Result;
----------------
Please mark this `false` with a `/*StripDot=*/` label


================
Comment at: llvm/lib/Demangle/Demangle.cpp:52
+
+  // Do not consider the prefix dot as part of the demangled symbol name.
+  if (StripDot && MangledName.size() > 0 && MangledName[0] == '.') {
----------------



================
Comment at: llvm/test/tools/llvm-cxxfilt/delimiters.test:36
+RUN:      '_Z3Foo$ _Z3f.0v' \
+## Show that the llvm-cxxfilt does not consider the prefix dot as part of the demangled symbol name.
+RUN:      '._Z3Foo ._Z3f.0v' \
----------------
jhenderson wrote:
> DiggerLin wrote:
> > jhenderson wrote:
> > > That being said, this isn't the right place for these new test cases. The only thing that should change in this test is related to the `._Z3Foo` string - see below.
> > > 
> > > This test is about testing which characters are treated as delimiters, i.e. characters that separate sequences of names that are to be demangled individually. It is not about testing how things are demangled in the presence of ".". You should add that testing in a new test file.
> > > 
> > > With the change in behaviour of leading dot prefixes, this test itself doesn't actually test the intended behaviour. The intention was to show that a dot was treated as part of the mangled name, rather than a separator between the previous character sequence (in this case simply the preceding space), and the subsequent mangled name (`_Z3Foo`). If the dot was treated as a delimiter, the `_Z3Foo` would have been demangled, resulting in the final part of the output being `_Z3Foo$ .Foo`. As you can see, this is indistinguishable from the case where `._Z3Foo` as a whole is demangled (to `.Foo`). Instead, this bit of the bit of the check should be modified such that the new code can't demangle it still, but if instead the dot was treated as delimiter, it could be. For example, you could have two valid name manglings separated by a single ".", e.g. `_Z3Foo._Z3Bar`. If the dot were a delimiter, that would result in an output of `Foo.Bar`, but since it isn't, the original input should be emitted to the output (i.e. `_Z3Foo._Z3Bar`).
> > > 
> > > I hope that makes sense!
> > > The intention was to show that a dot was treated as part of the mangled name
> > 
> > `._Z3Foo` is a symbol name in AIX , but the  `._Z3Foo` is not a mangle name ,only  `_Z3Foo` is a mangle name, we put the dot in front of the `_Z3Foo` to generate the entry label symbol of function in AIX.
> > 
> > 
> > 
> > > but if instead the dot was treated as delimiter, it could be. For example, you could have two valid name manglings separated by a single ".", e.g. _Z3Foo._Z3Ba
> > 
> >  according to the https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling , the dot maybe part of symbol (but in the middle of the mangled name, not the front of the mangle name).
> > 
> > So we have to do some special for the front dot of symbol name for AIX.
> > > The intention was to show that a dot was treated as part of the mangled name
> > 
> > `._Z3Foo` is a symbol name in AIX , but the  `._Z3Foo` is not a mangle name ,only  `_Z3Foo` is a mangle name, we put the dot in front of the `_Z3Foo` to generate the entry label symbol of function in AIX.
> 
> You're arguing about semantics here. As far as I'm concerned, the dot is part of the mangled name. The demangling code removes it, demangles the rest of the name, and then readds it, but it's still part of the mangled name (just not part of the Itanium mangling).
> 
> But regardless of the exact terminology, it's clear you didn't understand my comment. We both agree that the dot is part of the name (whether it's part of the mangled name or not is irrelevant in this context). What's important to this test is that it isn't a delimiter (i.e. a character that separates a string into two separate names) and the rest of my comment and explanation still stands: your change in behaviour has rendered this part of the test as no longer testing what it wanted to test and therefore the test needs updating as I already explained (and you need proper testing of the dot demangling in a different test, which I see you've added).
This test still needs updating.


================
Comment at: llvm/test/tools/llvm-cxxfilt/dot-prefix.test:2
+## Show that the llvm-cxxfilt does not consider the dot prefix to be part of the symbol name to be demangled.
+RUN:      llvm-cxxfilt -n ._ZL5func0v | FileCheck %s
+
----------------



================
Comment at: llvm/tools/llvm-cxxfilt/llvm-cxxfilt.cpp:70
   std::string_view DecoratedStr = Mangled;
+  bool StripDot = true;
   if (StripUnderscore)
----------------
As we're not actually stripping the dot (it's restored at the end), I recommend naming this `CanHaveLeadingDot` or something to that effect. Same goes in the `nonMicrosoftDemangle` function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139864/new/

https://reviews.llvm.org/D139864



More information about the llvm-commits mailing list