[PATCH] D101444: [demangler] Initial support for the new Rust mangling scheme

Tomasz Miąsko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun May 2 08:27:24 PDT 2021


tmiasko marked 5 inline comments as done and an inline comment as not done.
tmiasko added inline comments.


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:254-255
+
+    if (!mulAssign(Value, 62)) {
+      Error = true;
+      return 0;
----------------
dblaikie wrote:
> Might be worth rolling the Error updating into mulAssign/addAssign? Remove a bit more duplication/chance of mistakes of not setting the Error flag?
I moved responsibility for setting the error into mulAssign/addAssign.


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:140-141
+  case 'N': {
+    const char NS = consume();
+    if (!isLower(NS) && !isUpper(NS)) {
+      Error = true;
----------------
dblaikie wrote:
> tmiasko wrote:
> > dblaikie wrote:
> > > Is this correct/am I understanding this correctly, that some parsed components are not rendered in the demangling? Is that intentional/valid - that multiple mangled names could demangle to the same string/be ambiguous? Or is this a case of some feature not being implemented yet? 
> > Yes, some components are not shown.
> > 
> > In the case here we have two kinds of namespaces: implementation internal namespaces using lowercase letters, and special well-known namespaces using uppercase letters. The former are not intended to be shown. The latter currently includes closures `C` and shims `S` and are shown by the existing demanglers.
> > 
> > This is yet to be implemented. I added a FIXME.
> I see this notion is referenced in the mangling spec document ("Like other disambiguation information, this path would usually not actually be shown by demanglers.") but this is still quite surprising to me - is there precedent for this, do any itanium or Windows manglings produce ambiguous demanglings?
> 
> If there isn't precedent for this, I think it'd be worth having a broader discussion about it (maybe a llvm-dev thread).
One case in itanium that results in the same `f()::a` demangling for both
statics, would be the following:

```
void f() {
  { static int a; }
  { static int a; }
}
```

The symbols corresponding to constructors & destructors in itanium are
ambiguous after demangling as well.


================
Comment at: llvm/test/Demangle/rust.test:3-4
+
+CHECK: a::main
+       _RNvC1a4main
+
----------------
dblaikie wrote:
> tmiasko wrote:
> > dblaikie wrote:
> > > I'd have a bit of an easier time following the test cases if there was a comment describing the parse tree - would that be worth including? 
> > I tried a few different notations, but even for short symbols they feel too verbose to be included in tests. For example:
> > 
> > ```
> > _RNvNtCshGpAVYOtgW1_1a1b1c 
> > 
> > a::b::c
> > 
> > (symbol-name "_R"
> >   (path "N" (namespace "v")
> >     (path "N" (namespace "t")
> >       (path "C" (identifier (disambiguator "s" (base-62-number "hGpAVYOtgW1_")) "1a"))
> >       (identifier "1b"))
> >     (identifier "1c")))
> > ```
> > 
> > I also have another parser that produces output as below, if that would be helpful I could publish it:
> > 
> > ```
> > "" <symbol-name>
> > "_R" [<decimal-number>] <path> [<instantiating-crate>]
> > "_R" <path> [<instantiating-crate>]
> > "_R" "N" <namespace> <path> <identifier> [<instantiating-crate>]
> > "_RN" <namespace> <path> <identifier> [<instantiating-crate>]
> > "_RNv" <path> <identifier> [<instantiating-crate>]
> > "_RNv" "N" <namespace> <path> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvN" <namespace> <path> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNt" <path> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNt" "C" <identifier> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtC" <identifier> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtC" [<disambiguator>] <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtC" ["s" <base-62-number>] <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCs" <base-62-number> <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_" <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_" ["u"] <decimal-number> ["_"] <bytes> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_" <decimal-number> ["_"] <bytes> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1" ["_"] <byte> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1" <byte> <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a" <identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a" [<disambiguator>] <undisambiguated-identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a" <undisambiguated-identifier> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a" ["u"] <decimal-number> ["_"] <bytes> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a" <decimal-number> ["_"] <bytes> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1" ["_"] <byte> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1" <byte> <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b" <identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b" [<disambiguator>] <undisambiguated-identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b" <undisambiguated-identifier> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b" ["u"] <decimal-number> ["_"] <bytes> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b" <decimal-number> ["_"] <bytes> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b1" ["_"] <byte> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b1" <byte> [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b1c" [<instantiating-crate>]
> > "_RNvNtCshGpAVYOtgW1_1a1b1c" [<path>]
> > "_RNvNtCshGpAVYOtgW1_1a1b1c"
> > ```
> > 
> The hand-rendered version doesn't seem too bad, really - but up to you.
I would rather not maintain parse tree for test cases.

But, see https://gist.github.com/tmiasko/7c31853276d94af57c18ca7522badd1c for a parser producing a parse tree.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101444/new/

https://reviews.llvm.org/D101444



More information about the llvm-commits mailing list