[PATCH] D101444: [demangler] Initial support for the new Rust mangling scheme

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 30 17:23:48 PDT 2021


dblaikie added inline comments.


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:254-255
+
+    if (!mulAssign(Value, 62)) {
+      Error = true;
+      return 0;
----------------
Might be worth rolling the Error updating into mulAssign/addAssign? Remove a bit more duplication/chance of mistakes of not setting the Error flag?


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:59-62
+      std::free(Demangled);
+      Demangled = Buf;
+    } else {
+      std::free(Buf);
----------------
tmiasko wrote:
> dblaikie wrote:
> > Any chance of using less raw memory management here? (been some recent discussions about the complexities of memory management in the itanium demangler that, if possible, it'd be good to avoid ( https://reviews.llvm.org/D100800 ) - but if it's all wrapped up in the "wanting to conform to the same API" (ie: the API design itself is awkward, but compatibility trumps fixing that awkwardness) - such is life)
> The motivation was to be compatible with itanium demangler API. I feel that interface will be easier to integrate.
> 
> If you see trade-offs differently, I am open to changing it to avoid manual memory management. Just consider how a different API would fit in in places where itaniumDemangle is used right now. 
> 
> Internally, for the demangler itself, it is of little significance if demangled output is appended to an OutputStream or say an std::string.
Yeah, fair - don't think there's much to do about that, short of some unique_ptr with custom `std::free`ing deleter (& maybe some utility to do realloc with such a thing), but best not to roll that into this patch - if you wanted to do some cleanup like that, probably implementing it first in the itanium demangler & this common OutputStream abstraction, then port it to the Rust one (either before/after this code is committed).


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:140-141
+  case 'N': {
+    const char NS = consume();
+    if (!isLower(NS) && !isUpper(NS)) {
+      Error = true;
----------------
tmiasko wrote:
> dblaikie wrote:
> > Is this correct/am I understanding this correctly, that some parsed components are not rendered in the demangling? Is that intentional/valid - that multiple mangled names could demangle to the same string/be ambiguous? Or is this a case of some feature not being implemented yet? 
> Yes, some components are not shown.
> 
> In the case here we have two kinds of namespaces: implementation internal namespaces using lowercase letters, and special well-known namespaces using uppercase letters. The former are not intended to be shown. The latter currently includes closures `C` and shims `S` and are shown by the existing demanglers.
> 
> This is yet to be implemented. I added a FIXME.
I see this notion is referenced in the mangling spec document ("Like other disambiguation information, this path would usually not actually be shown by demanglers.") but this is still quite surprising to me - is there precedent for this, do any itanium or Windows manglings produce ambiguous demanglings?

If there isn't precedent for this, I think it'd be worth having a broader discussion about it (maybe a llvm-dev thread).


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:177-182
+  for (const char C : S) {
+    if (!isValid(C)) {
+      Error = true;
+      return {};
+    }
+  }
----------------
tmiasko wrote:
> dblaikie wrote:
> > 
> I would find it convenient to leave this unaddressed for the time being. Currently implementation has no build time dependency on any generated headers or libraries, and using STLExtras.h introduces both.
`std::all_of` can be used instead, if the standard library is suitable here


================
Comment at: llvm/test/Demangle/rust.test:3-4
+
+CHECK: a::main
+       _RNvC1a4main
+
----------------
tmiasko wrote:
> dblaikie wrote:
> > I'd have a bit of an easier time following the test cases if there was a comment describing the parse tree - would that be worth including? 
> I tried a few different notations, but even for short symbols they feel too verbose to be included in tests. For example:
> 
> ```
> _RNvNtCshGpAVYOtgW1_1a1b1c 
> 
> a::b::c
> 
> (symbol-name "_R"
>   (path "N" (namespace "v")
>     (path "N" (namespace "t")
>       (path "C" (identifier (disambiguator "s" (base-62-number "hGpAVYOtgW1_")) "1a"))
>       (identifier "1b"))
>     (identifier "1c")))
> ```
> 
> I also have another parser that produces output as below, if that would be helpful I could publish it:
> 
> ```
> "" <symbol-name>
> "_R" [<decimal-number>] <path> [<instantiating-crate>]
> "_R" <path> [<instantiating-crate>]
> "_R" "N" <namespace> <path> <identifier> [<instantiating-crate>]
> "_RN" <namespace> <path> <identifier> [<instantiating-crate>]
> "_RNv" <path> <identifier> [<instantiating-crate>]
> "_RNv" "N" <namespace> <path> <identifier> <identifier> [<instantiating-crate>]
> "_RNvN" <namespace> <path> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNt" <path> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNt" "C" <identifier> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtC" <identifier> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtC" [<disambiguator>] <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtC" ["s" <base-62-number>] <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCs" <base-62-number> <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_" <undisambiguated-identifier> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_" ["u"] <decimal-number> ["_"] <bytes> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_" <decimal-number> ["_"] <bytes> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1" ["_"] <byte> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1" <byte> <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a" <identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a" [<disambiguator>] <undisambiguated-identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a" <undisambiguated-identifier> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a" ["u"] <decimal-number> ["_"] <bytes> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a" <decimal-number> ["_"] <bytes> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1" ["_"] <byte> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1" <byte> <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b" <identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b" [<disambiguator>] <undisambiguated-identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b" <undisambiguated-identifier> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b" ["u"] <decimal-number> ["_"] <bytes> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b" <decimal-number> ["_"] <bytes> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b1" ["_"] <byte> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b1" <byte> [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b1c" [<instantiating-crate>]
> "_RNvNtCshGpAVYOtgW1_1a1b1c" [<path>]
> "_RNvNtCshGpAVYOtgW1_1a1b1c"
> ```
> 
The hand-rendered version doesn't seem too bad, really - but up to you.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101444/new/

https://reviews.llvm.org/D101444



More information about the llvm-commits mailing list