[PATCH] D101444: [demangler] Initial support for the new Rust mangling scheme

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 29 14:24:45 PDT 2021


dblaikie added inline comments.


================
Comment at: llvm/include/llvm/Demangle/RustDemangle.h:8
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_DEMANGLE_RUSTDEMANGLE_H
----------------
Got a link to a spec for the mangling scheme that could go somewhere up here? (and/or in the .cpp file)


================
Comment at: llvm/include/llvm/Demangle/RustDemangle.h:30-31
+class Demangler {
+  // Maximum recursion level. Used to avoid stack overflow.
+  const size_t MaxRecursionLevel;
+  // Current recursion level.
----------------
Most of the LLVM codebase doesn't use const members (can do weird things with assignability, for instance - not that this type needs to be assignable) - so I'd be inclined to stick with that convention and not make this const either.


================
Comment at: llvm/include/llvm/Demangle/RustDemangle.h:31
+  // Maximum recursion level. Used to avoid stack overflow.
+  const size_t MaxRecursionLevel;
+  // Current recursion level.
----------------
Does the itanium demangler have a similar recursion limit handler? I didn't find one while quickly glancing around - would be good to understand any differences/tradeoffs here.


================
Comment at: llvm/include/llvm/Demangle/RustDemangle.h:35
+
+  // Input string that is being demangled (without "_R" prefix if any).
+  StringView Input;
----------------
This comment seems incorrect - the `demangle` function is passed a string that includes the "_R" prefix, and parses that before doing anything else?


================
Comment at: llvm/include/llvm/Demangle/RustDemangle.h:49
+
+  bool demangle(const char *Mangled, size_t MangledLen);
+
----------------
Looks like this part of the API isn't designed to be drop-in compatible with the itanium demangler API, yeah? (because the itanium one takes the buffer in the ctor rather than in a separate function, and takes the buffer as start/end rather than start/length)

If that's the case - any particular reason for this divergence and if we're diverging anyway, perhaps this could use a StringView parameter type rather than a decomposed start+length?


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:45
+
+  if (!D.demangle(MangledName, strlen(MangledName))) {
+    if (Status != nullptr)
----------------
If this took the argument by StringView (well, I guess, even if it didn't - could always build a StringView earlier and pass data()+size() here), then the StringView could be constructed earlier and StringView's startsWith(StringView) function could be used for the _R prefix check)


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:59-62
+      std::free(Demangled);
+      Demangled = Buf;
+    } else {
+      std::free(Buf);
----------------
Any chance of using less raw memory management here? (been some recent discussions about the complexities of memory management in the itanium demangler that, if possible, it'd be good to avoid ( https://reviews.llvm.org/D100800 ) - but if it's all wrapped up in the "wanting to conform to the same API" (ie: the API design itself is awkward, but compatibility trumps fixing that awkwardness) - such is life)


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:93
+//
+// <symbol-name> = "_R" [<decimal-number>] <path> [<instantiating-crate>]
+bool Demangler::demangle(const char *Mangled, size_t MangledLength) {
----------------
The decimal-number and instantiating-crate parts aren't implemented in this patch, I take it? Could you include a FIXME that mentions that?


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:140-141
+  case 'N': {
+    const char NS = consume();
+    if (!isLower(NS) && !isUpper(NS)) {
+      Error = true;
----------------
Is this correct/am I understanding this correctly, that some parsed components are not rendered in the demangling? Is that intentional/valid - that multiple mangled names could demangle to the same string/be ambiguous? Or is this a case of some feature not being implemented yet? 


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:166-167
+Identifier Demangler::parseIdentifier() {
+  const bool Punycode = consumeIf('u');
+  const uint64_t Bytes = parseDecimalNumber();
+  consumeIf('_');
----------------
LLVM doesn't tend to use const local variables - I'd probably skip the const here & elsewhere.


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:168
+  const uint64_t Bytes = parseDecimalNumber();
+  consumeIf('_');
+
----------------
This is here in case the decimal-number and initial bytes value would be ambiguous?


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:177-182
+  for (const char C : S) {
+    if (!isValid(C)) {
+      Error = true;
+      return {};
+    }
+  }
----------------



================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:189
+// using Tag. Returns 0 when tag is absent and value + 1 otherwise.
+uint64_t Demangler::parseOptionalBase62Number(char Tag) {
+  if (!consumeIf(Tag))
----------------
Nothing's currently using (& so nothing's currently testing) the return value of this function - probably best to omit it until it's needed/tested.


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:207-210
+  uint64_t Value = 0;
+  if (consumeIf('_')) {
+    return Value;
+  }
----------------
Maybe, like parseDecimalNumber below (though pick either way - so long as they're both the same), it might be nicer to move the 'Value' declaration further down and have this code be `if (consume...) { return 0; }`?

(though for both cases, if there's some nice way to avoid these special cases & have them fall out of the main loop/algorithm, that'd be even better I think - though I couldn't come up with a way off-hand)


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:208-210
+  if (consumeIf('_')) {
+    return Value;
+  }
----------------
LLVM style generally omits {} from single line blocks - remove these here and anywhere else in the patch.


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:237
+
+  if (Value > Max - 1) {
+    Error = true;
----------------
This particular check could be written maybe more clearly as "Value == Max" but I see it's a pattern for overflow testing - perhaps it could be combined in a utility function for that purposes, so the 1 here matches up with the 1 in the += below. (& used for the "+= D" & overflow check in the next function)


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:262
+
+  const uint64_t Max = std::numeric_limits<uint64_t>::max();
+  uint64_t Value = 0u;
----------------
This (& similarly in other places) could be constexpr to indicate it's a compile-time constant? (& distinct from the "don't generally use const on locals in LLVM)


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:263
+  const uint64_t Max = std::numeric_limits<uint64_t>::max();
+  uint64_t Value = 0u;
+
----------------
This one 0 value uses 'u' but most/none of the others do - maybe drop it here for consistency?


================
Comment at: llvm/lib/Demangle/RustDemangle.cpp:283-288
+void Demangler::printIdentifier(Identifier Ident) {
+  if (Error)
+    return;
+
+  print(Ident.Name);
+}
----------------
Is this function worth it compared to calling "print(Ident.Name)"? (or at least could/should this function be implemented as "print(Ident.Name)" since the error checking is already done in print(StringView) and doesn't need to be done again in this caller?)


================
Comment at: llvm/test/Demangle/rust.test:3-4
+
+CHECK: a::main
+       _RNvC1a4main
+
----------------
I'd have a bit of an easier time following the test cases if there was a comment describing the parse tree - would that be worth including? 


================
Comment at: llvm/test/Demangle/rust.test:25-31
+; Unsupported mangling version
+
+CHECK: _R0NvC1a4main
+       _R0NvC1a4main
+
+CHECK: _R1NvC1a4main
+       _R1NvC1a4main
----------------
Maybe this comment could be more descriptive - currently the implementation doesn't support any mangling version? (is the plan to support this currently, or is the version field purely forward-looking/for future use?) Maybe the comment could read "Only the null mangling version is currently supported" or something like that?

I'd probably only test 0 if that's the case - doesn't seem like 1 adds especially more coverage? But don't mind.


================
Comment at: llvm/test/Demangle/rust.test:33-34
+
+CHECK: _R199999999999999999999999999NvC1a4main
+       _R199999999999999999999999999NvC1a4main
+
----------------
This is an attempt to overflow the version number, if it were parsed at all? Not sure it's an interesting test case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101444/new/

https://reviews.llvm.org/D101444



More information about the llvm-commits mailing list