[libcxxabi] [llvm] [WIP] [libcxxabi][ItaniumDemangle] Add infrastructure to track location information of parts of a demangled function name (PR #133249)

Michael Buch via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 27 06:21:53 PDT 2025


https://github.com/Michael137 created https://github.com/llvm/llvm-project/pull/133249

This patch includes the necessary changes for the LLDB feature proposed in https://discourse.llvm.org/t/rfc-lldb-highlighting-function-names-in-lldb-backtraces/85309. The TL;DR is that we want to track where certain parts of a demangled name begin/end so we can highlight them in backtraces.

The idea is that a function name can be decomposed into <scope, base, arguments>. The assumption is that, given the ranges of those three elements and the demangled name, LLDB will be able to reconstruct the full demangled name. Tracking those ranges inside the demangler is pretty simple. We don’t ever deal with nesting, so whenever we recurse into a template argument list or another function type, we just stop tracking any positions. Once we have recursed out of those and are back to printing the top-level function name, we continue tracking the positions.
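To make the decomposition concrete, here is a minimal sketch of how a consumer could slice the three parts out of a demangled name. The `Range` alias and the `slice` helper are hypothetical names used only for illustration; the ranges are the ones expected for `_ZN1f1b1c1gEv` in the test cases of this patch, and the approach mirrors the `get_part` lambda in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical alias: a half-open [first, second) range into the demangled
// name, matching the std::pair<size_t, size_t> fields of FunctionNameInfo.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: returns the part of Name covered by R.
std::string_view slice(std::string_view Name, Range R) {
  assert(R.first <= R.second && R.second <= Name.size());
  return Name.substr(R.first, R.second - R.first);
}

// Demangled form of "_ZN1f1b1c1gEv" and the ranges the test expects for it.
constexpr std::string_view Demangled = "f::b::c::g()";
constexpr Range ScopeLocs{0, 9};
constexpr Range BasenameLocs{9, 10};
constexpr Range ArgumentLocs{10, 12};
```

With these ranges, `slice(Demangled, ScopeLocs)` yields `f::b::c::`, `slice(Demangled, BasenameLocs)` yields `g`, and `slice(Demangled, ArgumentLocs)` yields `()`. Anything outside the three ranges (a return type, qualifiers, template arguments of the basename) stays in the demangled string itself.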

The current implementation introduces a new structure `FunctionNameInfo` that holds all this information and is stored in the `llvm::itanium_demangle::OutputBuffer` class, which is unfortunately the only way to keep state while printing the demangle tree (it already contains other kinds of information similar to this tracking). In [[RFC][ItaniumDemangler] New option to print compact C++ names](https://discourse.llvm.org/t/rfc-itaniumdemangler-new-option-to-print-compact-c-names/82819) we propose to refactor this, but that shouldn’t be a blocker unless people feel otherwise.
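As a rough illustration of the "stop tracking when nested" rule, the sketch below keeps a depth counter in the buffer and records a position only while the outermost function is being printed. `DepthGuard`, `MiniBuffer`, `printInner` and `printOuter` are hypothetical stand-ins, not names from the patch; the patch itself uses `ScopedOverride<unsigned>` on `FunctionPrintingDepth` and additionally checks the template-argument state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical RAII guard: bumps the depth on entry and restores it on exit.
class DepthGuard {
  unsigned &Depth;

public:
  explicit DepthGuard(unsigned &D) : Depth(D) { ++Depth; }
  ~DepthGuard() {
    assert(Depth > 0);
    --Depth;
  }
  DepthGuard(const DepthGuard &) = delete;
  DepthGuard &operator=(const DepthGuard &) = delete;
};

// Hypothetical, heavily simplified stand-in for OutputBuffer.
struct MiniBuffer {
  std::string Text;
  unsigned Depth = 0;
  std::size_t ArgStart = 0;

  // Record where the arguments begin, but only for the top-level function.
  void noteArgStart() {
    if (Depth == 1)
      ArgStart = Text.size();
  }
};

// Prints a nested function; its call to noteArgStart() runs at depth 2
// and is therefore ignored.
void printInner(MiniBuffer &B) {
  DepthGuard G(B.Depth);
  B.Text += "inner";
  B.noteArgStart();
  B.Text += "(int)";
}

// Prints the top-level function, with the nested one as its argument.
void printOuter(MiniBuffer &B) {
  DepthGuard G(B.Depth);
  B.Text += "outer";
  B.noteArgStart();
  B.Text += "(";
  printInner(B);
  B.Text += ")";
}
```

Calling `printOuter` on a fresh `MiniBuffer` produces `outer(inner(int))` with `ArgStart` pointing at the first `(`; the nested call leaves the recorded position untouched, and the depth is back to zero once printing finishes.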

I added the tracking implementation to a new `Utility.cpp`, so I had to update the sync script. Currently `libcxxabi` fails to link, because I haven't figured out how to build/link this new object file. If anyone has ideas, they'd be appreciated. Or if we prefer to keep this header-only, I'm happy to do that too.

Tests are in `ItaniumDemangleTest.cpp`.

>From 0875195a7ed39c21e9b639bf66d56b48e9869e51 Mon Sep 17 00:00:00 2001
From: Michael Buch <michaelbuch12 at gmail.com>
Date: Tue, 11 Mar 2025 08:57:13 +0000
Subject: [PATCH] [llvm][ItaniumDemangle] Add function name location tracking

---
 libcxxabi/src/demangle/ItaniumDemangle.h      |  21 ++++
 libcxxabi/src/demangle/Utility.cpp            | 112 ++++++++++++++++++
 libcxxabi/src/demangle/Utility.h              |  91 +++++++++++---
 libcxxabi/src/demangle/cp-to-llvm.sh          |  62 +++++++---
 llvm/include/llvm/Demangle/ItaniumDemangle.h  |  21 ++++
 llvm/include/llvm/Demangle/Utility.h          |  91 +++++++++++---
 llvm/lib/Demangle/CMakeLists.txt              |   1 +
 llvm/lib/Demangle/README.txt                  |  61 ++++++++++
 llvm/lib/Demangle/Utility.cpp                 | 112 ++++++++++++++++++
 .../Demangle/ItaniumDemangleTest.cpp          | 112 ++++++++++++++++++
 10 files changed, 635 insertions(+), 49 deletions(-)
 create mode 100644 libcxxabi/src/demangle/Utility.cpp
 create mode 100644 llvm/lib/Demangle/README.txt
 create mode 100644 llvm/lib/Demangle/Utility.cpp

diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h
index 3df41b5f4d7d0..b5a0a86b119f4 100644
--- a/libcxxabi/src/demangle/ItaniumDemangle.h
+++ b/libcxxabi/src/demangle/ItaniumDemangle.h
@@ -851,11 +851,13 @@ class FunctionType final : public Node {
   // by printing out the return types's left, then print our parameters, then
   // finally print right of the return type.
   void printLeft(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     Ret->printLeft(OB);
     OB += " ";
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
@@ -971,18 +973,32 @@ class FunctionEncoding final : public Node {
   const Node *getName() const { return Name; }
 
   void printLeft(OutputBuffer &OB) const override {
+    // Nested FunctionEncoding parsing can happen with following productions:
+    // * <local-name>
+    // * <expr-primary>
+    auto Scoped = OB.enterFunctionTypePrinting();
+
     if (Ret) {
       Ret->printLeft(OB);
       if (!Ret->hasRHSComponent(OB))
         OB += " ";
     }
+
+    OB.FunctionInfo.updateScopeStart(OB);
+
     Name->print(OB);
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
+    OB.FunctionInfo.finalizeStart(OB);
+
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
+
+    OB.FunctionInfo.finalizeArgumentEnd(OB);
+
     if (Ret)
       Ret->printRight(OB);
 
@@ -1005,6 +1021,8 @@ class FunctionEncoding final : public Node {
       OB += " requires ";
       Requires->print(OB);
     }
+
+    OB.FunctionInfo.finalizeEnd(OB);
   }
 };
 
@@ -1072,7 +1090,9 @@ struct NestedName : Node {
   void printLeft(OutputBuffer &OB) const override {
     Qual->print(OB);
     OB += "::";
+    OB.FunctionInfo.updateScopeEnd(OB);
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
   }
 };
 
@@ -1633,6 +1653,7 @@ struct NameWithTemplateArgs : Node {
 
   void printLeft(OutputBuffer &OB) const override {
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
     TemplateArgs->print(OB);
   }
 };
diff --git a/libcxxabi/src/demangle/Utility.cpp b/libcxxabi/src/demangle/Utility.cpp
new file mode 100644
index 0000000000000..04516082b3443
--- /dev/null
+++ b/libcxxabi/src/demangle/Utility.cpp
@@ -0,0 +1,112 @@
+//===--- Utility.cpp ------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Provide some utility classes for use in the demangler.
+// There are two copies of this file in the source tree.  The one in libcxxabi
+// is the original and the one in llvm is the copy.  Use cp-to-llvm.sh to update
+// the copy.  See README.txt for more details.
+//
+//===----------------------------------------------------------------------===//
+
+#include "Utility.h"
+#include "DemangleConfig.h"
+
+DEMANGLE_NAMESPACE_BEGIN
+
+bool FunctionNameInfo::startedPrintingArguments() const {
+  return ArgumentLocs.first > 0;
+}
+
+bool FunctionNameInfo::shouldTrack(OutputBuffer &OB) const {
+  if (!OB.isPrintingTopLevelFunctionType())
+    return false;
+
+  if (OB.isGtInsideTemplateArgs())
+    return false;
+
+  if (startedPrintingArguments())
+    return false;
+
+  return true;
+}
+
+bool FunctionNameInfo::canFinalize(OutputBuffer &OB) const {
+  if (!OB.isPrintingTopLevelFunctionType())
+    return false;
+
+  if (OB.isGtInsideTemplateArgs())
+    return false;
+
+  if (!startedPrintingArguments())
+    return false;
+
+  return true;
+}
+
+void FunctionNameInfo::updateBasenameEnd(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  BasenameLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::updateScopeStart(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  ScopeLocs.first = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::updateScopeEnd(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  ScopeLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::finalizeArgumentEnd(OutputBuffer &OB) {
+  if (!canFinalize(OB))
+    return;
+
+  OB.FunctionInfo.ArgumentLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::finalizeStart(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  OB.FunctionInfo.ArgumentLocs.first = OB.getCurrentPosition();
+
+  // If nothing has set the end of the basename yet (for example when
+  // printing templates), then the beginning of the arguments is the end of
+  // the basename.
+  if (BasenameLocs.second == 0)
+    OB.FunctionInfo.BasenameLocs.second = OB.getCurrentPosition();
+
+  DEMANGLE_ASSERT(!shouldTrack(OB), "");
+  DEMANGLE_ASSERT(canFinalize(OB), "");
+}
+
+void FunctionNameInfo::finalizeEnd(OutputBuffer &OB) {
+  if (!canFinalize(OB))
+    return;
+
+  if (ScopeLocs.first > OB.FunctionInfo.ScopeLocs.second)
+    ScopeLocs.second = OB.FunctionInfo.ScopeLocs.first;
+  BasenameLocs.first = OB.FunctionInfo.ScopeLocs.second;
+}
+
+bool FunctionNameInfo::hasBasename() const {
+  return BasenameLocs.first != BasenameLocs.second && BasenameLocs.second > 0;
+}
+
+ScopedOverride<unsigned> OutputBuffer::enterFunctionTypePrinting() {
+  return {FunctionPrintingDepth, FunctionPrintingDepth + 1};
+}
+
+DEMANGLE_NAMESPACE_END
diff --git a/libcxxabi/src/demangle/Utility.h b/libcxxabi/src/demangle/Utility.h
index f1fad35d60d98..3b9ff8ea1f82b 100644
--- a/libcxxabi/src/demangle/Utility.h
+++ b/libcxxabi/src/demangle/Utility.h
@@ -27,6 +27,66 @@
 
 DEMANGLE_NAMESPACE_BEGIN
 
+template <class T> class ScopedOverride {
+  T &Loc;
+  T Original;
+
+public:
+  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
+
+  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
+    Loc_ = std::move(NewVal);
+  }
+  ~ScopedOverride() { Loc = std::move(Original); }
+
+  ScopedOverride(const ScopedOverride &) = delete;
+  ScopedOverride &operator=(const ScopedOverride &) = delete;
+};
+
+class OutputBuffer;
+
+// Stores information about parts of a demangled function name.
+struct FunctionNameInfo {
+  /// A [start, end) pair for the function basename.
+  /// The basename is the name without scope qualifiers
+  /// and without template parameters. E.g.,
+  /// \code{.cpp}
+  ///    void foo::bar<int>::someFunc<float>(int) const &&
+  ///                        ^       ^
+  ///                      Start    End
+  /// \endcode
+  std::pair<size_t, size_t> BasenameLocs;
+
+  /// A [start, end) pair for the function scope qualifiers.
+  /// E.g., for
+  /// \code{.cpp}
+  ///    void foo::bar<int>::qux<float>(int) const &&
+  ///         ^              ^
+  ///       Start           End
+  /// \endcode
+  std::pair<size_t, size_t> ScopeLocs;
+
+  /// Indicates the [start, end) of the function argument lits.
+  /// E.g.,
+  /// \code{.cpp}
+  ///    int (*getFunc<float>(float, double))(int, int)
+  ///                        ^              ^
+  ///                      start           end
+  /// \endcode
+  std::pair<size_t, size_t> ArgumentLocs;
+
+  bool startedPrintingArguments() const;
+  bool shouldTrack(OutputBuffer &OB) const;
+  bool canFinalize(OutputBuffer &OB) const;
+  void updateBasenameEnd(OutputBuffer &OB);
+  void updateScopeStart(OutputBuffer &OB);
+  void updateScopeEnd(OutputBuffer &OB);
+  void finalizeArgumentEnd(OutputBuffer &OB);
+  void finalizeStart(OutputBuffer &OB);
+  void finalizeEnd(OutputBuffer &OB);
+  bool hasBasename() const;
+};
+
 // Stream that AST nodes write their string representation into after the AST
 // has been parsed.
 class OutputBuffer {
@@ -34,6 +94,10 @@ class OutputBuffer {
   size_t CurrentPosition = 0;
   size_t BufferCapacity = 0;
 
+  /// When a function type is being printed this value is incremented.
+  /// When printing of the type is finished the value is decremented.
+  unsigned FunctionPrintingDepth = 0;
+
   // Ensure there are at least N more positions in the buffer.
   void grow(size_t N) {
     size_t Need = N + CurrentPosition;
@@ -92,8 +156,19 @@ class OutputBuffer {
   /// Use a counter so we can simply increment inside parentheses.
   unsigned GtIsGt = 1;
 
+  /// When printing the mangle tree, this object will hold information about
+  /// the function name being printed (if any).
+  FunctionNameInfo FunctionInfo;
+
+  /// Called when we start printing a function type.
+  [[nodiscard]] ScopedOverride<unsigned> enterFunctionTypePrinting();
+
   bool isGtInsideTemplateArgs() const { return GtIsGt == 0; }
 
+  bool isPrintingTopLevelFunctionType() const {
+    return FunctionPrintingDepth == 1;
+  }
+
   void printOpen(char Open = '(') {
     GtIsGt++;
     *this += Open;
@@ -182,22 +257,6 @@ class OutputBuffer {
   size_t getBufferCapacity() const { return BufferCapacity; }
 };
 
-template <class T> class ScopedOverride {
-  T &Loc;
-  T Original;
-
-public:
-  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
-
-  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
-    Loc_ = std::move(NewVal);
-  }
-  ~ScopedOverride() { Loc = std::move(Original); }
-
-  ScopedOverride(const ScopedOverride &) = delete;
-  ScopedOverride &operator=(const ScopedOverride &) = delete;
-};
-
 DEMANGLE_NAMESPACE_END
 
 #endif
diff --git a/libcxxabi/src/demangle/cp-to-llvm.sh b/libcxxabi/src/demangle/cp-to-llvm.sh
index f8b3585a5fa37..4d76a1e110687 100755
--- a/libcxxabi/src/demangle/cp-to-llvm.sh
+++ b/libcxxabi/src/demangle/cp-to-llvm.sh
@@ -7,30 +7,58 @@ set -e
 
 cd $(dirname $0)
 HDRS="ItaniumDemangle.h ItaniumNodes.def StringViewExtras.h Utility.h"
-LLVM_DEMANGLE_DIR=$1
+SRCS="Utility.cpp"
+LLVM_DEMANGLE_INCLUDE_DIR=$1
+LLVM_DEMANGLE_SOURCE_DIR=$2
 
-if [[ -z "$LLVM_DEMANGLE_DIR" ]]; then
-    LLVM_DEMANGLE_DIR="../../../llvm/include/llvm/Demangle"
+if [[ -z "$LLVM_DEMANGLE_INCLUDE_DIR" ]]; then
+    LLVM_DEMANGLE_INCLUDE_DIR="../../../llvm/include/llvm/Demangle"
 fi
 
-if [[ ! -d "$LLVM_DEMANGLE_DIR" ]]; then
-    echo "No such directory: $LLVM_DEMANGLE_DIR" >&2
+if [[ -z "$LLVM_DEMANGLE_SOURCE_DIR" ]]; then
+    LLVM_DEMANGLE_SOURCE_DIR="../../../llvm/lib/Demangle"
+fi
+
+if [[ ! -d "$LLVM_DEMANGLE_INCLUDE_DIR" ]]; then
+    echo "No such directory: $LLVM_DEMANGLE_INCLUDE_DIR" >&2
+    exit 1
+fi
+
+if [[ ! -d "$LLVM_DEMANGLE_SOURCE_DIR" ]]; then
+    echo "No such directory: $LLVM_DEMANGLE_SOURCE_DIR" >&2
     exit 1
 fi
 
-read -p "This will overwrite the copies of $HDRS in $LLVM_DEMANGLE_DIR; are you sure? [y/N]" -n 1 -r ANSWER
+read -p "This will overwrite the copies of $HDRS in $LLVM_DEMANGLE_INCLUDE_DIR and $SRCS in $LLVM_DEMANGLE_SOURCE_DIR; are you sure? [y/N]" -n 1 -r ANSWER
 echo
 
-if [[ $ANSWER =~ ^[Yy]$ ]]; then
-    cp -f README.txt $LLVM_DEMANGLE_DIR
-    chmod -w $LLVM_DEMANGLE_DIR/README.txt
-    for I in $HDRS ; do
-	rm -f $LLVM_DEMANGLE_DIR/$I
-	dash=$(echo "$I---------------------------" | cut -c -27 |\
-		   sed 's|[^-]*||')
-	sed -e '1s|^//=*-* .*\..* -*.*=*// *$|//===--- '"$I $dash"'-*- mode:c++;eval:(read-only-mode) -*-===//|' \
-	    -e '2s|^// *$|//       Do not edit! See README.txt.|' \
-	    $I >$LLVM_DEMANGLE_DIR/$I
-	chmod -w $LLVM_DEMANGLE_DIR/$I
+function copy_files() {
+    local dest_dir=$1
+    local files=$2
+    local adjust_include_paths=$3
+
+    cp -f README.txt $dest_dir
+    chmod -w $dest_dir/README.txt
+    for I in $files ; do
+    rm -f $dest_dir/$I
+    dash=$(echo "$I---------------------------" | cut -c -27 |\
+    	   sed 's|[^-]*||')
+    sed -e '1s|^//=*-* .*\..* -*.*=*// *$|//===--- '"$I $dash"'-*- mode:c++;eval:(read-only-mode) -*-===//|' \
+        -e '2s|^// *$|//       Do not edit! See README.txt.|' \
+        $I >$dest_dir/$I
+
+    if [[ "$adjust_include_paths" = true ]]; then
+        sed -i '' \
+            -e 's|#include "DemangleConfig.h"|#include "llvm/Demangle/DemangleConfig.h"|' \
+            -e 's|#include "Utility.h"|#include "llvm/Demangle/Utility.h"|' \
+            $dest_dir/$I
+    fi
+
+    chmod -w $dest_dir/$I
     done
+}
+
+if [[ $ANSWER =~ ^[Yy]$ ]]; then
+  copy_files $LLVM_DEMANGLE_INCLUDE_DIR "$HDRS" false
+  copy_files $LLVM_DEMANGLE_SOURCE_DIR "$SRCS" true
 fi
diff --git a/llvm/include/llvm/Demangle/ItaniumDemangle.h b/llvm/include/llvm/Demangle/ItaniumDemangle.h
index b0363c1a7a786..2b51be306203d 100644
--- a/llvm/include/llvm/Demangle/ItaniumDemangle.h
+++ b/llvm/include/llvm/Demangle/ItaniumDemangle.h
@@ -851,11 +851,13 @@ class FunctionType final : public Node {
   // by printing out the return types's left, then print our parameters, then
   // finally print right of the return type.
   void printLeft(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     Ret->printLeft(OB);
     OB += " ";
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
@@ -971,18 +973,32 @@ class FunctionEncoding final : public Node {
   const Node *getName() const { return Name; }
 
   void printLeft(OutputBuffer &OB) const override {
+    // Nested FunctionEncoding parsing can happen with following productions:
+    // * <local-name>
+    // * <expr-primary>
+    auto Scoped = OB.enterFunctionTypePrinting();
+
     if (Ret) {
       Ret->printLeft(OB);
       if (!Ret->hasRHSComponent(OB))
         OB += " ";
     }
+
+    OB.FunctionInfo.updateScopeStart(OB);
+
     Name->print(OB);
   }
 
   void printRight(OutputBuffer &OB) const override {
+    auto Scoped = OB.enterFunctionTypePrinting();
+    OB.FunctionInfo.finalizeStart(OB);
+
     OB.printOpen();
     Params.printWithComma(OB);
     OB.printClose();
+
+    OB.FunctionInfo.finalizeArgumentEnd(OB);
+
     if (Ret)
       Ret->printRight(OB);
 
@@ -1005,6 +1021,8 @@ class FunctionEncoding final : public Node {
       OB += " requires ";
       Requires->print(OB);
     }
+
+    OB.FunctionInfo.finalizeEnd(OB);
   }
 };
 
@@ -1072,7 +1090,9 @@ struct NestedName : Node {
   void printLeft(OutputBuffer &OB) const override {
     Qual->print(OB);
     OB += "::";
+    OB.FunctionInfo.updateScopeEnd(OB);
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
   }
 };
 
@@ -1633,6 +1653,7 @@ struct NameWithTemplateArgs : Node {
 
   void printLeft(OutputBuffer &OB) const override {
     Name->print(OB);
+    OB.FunctionInfo.updateBasenameEnd(OB);
     TemplateArgs->print(OB);
   }
 };
diff --git a/llvm/include/llvm/Demangle/Utility.h b/llvm/include/llvm/Demangle/Utility.h
index e893cceea2cdc..4e69c3623b480 100644
--- a/llvm/include/llvm/Demangle/Utility.h
+++ b/llvm/include/llvm/Demangle/Utility.h
@@ -27,6 +27,66 @@
 
 DEMANGLE_NAMESPACE_BEGIN
 
+template <class T> class ScopedOverride {
+  T &Loc;
+  T Original;
+
+public:
+  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
+
+  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
+    Loc_ = std::move(NewVal);
+  }
+  ~ScopedOverride() { Loc = std::move(Original); }
+
+  ScopedOverride(const ScopedOverride &) = delete;
+  ScopedOverride &operator=(const ScopedOverride &) = delete;
+};
+
+class OutputBuffer;
+
+// Stores information about parts of a demangled function name.
+struct FunctionNameInfo {
+  /// A [start, end) pair for the function basename.
+  /// The basename is the name without scope qualifiers
+  /// and without template parameters. E.g.,
+  /// \code{.cpp}
+  ///    void foo::bar<int>::someFunc<float>(int) const &&
+  ///                        ^       ^
+  ///                      Start    End
+  /// \endcode
+  std::pair<size_t, size_t> BasenameLocs;
+
+  /// A [start, end) pair for the function scope qualifiers.
+  /// E.g., for
+  /// \code{.cpp}
+  ///    void foo::bar<int>::qux<float>(int) const &&
+  ///         ^              ^
+  ///       Start           End
+  /// \endcode
+  std::pair<size_t, size_t> ScopeLocs;
+
+  /// Indicates the [start, end) of the function argument lits.
+  /// E.g.,
+  /// \code{.cpp}
+  ///    int (*getFunc<float>(float, double))(int, int)
+  ///                        ^              ^
+  ///                      start           end
+  /// \endcode
+  std::pair<size_t, size_t> ArgumentLocs;
+
+  bool startedPrintingArguments() const;
+  bool shouldTrack(OutputBuffer &OB) const;
+  bool canFinalize(OutputBuffer &OB) const;
+  void updateBasenameEnd(OutputBuffer &OB);
+  void updateScopeStart(OutputBuffer &OB);
+  void updateScopeEnd(OutputBuffer &OB);
+  void finalizeArgumentEnd(OutputBuffer &OB);
+  void finalizeStart(OutputBuffer &OB);
+  void finalizeEnd(OutputBuffer &OB);
+  bool hasBasename() const;
+};
+
 // Stream that AST nodes write their string representation into after the AST
 // has been parsed.
 class OutputBuffer {
@@ -34,6 +94,10 @@ class OutputBuffer {
   size_t CurrentPosition = 0;
   size_t BufferCapacity = 0;
 
+  /// When a function type is being printed this value is incremented.
+  /// When printing of the type is finished the value is decremented.
+  unsigned FunctionPrintingDepth = 0;
+
   // Ensure there are at least N more positions in the buffer.
   void grow(size_t N) {
     size_t Need = N + CurrentPosition;
@@ -92,8 +156,19 @@ class OutputBuffer {
   /// Use a counter so we can simply increment inside parentheses.
   unsigned GtIsGt = 1;
 
+  /// When printing the mangle tree, this object will hold information about
+  /// the function name being printed (if any).
+  FunctionNameInfo FunctionInfo;
+
+  /// Called when we start printing a function type.
+  [[nodiscard]] ScopedOverride<unsigned> enterFunctionTypePrinting();
+
   bool isGtInsideTemplateArgs() const { return GtIsGt == 0; }
 
+  bool isPrintingTopLevelFunctionType() const {
+    return FunctionPrintingDepth == 1;
+  }
+
   void printOpen(char Open = '(') {
     GtIsGt++;
     *this += Open;
@@ -182,22 +257,6 @@ class OutputBuffer {
   size_t getBufferCapacity() const { return BufferCapacity; }
 };
 
-template <class T> class ScopedOverride {
-  T &Loc;
-  T Original;
-
-public:
-  ScopedOverride(T &Loc_) : ScopedOverride(Loc_, Loc_) {}
-
-  ScopedOverride(T &Loc_, T NewVal) : Loc(Loc_), Original(Loc_) {
-    Loc_ = std::move(NewVal);
-  }
-  ~ScopedOverride() { Loc = std::move(Original); }
-
-  ScopedOverride(const ScopedOverride &) = delete;
-  ScopedOverride &operator=(const ScopedOverride &) = delete;
-};
-
 DEMANGLE_NAMESPACE_END
 
 #endif
diff --git a/llvm/lib/Demangle/CMakeLists.txt b/llvm/lib/Demangle/CMakeLists.txt
index eb7d212a02449..0da6f6b89ad54 100644
--- a/llvm/lib/Demangle/CMakeLists.txt
+++ b/llvm/lib/Demangle/CMakeLists.txt
@@ -1,4 +1,5 @@
 add_llvm_component_library(LLVMDemangle
+  Utility.cpp
   Demangle.cpp
   ItaniumDemangle.cpp
   MicrosoftDemangle.cpp
diff --git a/llvm/lib/Demangle/README.txt b/llvm/lib/Demangle/README.txt
new file mode 100644
index 0000000000000..c3f49e57b8d16
--- /dev/null
+++ b/llvm/lib/Demangle/README.txt
@@ -0,0 +1,61 @@
+Itanium Name Demangler Library
+==============================
+
+Introduction
+------------
+
+This directory contains the generic itanium name demangler
+library. The main purpose of the library is to demangle C++ symbols,
+i.e. convert the string "_Z1fv" into "f()". You can also use the CRTP
+base ManglingParser to perform some simple analysis on the mangled
+name, or (in LLVM) use the opaque ItaniumPartialDemangler to query the
+demangled AST.
+
+Why are there multiple copies of the this library in the source tree?
+---------------------------------------------------------------------
+
+The canonical sources are in libcxxabi/src/demangle and some of the
+files are copied to llvm/include/llvm/Demangle.  The simple reason for
+this comes from before the monorepo, and both [sub]projects need to
+demangle symbols, but neither can depend on each other.
+
+* libcxxabi needs the demangler to implement __cxa_demangle, which is
+  part of the itanium ABI spec.
+
+* LLVM needs a copy for a bunch of places, and cannot rely on the
+  system's __cxa_demangle because it a) might not be available (i.e.,
+  on Windows), and b) may not be up-to-date on the latest language
+  features.
+
+The copy of the demangler in LLVM has some extra stuff that aren't
+needed in libcxxabi (ie, the MSVC demangler, ItaniumPartialDemangler),
+which depend on the shared generic components. Despite these
+differences, we want to keep the "core" generic demangling library
+identical between both copies to simplify development and testing.
+
+If you're working on the generic library, then do the work first in
+libcxxabi, then run libcxxabi/src/demangle/cp-to-llvm.sh. This
+script takes as an optional argument the path to llvm, and copies the
+changes you made to libcxxabi over.  Note that this script just
+blindly overwrites all changes to the generic library in llvm, so be
+careful.
+
+Because the core demangler needs to work in libcxxabi, everything
+needs to be declared in an anonymous namespace (see
+DEMANGLE_NAMESPACE_BEGIN), and you can't introduce any code that
+depends on the libcxx dylib.
+
+FIXME: Now that LLVM is a monorepo, it should be possible to
+de-duplicate this code, and have both LLVM and libcxxabi depend on a
+shared demangler library.
+
+Testing
+-------
+
+The tests are split up between libcxxabi/test/{unit,}test_demangle.cpp, and
+llvm/unittests/Demangle. The llvm directory should only get tests for stuff not
+included in the core library. In the future though, we should probably move all
+the tests to LLVM.
+
+It is also a really good idea to run libFuzzer after non-trivial changes, see
+libcxxabi/fuzz/cxa_demangle_fuzzer.cpp and https://llvm.org/docs/LibFuzzer.html.
diff --git a/llvm/lib/Demangle/Utility.cpp b/llvm/lib/Demangle/Utility.cpp
new file mode 100644
index 0000000000000..1eab251581c9e
--- /dev/null
+++ b/llvm/lib/Demangle/Utility.cpp
@@ -0,0 +1,112 @@
+//===--- Utility.cpp -----------------*- mode:c++;eval:(read-only-mode) -*-===//
+//       Do not edit! See README.txt.
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Provide some utility classes for use in the demangler.
+// There are two copies of this file in the source tree.  The one in libcxxabi
+// is the original and the one in llvm is the copy.  Use cp-to-llvm.sh to update
+// the copy.  See README.txt for more details.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Demangle/Utility.h"
+#include "llvm/Demangle/DemangleConfig.h"
+
+DEMANGLE_NAMESPACE_BEGIN
+
+bool FunctionNameInfo::startedPrintingArguments() const {
+  return ArgumentLocs.first > 0;
+}
+
+bool FunctionNameInfo::shouldTrack(OutputBuffer &OB) const {
+  if (!OB.isPrintingTopLevelFunctionType())
+    return false;
+
+  if (OB.isGtInsideTemplateArgs())
+    return false;
+
+  if (startedPrintingArguments())
+    return false;
+
+  return true;
+}
+
+bool FunctionNameInfo::canFinalize(OutputBuffer &OB) const {
+  if (!OB.isPrintingTopLevelFunctionType())
+    return false;
+
+  if (OB.isGtInsideTemplateArgs())
+    return false;
+
+  if (!startedPrintingArguments())
+    return false;
+
+  return true;
+}
+
+void FunctionNameInfo::updateBasenameEnd(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  BasenameLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::updateScopeStart(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  ScopeLocs.first = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::updateScopeEnd(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  ScopeLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::finalizeArgumentEnd(OutputBuffer &OB) {
+  if (!canFinalize(OB))
+    return;
+
+  OB.FunctionInfo.ArgumentLocs.second = OB.getCurrentPosition();
+}
+
+void FunctionNameInfo::finalizeStart(OutputBuffer &OB) {
+  if (!shouldTrack(OB))
+    return;
+
+  OB.FunctionInfo.ArgumentLocs.first = OB.getCurrentPosition();
+
+  // If nothing has set the end of the basename yet (for example when
+  // printing templates), then the beginning of the arguments is the end of
+  // the basename.
+  if (BasenameLocs.second == 0)
+    OB.FunctionInfo.BasenameLocs.second = OB.getCurrentPosition();
+
+  DEMANGLE_ASSERT(!shouldTrack(OB), "");
+  DEMANGLE_ASSERT(canFinalize(OB), "");
+}
+
+void FunctionNameInfo::finalizeEnd(OutputBuffer &OB) {
+  if (!canFinalize(OB))
+    return;
+
+  if (ScopeLocs.first > OB.FunctionInfo.ScopeLocs.second)
+    ScopeLocs.second = OB.FunctionInfo.ScopeLocs.first;
+  BasenameLocs.first = OB.FunctionInfo.ScopeLocs.second;
+}
+
+bool FunctionNameInfo::hasBasename() const {
+  return BasenameLocs.first != BasenameLocs.second && BasenameLocs.second > 0;
+}
+
+ScopedOverride<unsigned> OutputBuffer::enterFunctionTypePrinting() {
+  return {FunctionPrintingDepth, FunctionPrintingDepth + 1};
+}
+
+DEMANGLE_NAMESPACE_END
diff --git a/llvm/unittests/Demangle/ItaniumDemangleTest.cpp b/llvm/unittests/Demangle/ItaniumDemangleTest.cpp
index bc6ccc2e16e65..8e88f52dbc9b4 100644
--- a/llvm/unittests/Demangle/ItaniumDemangleTest.cpp
+++ b/llvm/unittests/Demangle/ItaniumDemangleTest.cpp
@@ -114,3 +114,115 @@ TEST(ItaniumDemangle, HalfType) {
   ASSERT_NE(nullptr, Parser.parse());
   EXPECT_THAT(Parser.Types, testing::ElementsAre("_Float16", "A", "_Float16"));
 }
+
+struct DemanglingPartsTestCase {
+  const char *mangled;
+  itanium_demangle::FunctionNameInfo expected_info;
+  llvm::StringRef basename;
+  llvm::StringRef scope;
+  bool valid_basename = true;
+};
+
+DemanglingPartsTestCase g_demangling_parts_test_cases[] = {
+    // clang-format off
+  { "_ZNVKO3BarIN2ns3QuxIiEEE1CIPFi3FooIS_IiES6_EEE6methodIS6_EENS5_IT_SC_E5InnerIiEESD_SD_",
+    { .BasenameLocs = {92, 98}, .ScopeLocs = {36, 92}, .ArgumentLocs = { 108, 158 } },
+    .basename = "method",
+    .scope = "Bar<ns::Qux<int>>::C<int (*)(Foo<Bar<int>, Bar<int>>)>::"
+  },
+  { "_Z7getFuncIfEPFiiiET_",
+    { .BasenameLocs = {6, 13}, .ScopeLocs = {6, 6}, .ArgumentLocs = { 20, 27 } },
+    .basename = "getFunc",
+    .scope = ""
+  },
+  { "_ZN1f1b1c1gEv",
+    { .BasenameLocs = {9, 10}, .ScopeLocs = {0, 9}, .ArgumentLocs = { 10, 12 } },
+    .basename = "g",
+    .scope = "f::b::c::"
+  },
+  { "_ZN5test73fD1IiEEDTcmtlNS_1DEL_ZNS_1bEEEcvT__EES2_",
+    { .BasenameLocs = {45, 48}, .ScopeLocs = {38, 45}, .ArgumentLocs = { 53, 58 } },
+    .basename = "fD1",
+    .scope = "test7::"
+  },
+  { "_ZN8nlohmann16json_abi_v3_11_310basic_jsonINSt3__13mapENS2_6vectorENS2_12basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEbxydS8_NS0_14adl_serializerENS4_IhNS8_IhEEEEvE5parseIRA29_KcEESE_OT_NS2_8functionIFbiNS0_6detail13parse_event_tERSE_EEEbb",
+    { .BasenameLocs = {687, 692}, .ScopeLocs = {343, 687}, .ArgumentLocs = { 713, 1174 } },
+    .basename = "parse",
+    .scope = "nlohmann::json_abi_v3_11_3::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void>::"
+  },
+  { "_ZN8nlohmann16json_abi_v3_11_310basic_jsonINSt3__13mapENS2_6vectorENS2_12basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEbxydS8_NS0_14adl_serializerENS4_IhNS8_IhEEEEvEC1EDn",
+    { .BasenameLocs = {344, 354}, .ScopeLocs = {0, 344}, .ArgumentLocs = { 354, 370 } },
+    .basename = "basic_json",
+    .scope = "nlohmann::json_abi_v3_11_3::basic_json<std::__1::map, std::__1::vector, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, bool, long long, unsigned long long, double, std::__1::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::__1::vector<unsigned char, std::__1::allocator<unsigned char>>, void>::"
+  },
+  { "_Z3fppIiEPFPFvvEiEf",
+    { .BasenameLocs = {10, 13}, .ScopeLocs = {10, 10}, .ArgumentLocs = { 18, 25 } },
+    .basename = "fpp",
+    .scope = ""
+  },
+  { "_Z3fppIiEPFPFvvEN2ns3FooIiEEEf",
+    { .BasenameLocs = {10, 13}, .ScopeLocs = {10, 10}, .ArgumentLocs = { 18, 25 } },
+    .basename = "fpp",
+    .scope = ""
+  },
+  { "_Z3fppIiEPFPFvPFN2ns3FooIiEENS2_3BarIfE3QuxEEEPFS2_S2_EEf",
+    { .BasenameLocs = {10, 13}, .ScopeLocs = {10, 10}, .ArgumentLocs = { 18, 25 } },
+    .basename = "fpp",
+    .scope = ""
+  },
+  { "_ZN2ns8HasFuncsINS_3FooINS1_IiE3BarIfE3QuxEEEE3fppIiEEPFPFvvEiEf",
+    { .BasenameLocs = {64, 67}, .ScopeLocs = {10, 64}, .ArgumentLocs = { 72, 79 } },
+    .basename = "fpp",
+    .scope = "ns::HasFuncs<ns::Foo<ns::Foo<int>::Bar<float>::Qux>>::"
+  },
+  { "_ZN2ns8HasFuncsINS_3FooINS1_IiE3BarIfE3QuxEEEE3fppIiEEPFPFvvES2_Ef",
+    { .BasenameLocs = {64, 67}, .ScopeLocs = {10, 64}, .ArgumentLocs = { 72, 79 } },
+    .basename = "fpp",
+    .scope = "ns::HasFuncs<ns::Foo<ns::Foo<int>::Bar<float>::Qux>>::"
+  },
+  { "_ZN2ns8HasFuncsINS_3FooINS1_IiE3BarIfE3QuxEEEE3fppIiEEPFPFvPFS2_S5_EEPFS2_S2_EEf",
+    { .BasenameLocs = {64, 67}, .ScopeLocs = {10, 64}, .ArgumentLocs = { 72, 79 } },
+    .basename = "fpp",
+    .scope = "ns::HasFuncs<ns::Foo<ns::Foo<int>::Bar<float>::Qux>>::"
+  },
+  { "_ZTV11ImageLoader",
+    { .BasenameLocs = {0, 0}, .ScopeLocs = {0, 0}, .ArgumentLocs = { 0, 0 } },
+    .basename = "",
+    .scope = "",
+    .valid_basename = false
+  }
+    // clang-format on
+};
+
+struct DemanglingPartsTestFixture
+    : public ::testing::TestWithParam<DemanglingPartsTestCase> {};
+
+TEST_P(DemanglingPartsTestFixture, DemanglingParts) {
+  const auto &[mangled, info, basename, scope, valid_basename] = GetParam();
+
+  ManglingParser<TestAllocator> Parser(mangled, mangled + ::strlen(mangled));
+
+  const auto *Root = Parser.parse();
+
+  ASSERT_NE(nullptr, Root);
+
+  OutputBuffer OB;
+  Root->print(OB);
+  auto demangled = toString(OB);
+
+  ASSERT_EQ(OB.FunctionInfo.hasBasename(), valid_basename);
+
+  EXPECT_EQ(OB.FunctionInfo.BasenameLocs, info.BasenameLocs);
+  EXPECT_EQ(OB.FunctionInfo.ScopeLocs, info.ScopeLocs);
+  EXPECT_EQ(OB.FunctionInfo.ArgumentLocs, info.ArgumentLocs);
+
+  auto get_part = [&](const std::pair<size_t, size_t> &loc) {
+    return demangled.substr(loc.first, loc.second - loc.first);
+  };
+
+  EXPECT_EQ(get_part(OB.FunctionInfo.BasenameLocs), basename);
+  EXPECT_EQ(get_part(OB.FunctionInfo.ScopeLocs), scope);
+}
+
+INSTANTIATE_TEST_SUITE_P(DemanglingPartsTests, DemanglingPartsTestFixture,
+                         ::testing::ValuesIn(g_demangling_parts_test_cases));


