[Lldb-commits] [lldb] [lldb][CPlusPlus] Always use CPlusPlusNameParser for parsing C++ function names (PR #137072)
via lldb-commits
lldb-commits at lists.llvm.org
Wed Apr 23 15:11:06 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-lldb
Author: Michael Buch (Michael137)
<details>
<summary>Changes</summary>
When doing function lookup in an LLDB module by name, we may need to hand-parse C++ method names to determine properties of the function name such as its "basename". LLDB currently has two ways of parsing methods:
1. Using `TrySimplifiedParse`, which was historically the go-to parser
2. Using the `CPlusPlusNameParser`, which is the more recent parser and has more knowledge of C++ syntax
When `CPlusPlusNameParser` was introduced (https://reviews.llvm.org/D31451), `TrySimplifiedParse` was kept around as the first parser to keep performance in check. Though the performance difference didn't seem very convincing from the numbers provided. 11s vs 10s when setting a breakpoint in a project with 500k functions. The main difference being that we're invoking the clang::Lexer in the new parser.
It would be nice to just get rid of it entirely. There's an edge-case for which I added a test-case where the `TrySimplifiedParse` code-path succeeds but incorrectly deduces the basename of `operator()`.
Happy to do benchmarking to see what the performance impact of this is these days.
---
Full diff: https://github.com/llvm/llvm-project/pull/137072.diff
2 Files Affected:
- (modified) lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp (+9-83)
- (modified) lldb/test/API/lang/cpp/operators/main.cpp (+2)
``````````diff
diff --git a/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp b/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
index 40b6563aeb410..f8b53c0837a54 100644
--- a/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
+++ b/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
@@ -173,34 +173,6 @@ static bool ReverseFindMatchingChars(const llvm::StringRef &s,
return false;
}
-static bool IsTrivialBasename(const llvm::StringRef &basename) {
- // Check that the basename matches with the following regular expression
- // "^~?([A-Za-z_][A-Za-z_0-9]*)$" We are using a hand written implementation
- // because it is significantly more efficient then using the general purpose
- // regular expression library.
- size_t idx = 0;
- if (basename.starts_with('~'))
- idx = 1;
-
- if (basename.size() <= idx)
- return false; // Empty string or "~"
-
- if (!std::isalpha(basename[idx]) && basename[idx] != '_')
- return false; // First character (after removing the possible '~'') isn't in
- // [A-Za-z_]
-
- // Read all characters matching [A-Za-z_0-9]
- ++idx;
- while (idx < basename.size()) {
- if (!std::isalnum(basename[idx]) && basename[idx] != '_')
- break;
- ++idx;
- }
-
- // We processed all characters. It is a vaild basename.
- return idx == basename.size();
-}
-
/// Writes out the function name in 'full_name' to 'out_stream'
/// but replaces each argument type with the variable name
/// and the corresponding pretty-printed value
@@ -235,66 +207,20 @@ static bool PrettyPrintFunctionNameWithArgs(Stream &out_stream,
return true;
}
-bool CPlusPlusLanguage::CxxMethodName::TrySimplifiedParse() {
- // This method tries to parse simple method definitions which are presumably
- // most comman in user programs. Definitions that can be parsed by this
- // function don't have return types and templates in the name.
- // A::B::C::fun(std::vector<T> &) const
- size_t arg_start, arg_end;
- llvm::StringRef full(m_full.GetCString());
- llvm::StringRef parens("()", 2);
- if (ReverseFindMatchingChars(full, parens, arg_start, arg_end)) {
- m_arguments = full.substr(arg_start, arg_end - arg_start + 1);
- if (arg_end + 1 < full.size())
- m_qualifiers = full.substr(arg_end + 1).ltrim();
-
- if (arg_start == 0)
- return false;
- size_t basename_end = arg_start;
- size_t context_start = 0;
- size_t context_end = full.rfind(':', basename_end);
- if (context_end == llvm::StringRef::npos)
- m_basename = full.substr(0, basename_end);
- else {
- if (context_start < context_end)
- m_context = full.substr(context_start, context_end - 1 - context_start);
- const size_t basename_begin = context_end + 1;
- m_basename = full.substr(basename_begin, basename_end - basename_begin);
- }
-
- if (IsTrivialBasename(m_basename)) {
- return true;
- } else {
- // The C++ basename doesn't match our regular expressions so this can't
- // be a valid C++ method, clear everything out and indicate an error
- m_context = llvm::StringRef();
- m_basename = llvm::StringRef();
- m_arguments = llvm::StringRef();
- m_qualifiers = llvm::StringRef();
- m_return_type = llvm::StringRef();
- return false;
- }
- }
- return false;
-}
-
void CPlusPlusLanguage::CxxMethodName::Parse() {
if (!m_parsed && m_full) {
- if (TrySimplifiedParse()) {
+ CPlusPlusNameParser parser(m_full.GetStringRef());
+ if (auto function = parser.ParseAsFunctionDefinition()) {
+ m_basename = function->name.basename;
+ m_context = function->name.context;
+ m_arguments = function->arguments;
+ m_qualifiers = function->qualifiers;
+ m_return_type = function->return_type;
m_parse_error = false;
} else {
- CPlusPlusNameParser parser(m_full.GetStringRef());
- if (auto function = parser.ParseAsFunctionDefinition()) {
- m_basename = function->name.basename;
- m_context = function->name.context;
- m_arguments = function->arguments;
- m_qualifiers = function->qualifiers;
- m_return_type = function->return_type;
- m_parse_error = false;
- } else {
- m_parse_error = true;
- }
+ m_parse_error = true;
}
+
if (m_context.empty()) {
m_scope_qualified = std::string(m_basename);
} else {
diff --git a/lldb/test/API/lang/cpp/operators/main.cpp b/lldb/test/API/lang/cpp/operators/main.cpp
index c52ef1c8cac47..139c16b02ffe9 100644
--- a/lldb/test/API/lang/cpp/operators/main.cpp
+++ b/lldb/test/API/lang/cpp/operators/main.cpp
@@ -174,6 +174,8 @@ int main(int argc, char **argv) {
//% self.expect("expr (new struct C[1]); side_effect", endstr=" = 4\n")
//% self.expect("expr delete c2; side_effect", endstr=" = 1\n")
//% self.expect("expr delete[] c3; side_effect", endstr=" = 2\n")
+ //% self.expect("image lookup -n operator()", substrs=["C::operator()(int)"])
+ //% self.expect("image lookup -n C::operator()", substrs=["C::operator()(int)"])
delete c2;
delete[] c3;
return 0;
``````````
</details>
https://github.com/llvm/llvm-project/pull/137072
More information about the lldb-commits
mailing list