[Lldb-commits] [PATCH] D31451: New C++ function name parsing logic
Eugene Zemtsov via Phabricator via lldb-commits
lldb-commits at lists.llvm.org
Thu Mar 30 18:42:14 PDT 2017
eugene marked 4 inline comments as done.
eugene added a comment.
> I think we should do some performance measurements to see whether this needs more optimising or it's fine as is.
>
> I propose the following benchmark:
>
> bin/lldb bin/clang
>
> make sure clang is statically linked
> breakpoint set --name non_exisiting_function
>
> Setting the breakpoint on a nonexistent function avoids us timing the breakpoint setting machinery, while still getting every symbol in the executable parsed.
I did some micro-benchmarking and on average new parser is ~3 time slower than the old one. (new parser - ~200k string/s, old parser - ~700k string/s)
clang::Lexer appears to be the slowest part of it.
On the clang breakpoint benchmark you proposed, it is hard to notice much of a difference. clang binary has about 500k functions.
With new parser it takes about 11s to try to set a breakpoint, on the old one it's about 10s. (release version of LLDB, debug static version of clang)
I decided to use a hybrid approach, when we use old parsing code for simple cases and call new parser only when it fails.
80% of clang functions are simple enough that we don't really need the new parser, so it helped to bring clang breakpoint test back to 10s.
I think it's reasonable to assume that similar distribution is true for most programs, and most of their functions can be parsed with the old code.
I don't think we can really improve performance of a new parser without giving up on clang::Lexer, and I'm reluctant to do it.
So I propose to keep hybrid approach and call a new parser only for complicated cases.
================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:94
if (!m_parsed && m_full) {
- // ConstString mangled;
- // m_full.GetMangledCounterpart(mangled);
- // printf ("\n parsing = '%s'\n", m_full.GetCString());
- // if (mangled)
- // printf (" mangled = '%s'\n", mangled.GetCString());
- m_parse_error = false;
- m_parsed = true;
- llvm::StringRef full(m_full.GetCString());
-
- size_t arg_start, arg_end;
- llvm::StringRef parens("()", 2);
- if (ReverseFindMatchingChars(full, parens, arg_start, arg_end)) {
- m_arguments = full.substr(arg_start, arg_end - arg_start + 1);
- if (arg_end + 1 < full.size())
- m_qualifiers = full.substr(arg_end + 1);
- if (arg_start > 0) {
- size_t basename_end = arg_start;
- size_t context_start = 0;
- size_t context_end = llvm::StringRef::npos;
- if (basename_end > 0 && full[basename_end - 1] == '>') {
- // TODO: handle template junk...
- // Templated function
- size_t template_start, template_end;
- llvm::StringRef lt_gt("<>", 2);
- if (ReverseFindMatchingChars(full, lt_gt, template_start,
- template_end, basename_end)) {
- // Check for templated functions that include return type like:
- // 'void foo<Int>()'
- context_start = full.rfind(' ', template_start);
- if (context_start == llvm::StringRef::npos)
- context_start = 0;
- else
- ++context_start;
-
- context_end = full.rfind(':', template_start);
- if (context_end == llvm::StringRef::npos ||
- context_end < context_start)
- context_end = context_start;
- } else {
- context_end = full.rfind(':', basename_end);
- }
- } else if (context_end == llvm::StringRef::npos) {
- context_end = full.rfind(':', basename_end);
- }
-
- if (context_end == llvm::StringRef::npos)
- m_basename = full.substr(0, basename_end);
- else {
- if (context_start < context_end)
- m_context =
- full.substr(context_start, context_end - 1 - context_start);
- const size_t basename_begin = context_end + 1;
- m_basename =
- full.substr(basename_begin, basename_end - basename_begin);
- }
- m_type = eTypeUnknownMethod;
- } else {
- m_parse_error = true;
- return;
- }
-
- if (!IsValidBasename(m_basename)) {
- // The C++ basename doesn't match our regular expressions so this can't
- // be a valid C++ method, clear everything out and indicate an error
- m_context = llvm::StringRef();
- m_basename = llvm::StringRef();
- m_arguments = llvm::StringRef();
- m_qualifiers = llvm::StringRef();
- m_parse_error = true;
- }
+ CPlusPlusNameParser parser(m_full.GetStringRef());
+ auto function = parser.ParseAsFunctionDefinition();
----------------
labath wrote:
> How about the following api:
> ```
> if (auto function​ = CPlusPlusNameParser::ParseAsFunctionDefinition(m_full.GetStringRef())) {
> ...
> ```
If you don't mind I'll leave it as it is.
I understand that it's very tempting to have two simple functions ParseAsFunctionDefinition and ParseAsFullName instead of a class, but I can imagine calling second one if first one fails, and in this case it'll be good that parser doesn't need to tokenize string all over again.
================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:160
+ auto full_name = parser.ParseAsFullName();
+ if (full_name.hasValue()) {
+ identifier = full_name.getValue().m_basename;
----------------
labath wrote:
> Same here
see above
================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:62
+bool CPlusPlusNameParser::HasMoreTokens() {
+ return m_next_token_index < static_cast<int>(m_tokens.size());
+}
----------------
labath wrote:
> Wouldn't it be better to change the type of m_next_token_index to size_t?
Unsigned types are dangerous. I prefer to avoid them wherever I can.
As far as I remember many members of C++ standard commit publicly regretted the fact that .size() returns an unsigned type.
================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:188
+ // as arithmetic or shift operators not as template brackets.
+ // Examples: std::enable_if<(10u)<(64), bool>
+ // f<A<operator<(X,Y)::Subclass>>
----------------
labath wrote:
> Is this really the case for the types we are interested in? I would have hoped that this would get simplified to `std::enable_if<true, bool> before it reaches us?
Yes. I was very surprised as well. But apparently compiler sometimes is being lazy and doesn't simplify everything :(
Running:
~$ nm -C --defined-only clang | grep std::enable_if
Gives a few symbols like this:
std::enable_if<(10u)<(64), bool>::type llvm::isUInt<10u>(unsigned long)
================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:572
+ clang::SourceLocation end_location = last_token.getLocation();
+ const char *begin_ptr = m_text.data() + begin_location.getRawEncoding();
+ const char *end_ptr =
----------------
labath wrote:
> Could this be written as:
> `m_text.take_front(end_pos).drop_front(start_pos)`?
Thanks! I still have a lot to learn about llvm classes.
================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h:94
+ // when destructed unless it's manually removed with Remove().
+ class Bookmark {
+ public:
----------------
labath wrote:
> Please make the type move-only. Otherwise you will have a fun time debugging accidental copies. (You already have one, although it is benign now)
Good catch! Thanks!
================
Comment at: unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp:106
+ EXPECT_EQ(test.qualifiers, method.GetQualifiers())
+ << method.GetQualifiers();
EXPECT_EQ(test.scope_qualified_name, method.GetScopeQualifiedName());
----------------
labath wrote:
> Would defining operator<< for std::ostream and StringRef ( local tomthis test) enable you to get rid of these?
> I've ran into this before but was never annoyed enough to actually do it.. :/
I tried to to it, but it didn't work for me after a few tries.
I decided to just convert it to std::string, it solves the same problem.
https://reviews.llvm.org/D31451
More information about the lldb-commits
mailing list