[Lldb-commits] [PATCH] D31451: New C++ function name parsing logic

Fri Mar 31 09:12:11 PDT 2017

labath added a comment.

In https://reviews.llvm.org/D31451#714850, @eugene wrote:

> I did some micro-benchmarking and on average new parser is ~3 time slower than the old one. (new parser - ~200k string/s, old parser - ~700k string/s) 
>  clang::Lexer appears to be the slowest part of it.
>
> On the clang breakpoint benchmark you proposed, it is hard to notice much of a difference. clang binary has about 500k functions.
>  With new parser it takes about 11s to try to set a breakpoint, on the old one it's about 10s. (release version of  LLDB, debug static version of clang)
>
> I decided to use a hybrid approach, when we use old parsing code for simple cases and call new parser only when it fails.
>  80% of clang functions are simple enough that we don't really need the new parser, so it helped to bring clang breakpoint test back to 10s.
>  I think it's reasonable to assume that similar distribution is true for most programs, and most of their functions can be parsed with the old code.
>
> I don't think we can really improve performance of a new parser without giving up on clang::Lexer, and I'm reluctant to do it. 
>  So I propose to keep hybrid approach and call a new parser only for complicated cases.

I like that idea. Let's go with that.

================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:94
   if (!m_parsed && m_full) {
-    //        ConstString mangled;
-    //        m_full.GetMangledCounterpart(mangled);
-    //        printf ("\n   parsing = '%s'\n", m_full.GetCString());
-    //        if (mangled)
-    //            printf ("   mangled = '%s'\n", mangled.GetCString());
-    m_parse_error = false;
-    m_parsed = true;
-    llvm::StringRef full(m_full.GetCString());
-
-    size_t arg_start, arg_end;
-    llvm::StringRef parens("()", 2);
-    if (ReverseFindMatchingChars(full, parens, arg_start, arg_end)) {
-      m_arguments = full.substr(arg_start, arg_end - arg_start + 1);
-      if (arg_end + 1 < full.size())
-        m_qualifiers = full.substr(arg_end + 1);
-      if (arg_start > 0) {
-        size_t basename_end = arg_start;
-        size_t context_start = 0;
-        size_t context_end = llvm::StringRef::npos;
-        if (basename_end > 0 && full[basename_end - 1] == '>') {
-          // TODO: handle template junk...
-          // Templated function
-          size_t template_start, template_end;
-          llvm::StringRef lt_gt("<>", 2);
-          if (ReverseFindMatchingChars(full, lt_gt, template_start,
-                                       template_end, basename_end)) {
-            // Check for templated functions that include return type like:
-            // 'void foo<Int>()'
-            context_start = full.rfind(' ', template_start);
-            if (context_start == llvm::StringRef::npos)
-              context_start = 0;
-            else
-              ++context_start;
-
-            context_end = full.rfind(':', template_start);
-            if (context_end == llvm::StringRef::npos ||
-                context_end < context_start)
-              context_end = context_start;
-          } else {
-            context_end = full.rfind(':', basename_end);
-          }
-        } else if (context_end == llvm::StringRef::npos) {
-          context_end = full.rfind(':', basename_end);
-        }
-
-        if (context_end == llvm::StringRef::npos)
-          m_basename = full.substr(0, basename_end);
-        else {
-          if (context_start < context_end)
-            m_context =
-                full.substr(context_start, context_end - 1 - context_start);
-          const size_t basename_begin = context_end + 1;
-          m_basename =
-              full.substr(basename_begin, basename_end - basename_begin);
-        }
-        m_type = eTypeUnknownMethod;
-      } else {
-        m_parse_error = true;
-        return;
-      }
-
-      if (!IsValidBasename(m_basename)) {
-        // The C++ basename doesn't match our regular expressions so this can't
-        // be a valid C++ method, clear everything out and indicate an error
-        m_context = llvm::StringRef();
-        m_basename = llvm::StringRef();
-        m_arguments = llvm::StringRef();
-        m_qualifiers = llvm::StringRef();
-        m_parse_error = true;
-      }
+    CPlusPlusNameParser parser(m_full.GetStringRef());
+    auto function = parser.ParseAsFunctionDefinition();
----------------
eugene wrote:
> labath wrote:
> > How about the following api:
> > ```
> > if (auto function = CPlusPlusNameParser::ParseAsFunctionDefinition(m_full.GetStringRef())) {
> >   ...
> > ```
> If you don't mind I'll leave it as it is. 
> 
> I understand that it's very tempting to have two simple functions ParseAsFunctionDefinition and ParseAsFullName instead of a class, but I can imagine calling second one if first one fails, and in this case it'll be good that parser doesn't need to tokenize string all over again.
Ok, that makes sense -- I didn't expect that would actually work.
I suppose one can always write `CPlusPlusNameParser(foo).ParseAsFunctionDefinition()` when he wants to make it a one-liner and doesn't care about tokenization reuse.

================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:62
+bool CPlusPlusNameParser::HasMoreTokens() {
+  return m_next_token_index < static_cast<int>(m_tokens.size());
+}
----------------
eugene wrote:
> labath wrote:
> > Wouldn't it be better to change the type of m_next_token_index to size_t?
> Unsigned types are dangerous. I prefer to avoid them wherever I can.
> 
> As far as I remember many members of C++ standard commit publicly regretted the fact that .size() returns an unsigned type.
Well... they are most dangerous when you try to combine them with signed types, which is exactly what you are doing now... :) It's also contrary to how things are done in other parts of the code base and goes against the principle of having as few nonsensical values for your variables as possible. So, I still think this (and all other variables you use for token indexes) should be size_t.

================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:18
+using namespace lldb_private;
+using llvm::Optional;
+using llvm::None;
----------------
Are these necessary? You seem to prefix every occurence of Optional and None anyway...

================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:257
+  if (HasMoreTokens() && Peek().is(tok::raw_identifier) &&
+      Peek().getRawIdentifier() == g_anonymous.GetStringRef()) {
+    Advance();
----------------
You don't need to go `ConstString` here. If you wanted to avoid strlen computation, just make this `constexpr StringLiteral`. 

================
Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h:94
+  // when destructed unless it's manually removed with Remove().
+  class Bookmark {
+  public:
----------------
eugene wrote:
> labath wrote:
> > Please make the type move-only. Otherwise you will have a fun time debugging accidental copies. (You already have one, although it is benign now)
> Good catch! Thanks!
Please also delete the assignment operator.. Having copy constructor deleted and assignment operator working is confusing,

https://reviews.llvm.org/D31451