[lldb-dev] C++ method declaration parsing

Wed Mar 15 16:48:14 PDT 2017

Hi, Everyone.

The current implementation of CPlusPlusLanguage::MethodName::Parse() doesn't
cover the full range of possible function declarations,
or even the declarations returned by abi::__cxa_demangle.

Consider this code:
--------------------------------------------------

#include <stdio.h>
#include <functional>
#include <vector>

void func() {
  printf("func() was called\n");
}

struct Class
{
  Class() {
    printf("ctor was called\n");
  }

  Class(const Class& c) {
    printf("copy ctor was called\n");
  }

  ~Class() {
    printf("dtor was called\n");
  }
};

int main() {
  std::function<void()> f = func;
  f();

  Class c;
  std::vector<Class> v;
  v.push_back(c);

  return 0;
}

--------------------------------------------------

When compiled, it produces at least two symbols that currently cannot be
correctly parsed by MethodName::Parse():

1. void std::vector<Class, std::allocator<Class> >::_M_emplace_back_aux<Class const&>(Class const&)
2. void (* const&std::_Any_data::_M_access<void (*)()>() const)()
   (a template function that returns a reference to a function pointer)

This causes incorrect behavior in avoid-stepping and sometimes messes up the
printing of thread backtraces.

I would like to solve this issue, but the current implementation of method name
parsing doesn't seem sustainable.
Clever substrings and regexes are fine for trivial cases, but they become a
nightmare once we consider more complex ones.
That's why I'd like to have code that follows some kind of grammar
describing function declarations.
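
To illustrate why substring tricks break down, here is a minimal sketch of a
naive splitter. NaiveBasename is a hypothetical helper written for this
example; it is not the code currently in LLDB. It treats everything before the
first '(' as "return type + name" and takes whatever follows the last space.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical naive splitter (not LLDB's actual code): cut at the first '(',
// then take the text after the last space as the function's base name.
static std::string NaiveBasename(std::string_view decl) {
  std::size_t paren = decl.find('(');
  if (paren == std::string_view::npos)
    return "";
  std::string_view head = decl.substr(0, paren);
  std::size_t space = head.rfind(' ');
  if (space != std::string_view::npos)
    head.remove_prefix(space + 1);
  return std::string(head);
}
```

This works for "void func()", but for the vector symbol above the last space
falls inside the template arguments, so the result is "const&>". For the
_M_access symbol the first '(' belongs to the function pointer return type, so
the result is an empty string.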

As I see it, the choices for a new implementation of MethodName::Parse() are:
1. Reuse clang parsing code.
2. Parser generated by bison.
3. Handwritten recursive descent parser.

I looked at option #1, and it appears to be impossible to reuse the clang
parser for this kind of zero-context parsing,
especially given that we care about the performance of this code. The clang C++
lexer, on the other hand, can be reused.

Option #2: using bison is tempting, but it would require introducing a
new compile-time dependency.
That might be especially inconvenient on Windows.

That's why I think option #3 is the way to go: a recursive descent parser
that reuses the C++ lexer from clang.

LLDB doesn't need to parse everything (e.g. we don't care about details of
function arguments), but it needs to be able to handle tricky return types
and base names.
Eventually the new implementation should be able to parse the signature of
every method generated by the STL.
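
As a rough sketch of the idea (not a proposed patch), the code below handles
the two symbols above with a small recursive routine. To stay self-contained
it scans characters and tracks bracket depth instead of using the clang lexer,
and ParsedName, ParseDeclaration and the helper names are all hypothetical.
It assumes demangler-style input and ignores operator overloads and return
types written as "int *f()".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type; the field names are illustrative only.
struct ParsedName {
  std::string context, basename, arguments, qualifiers;
  bool ok = false;
};

static constexpr std::size_t npos = std::string_view::npos;

static std::string_view Trim(std::string_view s) {
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front())))
    s.remove_prefix(1);
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back())))
    s.remove_suffix(1);
  return s;
}

// Index of the first '(' that is not nested inside <...>, or npos.
static std::size_t FindTopLevelParen(std::string_view s) {
  int angle = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '<') ++angle;
    else if (s[i] == '>') --angle;
    else if (s[i] == '(' && angle == 0) return i;
  }
  return npos;
}

// Index of the ')' that closes the '(' at position `open`, or npos.
static std::size_t FindMatchingParen(std::string_view s, std::size_t open) {
  assert(open < s.size() && s[open] == '(');
  int depth = 0;
  for (std::size_t i = open; i < s.size(); ++i) {
    if (s[i] == '(') ++depth;
    else if (s[i] == ')' && --depth == 0) return i;
  }
  return npos;
}

// Skips leading '*', '&' and "const" tokens, e.g. "* const&name" -> "name".
static std::string_view SkipPointerOps(std::string_view s) {
  for (;;) {
    s = Trim(s);
    if (!s.empty() && (s.front() == '*' || s.front() == '&')) {
      s.remove_prefix(1);
    } else if (s.substr(0, 5) == "const" &&
               (s.size() == 5 ||
                !(std::isalnum(static_cast<unsigned char>(s[5])) || s[5] == '_'))) {
      s.remove_prefix(5);
    } else {
      return s;
    }
  }
}

static ParsedName ParseDeclaration(std::string_view s) {
  ParsedName result;
  s = Trim(s);
  std::size_t open = FindTopLevelParen(s);
  if (open == npos) return result;
  std::size_t close = FindMatchingParen(s, open);
  if (close == npos) return result;
  std::string_view before = Trim(s.substr(0, open));
  std::string_view inside = Trim(s.substr(open + 1, close - open - 1));

  // Rule 1: ret '(' ptr-ops declarator ')' '(' params ')'
  // The function returns a function pointer, so recurse into the parentheses.
  if (!inside.empty() && (inside.front() == '*' || inside.front() == '&'))
    return ParseDeclaration(SkipPointerOps(inside));

  // Rule 2: [ret] qualified-name '(' args ')' [qualifiers]
  // Find the last top-level space (end of return type) and the last
  // top-level "::" after it (end of context).
  int angle = 0;
  std::size_t name_start = 0, scope = npos;
  for (std::size_t i = 0; i < before.size(); ++i) {
    char c = before[i];
    if (c == '<') ++angle;
    else if (c == '>') --angle;
    else if (angle != 0) continue;
    else if (c == ' ') { name_start = i + 1; scope = npos; }
    else if (before.substr(i, 2) == "::") { scope = i; ++i; }
  }
  if (name_start >= before.size()) return result;  // no name before '('
  if (scope != npos) {
    result.context = std::string(before.substr(name_start, scope - name_start));
    result.basename = std::string(before.substr(scope + 2));
  } else {
    result.basename = std::string(before.substr(name_start));
  }
  result.arguments = std::string(s.substr(open, close - open + 1));
  result.qualifiers = std::string(Trim(s.substr(close + 1)));
  result.ok = true;
  return result;
}
```

With this sketch, the _M_access symbol yields context "std::_Any_data",
basename "_M_access<void (*)()>", arguments "()" and qualifiers "const". A
real implementation built on lexer tokens would replace the character scanning
and would need more rules, but the recursive structure would be similar.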

Before starting the implementation, I'd love to get some feedback. It might be
that I'm overlooking something important.

-- 
Thanks,
Eugene Zemtsov.