<div dir="ltr">A random idea: instead of parsing demangled C++ method names, what do people think about writing or reusing a demangler that can give back both the demangled name and the parsed name in some form?<div><br></div><div>My guess is that it would be both more efficient (we already have most of the information during demangling) and possibly easier to implement, as I expect fewer edge cases. Additionally, I think it would be a nice library to have as part of the LLVM project.<div><br></div><div>Tamas</div></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Mar 16, 2017 at 2:43 AM Eugene Zemtsov via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org">lldb-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif">Yes, it's a good idea to add cfe-dev.</div><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif">It is totally possible that I overlooked something and clang can help with this kind of superficial parsing. 
<br class="gmail_msg"></div><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif"><br class="gmail_msg"></div><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif">As far as I can see, even clang-format does its own parsing (UnwrappedLineParser.cpp), and clang-format has a very similar need to roughly understand code without knowing any context.</div></div><div dir="ltr" class="gmail_msg"><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif"><br class="gmail_msg"></div>> are you certain that clang's parser would be unacceptably slow?<div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif"><br class="gmail_msg"></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif">I don't have any perf numbers to back it up, but it does look like a lot of clang infrastructure needs to be set up before actual parsing begins (see lldb_private::ClangExpressionParser). It's not important though, as at this stage I don't see how we can reuse clang at all.</div><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif"><br class="gmail_msg"></div><div class="gmail_default gmail_msg" style="font-family:verdana,sans-serif"><br class="gmail_msg"></div></div><div class="gmail_extra gmail_msg"><br class="gmail_msg"><div class="gmail_quote gmail_msg">On Wed, Mar 15, 2017 at 5:03 PM, Zachary Turner <span dir="ltr" class="gmail_msg"><<a href="mailto:zturner@google.com" class="gmail_msg" target="_blank">zturner@google.com</a>></span> wrote:<br class="gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">If there is any way to re-use the clang parser for this, it would be wonderful. Even if it means adding support to clang for whatever you need in order to make it possible. 
You mention performance; are you certain that clang's parser would be unacceptably slow?<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">+cfe-dev, as they may have some more input on what it would take to extend clang to make this possible.<br class="gmail_msg"><br class="gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"><div class="m_-2317729888908281933h5 gmail_msg"><div dir="ltr" class="gmail_msg">On Wed, Mar 15, 2017 at 4:48 PM Eugene Zemtsov via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org" class="gmail_msg" target="_blank">lldb-dev@lists.llvm.org</a>> wrote:<br class="gmail_msg"></div></div></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_msg"><div class="m_-2317729888908281933h5 gmail_msg"><div dir="ltr" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Hi, Everyone.</font><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">The current implementation of CPlusPlusLanguage::MethodName::Parse() doesn't cover the full extent of possible function declarations, </font></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" 
class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">or even declarations returned by abi::__cxa_demangle. </font></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Consider this code:</font></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">--------------------------------------------------<br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="gmail_default m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><pre style="font-size:11.726px;white-space:pre-wrap;max-width:80em;padding-left:0.7em;color:rgb(0,0,0)" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">#include <stdio.h>
#include <functional>
#include <vector>
void func() {
  printf("func() was called\n");
}
struct Class
{
  Class() {
    printf("ctor was called\n");
  }
  Class(const Class& c) {
    printf("copy ctor was called\n");
  }
  ~Class() {
    printf("dtor was called\n");
  }
};
int main() {
  std::function<void()> f = func;
  f();
  Class c;
  std::vector<Class> v;
  v.push_back(c);
  return 0;
}
</font></pre><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">--------------------------------------------------<br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">When compiled, it has at least two symbols that currently cannot be correctly parsed by MethodName::Parse().</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><pre style="font-size:11.726px;white-space:pre-wrap;max-width:80em;padding-left:0.7em;color:rgb(0,0,0)" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">void std::vector<Class, std::allocator<Class> >::_M_emplace_back_aux<Class const&>(Class const&)
void (* const&std::_Any_data::_M_access<void (*)()>() const)() - a template function that returns a reference to a function pointer.
</font></pre></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">This causes incorrect behavior in avoid-stepping and sometimes garbles the printed thread backtrace.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">I would like to solve this issue, but the current implementation of method name parsing doesn't seem sustainable. </font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Clever substring searches and regexes are fine for trivial cases, but they become a nightmare once we consider more complex cases.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">That's why I'd like to have code that follows some kind of grammar describing function declarations.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">As I see it, the choices for a new implementation of </font><span 
style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">MethodName::Parse() </span><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">are:</span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">1. Reuse clang parsing code.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">2. Parser generated by bison.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">3. Handwritten recursive descent parser.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">I looked at option #1, and it appears to be impossible to reuse the clang parser for this kind of zero-context parsing, </font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">especially given that we care about the performance of this code. 
</font><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">The clang C++ lexer, on the other hand, can be reused.</span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Option #2. Using bison is tempting, but it would require introducing a new compile-time dependency. </span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">That might be especially inconvenient on Windows.</span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">That's why I think option #3 is the way to go: a recursive descent parser that reuses the C++ lexer from clang. 
</span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">LLDB doesn't need to parse everything (</span><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">e.g. we don't care about the details of function arguments), but it needs to be able to handle tricky return types and base names.</span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Eventually the new implementation should be able to parse the signature of every method generated by the STL. </font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><span style="font-family:verdana,sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></span></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Before starting the implementation, I'd love to get some feedback. 
It might be that I'm overlooking something important.</font></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></div></div><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">-- <br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"></font></div><div class="m_-2317729888908281933m_1554989480439670304m_4700145260659919425gmail-m_-1339871576848199557gmail_signature m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><div dir="ltr" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Thanks,</font><div class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg"><font face="verdana, sans-serif" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">Eugene Zemtsov.</font></div></div></div>
</div></div></div>
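To make the "bracket-aware instead of regex" point from option #3 concrete, here is a minimal sketch of splitting a demangled name by tracking <...> and (...) nesting. It is only an illustration under stated assumptions: ParsedName, LastTopLevel and ParseName are hypothetical names, not existing LLDB API, and it works on the raw string rather than on tokens from the clang lexer. It deliberately gives up on declarators it does not understand, such as the function-pointer return type in the second symbol above, and it does not handle operator names like operator< or operator(); those are the cases where a real grammar is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type; the field names are illustrative, not LLDB's.
struct ParsedName {
  std::string return_type, context, basename, arguments, qualifiers;
};

// Returns the index of the last `needle` found outside all <...> and (...)
// groups, or npos. `needle` must not itself start with a bracket character.
static std::size_t LastTopLevel(const std::string &s,
                                const std::string &needle) {
  std::size_t found = std::string::npos;
  int depth = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    const char c = s[i];
    if (c == '<' || c == '(')
      ++depth;
    else if (c == '>' || c == ')')
      --depth;
    else if (depth == 0 && s.compare(i, needle.size(), needle) == 0)
      found = i;
  }
  return found;
}

// Splits "ret ctx::base(args) quals" into its parts. Returns false for
// shapes this sketch does not understand.
bool ParseName(const std::string &name, ParsedName &out) {
  // The argument list ends at the last ')'; anything after it is qualifiers.
  const std::size_t close = name.rfind(')');
  if (close == std::string::npos)
    return false;

  // Walk backwards from that ')' to its matching '('.
  std::size_t open = close;
  int depth = 0;
  for (;;) {
    if (name[open] == ')')
      ++depth;
    else if (name[open] == '(' && --depth == 0)
      break;
    if (open == 0)
      return false; // unbalanced parentheses
    --open;
  }
  assert(name[open] == '(');

  // A prefix ending in ')' indicates a function-pointer style declarator,
  // which this sketch refuses to guess at.
  const std::string prefix = name.substr(0, open);
  if (prefix.empty() || prefix.back() == ')')
    return false;

  out.arguments = name.substr(open, close - open + 1);
  const std::size_t q = name.find_first_not_of(' ', close + 1);
  out.qualifiers = (q == std::string::npos) ? "" : name.substr(q);

  // The last top-level space separates the return type from the name.
  const std::size_t space = LastTopLevel(prefix, " ");
  out.return_type =
      (space == std::string::npos) ? "" : prefix.substr(0, space);
  const std::string full =
      (space == std::string::npos) ? prefix : prefix.substr(space + 1);

  // The last top-level "::" separates the context from the base name.
  const std::size_t sep = LastTopLevel(full, "::");
  out.context = (sep == std::string::npos) ? "" : full.substr(0, sep);
  out.basename = (sep == std::string::npos) ? full : full.substr(sep + 2);
  return !out.basename.empty();
}
```

With this sketch, the _M_emplace_back_aux symbol splits into the return type void, the context std::vector<Class, std::allocator<Class> >, and the base name _M_emplace_back_aux<Class const&>, while the _M_access symbol is rejected rather than mis-parsed.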
_______________________________________________<br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">
lldb-dev mailing list<br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">
<a href="mailto:lldb-dev@lists.llvm.org" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg" target="_blank">lldb-dev@lists.llvm.org</a><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev" rel="noreferrer" class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev</a><br class="m_-2317729888908281933m_1554989480439670304gmail_msg gmail_msg">
</blockquote></div></div></div>
</blockquote></div><br class="gmail_msg"><br clear="all" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div>-- <br class="gmail_msg"><div class="m_-2317729888908281933gmail_signature gmail_msg" data-smartmail="gmail_signature"><div dir="ltr" class="gmail_msg"><font face="verdana, sans-serif" class="gmail_msg">Thanks,</font><div class="gmail_msg"><font face="verdana, sans-serif" class="gmail_msg">Eugene Zemtsov.</font></div></div></div>
</div>
</blockquote></div>
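For context on the demangler idea at the top of this message, here is a minimal sketch of what abi::__cxa_demangle hands back today: a single flat string, with none of the structure the demangler had while producing it. Demangle is a hypothetical wrapper name; abi::__cxa_demangle itself comes from cxxabi.h, so this assumes a GCC or Clang toolchain with an Itanium ABI runtime and will not build with MSVC.

```cpp
#include <cassert>
#include <cstdlib>  // std::free
#include <cxxabi.h> // abi::__cxa_demangle
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or an empty
// string if the runtime demangler reports a failure.
std::string Demangle(const char *mangled) {
  assert(mangled != nullptr);
  int status = 0;
  // With a null output buffer, __cxa_demangle returns a malloc'ed string
  // that the caller must release with free().
  std::unique_ptr<char, void (*)(void *)> buffer(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
  if (status != 0 || !buffer)
    return std::string();
  return std::string(buffer.get());
}
```

For the sample program quoted above, Demangle("_Z4funcv") gives func() and Demangle("_ZN5ClassC1Ev") gives Class::Class(). Everything about context, base name and arguments then has to be recovered by re-parsing that string, which is the step a structure-returning demangler would avoid.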