[lldb-dev] RFC: Moving debug info parsing out of process

Wed Feb 27 14:52:02 PST 2019

> On Feb 27, 2019, at 10:12 AM, Zachary Turner <zturner at google.com> wrote:
> 
> 
> 
> On Tue, Feb 26, 2019 at 5:39 PM Frédéric Riss <friss at apple.com> wrote:
> 
>> On Feb 26, 2019, at 4:52 PM, Zachary Turner <zturner at google.com> wrote:
>> 
>> 
>> 
>> On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss <friss at apple.com> wrote:
>> 
>>> On Feb 26, 2019, at 4:03 PM, Zachary Turner <zturner at google.com> wrote:
>>> 
>>> I would probably build the server by using mostly code from LLVM.  Since it would contain all of the low level debug info parsing libraries, I would expect that all knowledge of debug info (at least, in the form that compilers emit it in) could eventually be removed from LLDB entirely.
>> 
>> That’s quite an ambitious goal.
>> 
>> I haven’t looked at the SymbolFile API, what do you expect the exchange currency between the server and LLDB to be? Serialized compiler ASTs? If that’s the case, it seems like you need a strong rev-lock between the server and the client, which in turn adds quite some complexity to the rollout of new versions of the debugger.
>> Definitely not serialized ASTs, because you could be debugging some language other than C++.  Probably something more like JSON, where you parse the debug info and send back some JSON representation of the type / function / variable the user requested, which can almost be a direct mapping to LLDB's internal symbol hierarchy (e.g. the Function, Type, etc. classes).  You'd still need to build the AST on the client.
> 
> This seems fairly easy for Function or symbols in general, as it’s easy to abstract their few properties, but as soon as you get to the type system, I get worried.
> 
> Your representation needs to have the full expressivity of the underlying debug info format. Inventing something new in that space seems really expensive. For example, every piece of information we add to the debug info in the compiler would need to be handled in multiple places:
>  - the server code
>  - the client code that talks to the server
>  - the current “local” code (for a pretty long while)
> Not ideal. I wish there was a way to factor at least the last 2.
> How often does this actually happen though?  The C++ type system hasn't really undergone very many fundamental changes over the years.

I think over the last year we’ve done at least a couple of extensions to what we put in DWARF (for ObjC classes and ARM PAC support, which is not upstream yet). Adrian usually does those evolutions, so he might have a better idea. We plan on potentially adding a bunch more information to DWARF to more accurately represent the Obj-C type system.

>   I mocked up a few samples of what some JSON descriptions would look like, and it didn't seem terrible.  It certainly is some work -- there's no denying -- but I think a lot of the "expressivity" of the underlying format is actually more accurately described as "flexibility".  What I mean by this is that there are both many different ways to express the same thing, as well as many entities that can express different things depending on how they're used.  An intermediate format gives us a way to eliminate all of that flexibility and instead offer consistency, which makes client code much simpler.  In a way, this is a similar benefit to what one gets by compiling a source language down to LLVM IR and then operating on the LLVM IR because you have a much simpler grammar to deal with, along with more semantic restrictions on what kind of descriptions you form with that grammar (to be clear: JSON itself is not restrictive, but we can make our schema restrictive).
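To make the "flexibility versus consistency" point concrete, here is a minimal sketch in Python. It is not one of the mocked-up samples mentioned above, and the schema, field names, and raw record shapes are all hypothetical. It relies on one fact about DWARF: a member's DW_AT_data_member_location may be encoded either as a constant or as a location expression. The sketch folds both encodings into a single canonical field so that a client only ever sees one shape.

```python
import json


def member_offset(raw_location):
    """Return a byte offset from either hypothetical encoding of a member location."""
    # Encoding A: the producer emitted a plain integer constant.
    if isinstance(raw_location, int):
        return raw_location
    # Encoding B: a one-operation location expression, modelled here as a tuple.
    op, operand = raw_location
    if op != "DW_OP_plus_uconst":
        raise ValueError(f"unsupported location expression: {op}")
    return operand


def describe_struct(raw):
    """Map a raw, producer-specific record onto the restricted (hypothetical) schema."""
    return {
        "kind": "struct",
        "name": raw["name"],
        "byte_size": raw["byte_size"],
        "members": [
            {
                "name": m["name"],
                "type": m["type"],
                "offset": member_offset(m["location"]),
            }
            for m in raw["members"]
        ],
    }


# Two raw records for the same struct, differing only in how offsets are encoded.
raw_a = {
    "name": "foo",
    "byte_size": 8,
    "members": [
        {"name": "x", "type": "int", "location": 0},
        {"name": "y", "type": "int", "location": 4},
    ],
}
raw_b = {
    "name": "foo",
    "byte_size": 8,
    "members": [
        {"name": "x", "type": "int", "location": ("DW_OP_plus_uconst", 0)},
        {"name": "y", "type": "int", "location": ("DW_OP_plus_uconst", 4)},
    ],
}

canonical_a = describe_struct(raw_a)
canonical_b = describe_struct(raw_b)
print(json.dumps(canonical_a, indent=2, sort_keys=True))
```

Both raw records produce the same canonical description, which is the property a restrictive schema is meant to give the client.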

What I’m worried about is not exactly the amount of work, just the scope of the new abstraction. It needs to be good enough for any language and any debug information format. It needs efficient implementations of at least symbols, types, decl contexts, frame information, location expressions, target register mappings... And it’ll require the equivalent of the various ASTParser implementations. That’s a lot of new and forked code. I’d feel way better if we were able to reuse some of the existing code. I’m not sure how feasible this is though.

> For what it's worth, in an earlier message I mentioned that I would probably build the server by using mostly code from LLVM, and making sure that it supported the union of things currently supported by LLDB and LLVM's DWARF parsers.  Doing that would naturally require merging the two (which has been talked about for a long time) as a pre-requisite, and I would expect that for testing purposes we might want something like llvm-dwarfdump but that dumps a higher level description of the information (if we changed our DWARF emission code in LLVM to output the exact same type in slightly different ways in the underlying DWARF, we wouldn't want our tests to break).  So imagine you could run something like `lldb-dwarfdump -lookup-type=foo a.out` and it would dump some description of the type that is resilient to insignificant changes in the underlying DWARF.
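As a rough illustration of what such a dump could look like, here is a small Python sketch. The `lldb-dwarfdump` tool named above is only a proposal, so nothing here reflects a real tool: the in-memory index, the function names, and the output format are all assumptions. The sketch also assumes that the order in which members are stored is insignificant, so it prints them sorted by offset to keep the text stable for tests.

```python
def lookup_type(index, name):
    """Hypothetical stand-in for a -lookup-type=NAME query against parsed debug info."""
    return index.get(name)


def dump_type(desc):
    """Render a type description as deterministic, line-oriented text."""
    lines = [f"{desc['kind']} {desc['name']} (size {desc['byte_size']})"]
    # Sort by offset so the output does not depend on storage order (an assumption).
    for m in sorted(desc.get("members", []), key=lambda m: m["offset"]):
        lines.append(f"  +{m['offset']}: {m['type']} {m['name']}")
    return "\n".join(lines)


# Hypothetical index; the members are deliberately stored out of offset order.
index = {
    "foo": {
        "kind": "struct",
        "name": "foo",
        "byte_size": 8,
        "members": [
            {"name": "y", "type": "int", "offset": 4},
            {"name": "x", "type": "int", "offset": 0},
        ],
    }
}

text = dump_type(lookup_type(index, "foo"))
print(text)
```

A test could then match on this text rather than on the raw DWARF attributes, so a change in how the same type is encoded would not break it.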

At which level do you consider the “DWARF parser” to stop and the debugger policy to start? In my view, the DWARF parser stops at the DwarfDIE boundary. Replacing it wouldn’t get us closer to a higher-level abstraction.

> At that point you're already 90% of the way towards what I'm proposing, and it's useful independently.

I think that “90%” figure is a little off :-) But please don’t take my questions as opposition to the general idea. I find the idea very interesting, and we could maybe use something similar internally so I am interested. That’s why I’m asking questions.

Fred
