[lldb-dev] RFC: Moving debug info parsing out of process

Zachary Turner via lldb-dev lldb-dev at lists.llvm.org
Wed Feb 27 10:12:32 PST 2019


On Tue, Feb 26, 2019 at 5:39 PM Frédéric Riss <friss at apple.com> wrote:

>
> On Feb 26, 2019, at 4:52 PM, Zachary Turner <zturner at google.com> wrote:
>
>
>
> On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss <friss at apple.com> wrote:
>
>>
>> On Feb 26, 2019, at 4:03 PM, Zachary Turner <zturner at google.com> wrote:
>>
>> I would probably build the server by using mostly code from LLVM.  Since
>> it would contain all of the low-level debug info parsing libraries, I would
>> expect that all knowledge of debug info (at least, in the form that
>> compilers emit it in) could eventually be removed from LLDB entirely.
>>
>>
>> That’s quite an ambitious goal.
>>
>> I haven’t looked at the SymbolFile API, what do you expect the exchange
>> currency between the server and LLDB to be? Serialized compiler ASTs? If
>> that’s the case, it seems like you need a strong rev-lock between the
>> server and the client. Which in turn adds quite some complexity to the
>> rollout of new versions of the debugger.
>>
> Definitely not serialized ASTs, because you could be debugging some
> language other than C++.  Probably something more like JSON, where you
> parse the debug info and send back some JSON representation of the type /
> function / variable the user requested, which can almost be a direct
> mapping to LLDB's internal symbol hierarchy (e.g. the Function, Type, etc.
> classes).  You'd still need to build the AST on the client.
>
>
> This seems fairly easy for Function or symbols in general, as it’s easy to
> abstract their few properties, but as soon as you get to the type system, I
> get worried.
>
> Your representation needs to have the full expressivity of the underlying
> debug info format. Inventing something new in that space seems really
> expensive. For example, every piece of information we add to the debug info
> in the compiler would need to be handled in multiple places:
>  - the server code
>  - the client code that talks to the server
>  - the current “local” code (for a pretty long while)
> Not ideal. I wish there was a way to factor at least the last 2.
>
How often does this actually happen though?  The C++ type system hasn't
really undergone very many fundamental changes over the years.  I mocked up
a few samples of what some JSON descriptions would look like, and it didn't
seem terrible.  It certainly is some work -- there's no denying -- but I
think a lot of the "expressivity" of the underlying format is actually more
accurately described as "flexibility".  What I mean by this is that there
are both many different ways to express the same thing, as well as many
entities that can express different things depending on how they're used.
An intermediate format gives us a way to eliminate all of that flexibility
and instead offer consistency, which makes client code much simpler.  In a
way, this is a similar benefit to what one gets by compiling a source
language down to LLVM IR and then operating on the LLVM IR because you have
a much simpler grammar to deal with, along with more semantic restrictions
on what kind of descriptions you form with that grammar (to be clear: JSON
itself is not restrictive, but we can make our schema restrictive).
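
As a rough illustration of what "making the schema restrictive" could mean
(this is not one of the mock-ups mentioned above; every kind and field name
here is made up for the example, and nothing in it is an existing LLDB or
LLVM API), a client could accept exactly one spelling of each entity and
reject everything else:

```python
import json

# Hypothetical schema: none of these kind or field names come from LLDB or LLVM.
ALLOWED_FIELDS = {
    "struct": {"kind", "name", "byte_size", "members"},
    "member": {"kind", "name", "type", "byte_offset"},
}


def validate(node):
    """Reject any node whose kind or field set falls outside the schema."""
    kind = node.get("kind")
    if kind not in ALLOWED_FIELDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if set(node) != ALLOWED_FIELDS[kind]:
        raise ValueError(f"unexpected fields for {kind}: {sorted(node)}")
    for child in node.get("members", []):
        validate(child)
    return node


# One possible description of `struct Point { int x; int y; };`
POINT_JSON = """
{
  "kind": "struct",
  "name": "Point",
  "byte_size": 8,
  "members": [
    {"kind": "member", "name": "x", "type": "int", "byte_offset": 0},
    {"kind": "member", "name": "y", "type": "int", "byte_offset": 4}
  ]
}
"""

point = validate(json.loads(POINT_JSON))
```

Because each kind has exactly one permitted set of fields, client code never
has to ask which of several encodings it has been handed.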

For what it's worth, in an earlier message I mentioned that I would
probably build the server by using mostly code from LLVM, and making sure
that it supported the union of things currently supported by LLDB and
LLVM's DWARF parsers.  Doing that would naturally require merging the two
(which has been talked about for a long time) as a prerequisite.  For
testing purposes we would probably want something like llvm-dwarfdump, but
one that dumps a higher-level description of the information: if we change
the DWARF emission code in LLVM to output the exact same type in a slightly
different way in the underlying DWARF, we wouldn't want our tests to break.
Imagine you could run something like `lldb-dwarfdump -lookup-type=foo a.out`
and it would dump some description of the type that is resilient to
insignificant changes in the underlying DWARF.

At that point you're already 90% of the way towards what I'm proposing, and
it's useful independently.
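
To make "resilient to insignificant changes" concrete, here is a minimal
sketch under stated assumptions.  The DW_TAG_* names are real DWARF tags, but
the nested-dict input is only a stand-in for parser output, and
`describe_type` is a hypothetical helper, not part of any existing tool.  It
collapses a chain of qualifier DIEs so that the order in which the compiler
nested them does not affect the dumped description:

```python
import json

# Real DWARF tag names; the nested-dict input shape is a made-up stand-in
# for whatever a DWARF parser would hand back.
QUALIFIER_TAGS = {
    "DW_TAG_const_type": "const",
    "DW_TAG_volatile_type": "volatile",
}


def describe_type(die):
    """Collapse a chain of qualifier DIEs into one order-independent description."""
    qualifiers = set()
    # Assumes every qualifier DIE refers to an underlying type (no const void).
    while die["tag"] in QUALIFIER_TAGS:
        qualifiers.add(QUALIFIER_TAGS[die["tag"]])
        die = die["type"]
    description = {"name": die["name"], "qualifiers": sorted(qualifiers)}
    return json.dumps(description, sort_keys=True)


base = {"tag": "DW_TAG_base_type", "name": "int"}

# Two encodings of the same type, differing only in qualifier nesting order.
const_outer = {"tag": "DW_TAG_const_type",
               "type": {"tag": "DW_TAG_volatile_type", "type": base}}
volatile_outer = {"tag": "DW_TAG_volatile_type",
                  "type": {"tag": "DW_TAG_const_type", "type": base}}

same = describe_type(const_outer) == describe_type(volatile_outer)
```

A test written against the dumped description would keep passing if the
emitter switched from one nesting order to the other.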
