[lldb-dev] RFC: Moving debug info parsing out of process

Mon Feb 25 10:21:29 PST 2019

Hi all,

We've got some internal efforts in progress, and one of those would benefit
from debug info parsing being out of process (independently of whether or
not the rest of LLDB is out of process).

There are a few advantages to this, which I'll enumerate here:

   - It addresses a known source of instability in LLDB: debug info can be
   malformed, handling that gracefully is often difficult, and a failure can
   bring down the entire debug session.  Other efforts have addressed
   stability by moving things out of process, but they have not been
   upstreamed, and even if they had been, I think we would still want this,
   for the reasons that follow.
   - It becomes theoretically possible to move debug info parsing not just
   to another process, but to another machine entirely.  In a broader sense,
   this decouples the physical debug info location (and for that matter,
   representation) from the debugger host.
   - It becomes testable as an independent component, because you can send
   requests to it directly, dump the results, and check that they make sense.
   Currently there is almost no test coverage of this part of LLDB, apart
   from what you get indirectly by spinning up a full debug session and
   doing things that eventually trigger symbol queries.

The big win here, at least from my point of view, is the second one.
Traditional symbol servers work by copying entire symbol files (dSYM, DWP,
PDB) from some machine to the debugger host.  These can be very large (we've
seen 12+ GB in some cases), so copying them ranges from a slow bandwidth hog
to a complete non-starter, depending on the debugger host and network.
In this scenario, one could run the debug info process on the same NAS,
cloud instance, or other machine that hosts the symbol server.  Then, rather
than copying over an entire symbol file, it responds only to the query you
issued: if you ask for a type, it returns just a packet describing that
type.
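To make the "answer only the query" idea concrete, here is a minimal Python sketch, assuming a JSON request/response encoding. The `SymbolServer` class, the `find_type` request kind, and the packet fields are all hypothetical, invented for illustration; no wire format is specified here.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class TypeInfo:
    name: str
    byte_size: int
    # (field name, field type name, byte offset) triples
    fields: list = field(default_factory=list)


class SymbolServer:
    """Hypothetical query server co-located with the symbol files."""

    def __init__(self):
        # Stand-in for an index built by parsing DWARF/PDB on the server side.
        self._types = {}

    def add_type(self, module, info):
        self._types[(module, info.name)] = info

    def handle(self, request_bytes):
        """Answer one encoded request with a small encoded reply."""
        request = json.loads(request_bytes)
        if request.get("kind") != "find_type":
            return json.dumps({"error": "unsupported request"}).encode()
        info = self._types.get((request["module"], request["name"]))
        if info is None:
            return json.dumps({"error": "not found"}).encode()
        # Only the requested type crosses the wire, not the whole symbol file.
        return json.dumps({"type": asdict(info)}).encode()


server = SymbolServer()
server.add_type("libfoo.so", TypeInfo("Point", 8, [("x", "int", 0), ("y", "int", 4)]))
request = json.dumps({"kind": "find_type", "module": "libfoo.so", "name": "Point"}).encode()
reply = server.handle(request)
print(len(reply), "bytes for type", json.loads(reply)["type"]["name"])
```

The point of the design is that the reply size scales with the query, not with the symbol file, which is what makes slow links and small disks workable.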

The API itself would be stateless (so that you could make queries for
multiple targets in any order) as well as asynchronous (meaning responses
may arrive out of order).  Blocking could still be implemented on the LLDB
side, while an asynchronous server means multiple clients can connect to the
same server instance.  This raises interesting possibilities.  For example,
one can imagine thousands of developers connecting to an internal symbol
server on the network and debugging remote processes or core dumps over slow
network connections, or on machines with very little storage (e.g.
Chromebooks).
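A minimal sketch of the stateless, asynchronous request model, using Python's asyncio. The assumptions here are all hypothetical: each request carries an explicit module identifier (a made-up build-ID string) plus a client-chosen request id, the server may answer in any order, and the client matches replies to callers by id. "Blocking" on the client side is simply a caller awaiting its own reply.

```python
import asyncio
import itertools


class AsyncSymbolClient:
    """Hypothetical client: many requests in flight, replies matched by id."""

    def __init__(self, send):
        self._send = send              # coroutine that hands a request to the transport
        self._ids = itertools.count(1)
        self._pending = {}             # request id -> Future awaiting its reply

    async def query(self, module_id, kind, name):
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        # Stateless: each request names its module explicitly, so the server
        # keeps no per-client session and can serve many targets or clients.
        await self._send({"id": request_id, "module": module_id,
                          "kind": kind, "name": name})
        # Client-side "blocking": this caller waits only for its own reply.
        return await future

    def on_reply(self, reply):
        # Replies can arrive in any order; the id identifies the waiting caller.
        self._pending.pop(reply["id"]).set_result(reply["result"])


async def fake_server(request, client, arrival_order):
    # Simulated work: type lookups take longer than function lookups, so the
    # second request below is answered first.
    await asyncio.sleep(0.05 if request["kind"] == "find_type" else 0.01)
    arrival_order.append(request["id"])
    client.on_reply({"id": request["id"],
                     "result": f"{request['kind']}:{request['module']}:{request['name']}"})


async def demo():
    arrival_order = []
    in_flight = set()
    client = None

    async def send(request):
        # Fire and forget: the reply comes back later through client.on_reply.
        task = asyncio.create_task(fake_server(request, client, arrival_order))
        in_flight.add(task)
        task.add_done_callback(in_flight.discard)

    client = AsyncSymbolClient(send)
    results = await asyncio.gather(
        client.query("build-id-aaaa", "find_type", "Point"),
        client.query("build-id-bbbb", "find_function", "main"),
    )
    return results, arrival_order


results, arrival_order = asyncio.run(demo())
print(results)        # gather() returns results in request order
print(arrival_order)  # ids in the order the server actually replied
```

Because nothing in a request depends on earlier requests, the same server instance can interleave queries from any number of clients and targets.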

On the LLDB side, all of this is hidden behind the SymbolFile interface, so
most of LLDB doesn't have to change at all.  While this is in development, we
could add a SymbolFileRemote and keep the existing local code path as the
default until it is robust and complete enough to switch over.
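As an illustration only, the plug-in arrangement might look roughly like this Python sketch. LLDB's real SymbolFile is a C++ class with a much larger interface; `SymbolFileLike`, `SymbolFileLocal`, the `find_type` method, the transport callable, and the `use-remote-symbols` setting are all hypothetical names chosen for this example, not existing LLDB APIs.

```python
from abc import ABC, abstractmethod


class SymbolFileLike(ABC):
    """Hypothetical, heavily reduced stand-in for LLDB's C++ SymbolFile."""

    @abstractmethod
    def find_type(self, name):
        """Return a description of the named type, or None if unknown."""


class SymbolFileLocal(SymbolFileLike):
    """Existing path: symbol data parsed inside the debugger's own process."""

    def __init__(self, parsed_types):
        self._types = parsed_types  # stand-in for in-process DWARF/PDB parsing

    def find_type(self, name):
        return self._types.get(name)


class SymbolFileRemote(SymbolFileLike):
    """Experimental path: every query is forwarded to a symbol server."""

    def __init__(self, transport, module_id):
        self._transport = transport  # hypothetical callable: request dict -> reply dict
        self._module_id = module_id

    def find_type(self, name):
        reply = self._transport({"kind": "find_type",
                                 "module": self._module_id, "name": name})
        return reply.get("type")


def make_symbol_file(settings, parsed_types, transport, module_id):
    # Local parsing stays the default; the remote path is opt-in (via a
    # hypothetical setting) until it is robust enough to flip the default.
    if settings.get("use-remote-symbols", False):
        return SymbolFileRemote(transport, module_id)
    return SymbolFileLocal(parsed_types)


def describe(symbol_file, name):
    # Caller code only sees the abstract interface, so it is unchanged
    # regardless of which implementation is plugged in.
    return symbol_file.find_type(name)


# Usage with an in-memory transport standing in for the network.
remote_calls = []


def fake_transport(request):
    remote_calls.append(request)
    return {"type": {"name": request["name"], "source": "remote"}}


local_types = {"Point": {"name": "Point", "source": "local"}}
default_sf = make_symbol_file({}, local_types, fake_transport, "build-id-aaaa")
remote_sf = make_symbol_file({"use-remote-symbols": True}, local_types,
                             fake_transport, "build-id-aaaa")
print(describe(default_sf, "Point")["source"])  # local
print(describe(remote_sf, "Point")["source"])   # remote
```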

Thoughts?