[lldb-dev] Module Cache improvements - RFC

Thu Jan 28 04:21:30 PST 2016

Hello all,

we are running into limitations of the current module download/caching
system. A simple Android application can link to about 46 megabytes
worth of modules, and downloading that with our current transfer rates
takes about 25 seconds. Much of the data we download this way is never
actually accessed, and yet we download everything immediately upon
starting the debug session, which makes the first session extremely
laggy.

We could speed this up a lot by only downloading the portions of the module
that we really need (in my case this turns out to be about 8
megabytes). Also, further speedups could be made by increasing the
throughput of the gdb-remote protocol used for downloading these files
by using pipelining.

I made a proof-of-concept hack of these things, put it into lldb, and
I was able to get the time for the startup-attach-detach-exit cycle
down to 5.4 seconds (for comparison, the current time for the cycle is
about 3.6 seconds with a hot module cache, and 28(!) seconds with an
empty cache).

Now, I would like to implement these things in lldb properly,
so this is a request for comments on my plan. What I would like to do
is:
- Replace ModuleCache with a SectionCache (actually, more like a cache
of arbitrary file chunks). When the cache gets a request for a file
and the file is not in the cache already, it returns a special kind of
Module, whose fragments will be downloaded as we are trying to
access them. These fragments will be cached on disk, so that
subsequent requests for the file do not need to re-download them. We
can also have the option to short-circuit this logic and download the
whole file immediately (e.g., when the file is small, or we have a
super-fast way of obtaining the whole file via rsync, etc...)
- Add pipelining support to GDBRemoteCommunicationClient for
communicating with the platform. This actually does not require any
changes to the wire protocol. The only change is in adding the ability
to send an additional request to the server while waiting for the
response to the previous one. Since the protocol is request-response
based and we are communicating over a reliable transport stream, each
response can be correctly matched to a request even though we have
multiple packets in flight. Any packets which need to maintain more
complex state (like downloading a single entity using continuation
packets) can still lock the stream to get exclusive access, but I am
not sure if we actually even have any such packets in the platform
flavour of the protocol.
- Parallelize downloading of multiple files, utilizing
request pipelining. Currently we get the biggest delay when first
attaching to a process (we download file headers and some basic
informative sections) and when we try to set the first symbol-level
breakpoint (we download symbol tables and string sections). Both of
these actions operate on all modules in bulk, which makes them easy
parallelization targets. This will provide a big speed boost, as we
will be eliminating communication latency. Furthermore, in case of
lots of files, we will be overlapping file download (io) with parsing
(cpu), for an even bigger boost.
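
To make the first point a bit more concrete, here is a minimal Python
sketch of a cache of file chunks that downloads fragments on access and
keeps them on disk. It is an illustration only, not lldb code: the
ChunkCache class, the fetch_chunk callback (a stand-in for a remote file
read over the platform connection), the 4096-byte chunk size and the
on-disk file naming are all assumptions made for this example.

```python
import os
import tempfile

CHUNK_SIZE = 4096  # assumed granularity; the proposal does not fix one


class ChunkCache:
    """Lazily fetches fixed-size chunks of one remote file and keeps them on disk."""

    def __init__(self, cache_dir, fetch_chunk):
        # fetch_chunk(offset, size) -> bytes is a hypothetical stand-in for
        # a remote file read over the platform connection.
        self.cache_dir = cache_dir
        self.fetch_chunk = fetch_chunk
        os.makedirs(cache_dir, exist_ok=True)

    def _chunk_path(self, index):
        return os.path.join(self.cache_dir, "chunk_%08d" % index)

    def _get_chunk(self, index):
        path = self._chunk_path(index)
        if os.path.exists(path):           # cache hit: no remote traffic
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch_chunk(index * CHUNK_SIZE, CHUNK_SIZE)
        with open(path, "wb") as f:        # persist for later requests
            f.write(data)
        return data

    def read(self, offset, size):
        """Return bytes [offset, offset + size), fetching only missing chunks."""
        if size <= 0:
            return b""
        first = offset // CHUNK_SIZE
        last = (offset + size - 1) // CHUNK_SIZE
        data = b"".join(self._get_chunk(i) for i in range(first, last + 1))
        start = offset - first * CHUNK_SIZE
        return data[start:start + size]


# Usage: a fake 64 KiB "remote" file and a log of remote reads.
remote = bytes(range(256)) * 256
fetches = []


def fake_fetch(offset, size):
    fetches.append(offset)
    return remote[offset:offset + size]


cache = ChunkCache(tempfile.mkdtemp(), fake_fetch)
first_read = cache.read(5000, 100)
second_read = cache.read(5000, 100)    # served from disk, no new fetch
```

The short-circuit option (download the whole file immediately) would
amount to pre-populating every chunk up front; the sketch leaves that
out, along with details such as atomic writes of chunk files.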

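The pipelining idea from the second point can be sketched in the same
hedged way. The Python example below uses a toy newline-delimited
protocol over a local socket pair, not the gdb-remote packet format, and
PipelinedClient and serve are hypothetical names. It only shows why
responses can be matched to requests by order when the transport is
reliable and the server answers requests in sequence.

```python
import socket
import threading
from collections import deque


def serve(conn):
    # Toy server: answers each newline-terminated request in arrival order.
    rfile = conn.makefile("rb")
    wfile = conn.makefile("wb")
    for line in rfile:
        wfile.write(b"reply:" + line)
        wfile.flush()


class PipelinedClient:
    def __init__(self, sock):
        self.rfile = sock.makefile("rb")
        self.wfile = sock.makefile("wb")
        self.pending = deque()    # requests sent but not yet answered

    def send(self, request):
        # Send without waiting for responses to earlier requests.
        self.wfile.write(request + b"\n")
        self.wfile.flush()
        self.pending.append(request)

    def receive(self):
        # On an ordered, reliable stream the next response belongs to
        # the oldest outstanding request.
        request = self.pending.popleft()
        response = self.rfile.readline().rstrip(b"\n")
        return request, response


client_sock, server_sock = socket.socketpair()
threading.Thread(target=serve, args=(server_sock,), daemon=True).start()

client = PipelinedClient(client_sock)
for name in (b"libc.so", b"libm.so", b"libdl.so"):
    client.send(name)             # three requests in flight at once
results = [client.receive() for _ in range(3)]
```

With several requests in flight, the round-trip latency is paid roughly
once per batch rather than once per request, which is where the speedup
for bulk operations over many modules would come from.
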
What do you think?

cheers,
pl