[lldb-dev] Dlopen extremely slow while LLDB is attached

Pavel Labath via lldb-dev lldb-dev at lists.llvm.org
Tue May 1 03:44:24 PDT 2018


On Mon, 30 Apr 2018 at 20:13, Scott Funkenhauser <sfunkenhauser at google.com>
wrote:

> I messed up my test setup and incorrectly attributed the additional time.
Parsing the DWARF sections does add some time, but only by a small
percentage.

> By pure chance I noticed that the latency between the client and server
had a huge impact. I did some tests against lldb-server running on a
machine with an RTT of 18ms to the client. The load time in my real-world
example (initializing Vulkan) was around 16x slower (~3.5s vs 55s). I did
some more digging and it looks like there are certain operations that
perform a number of synchronous requests to the server
(DYLDRendezvous::TakeSnapshot - updating SO entries, and
ThreadList::WillResume - grabbing SP and FP for every thread). Since all
the requests are synchronous, they are extremely sensitive to increased
latency.
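
(Back-of-envelope, just to put a number on it -- the one-round-trip-per-
request cost is my assumption, only the timings above are from your
measurement:

    extra time:        ~55s - ~3.5s             =  ~51.5s
    cost per request:  1 synchronous round trip =   18ms
    implied requests:  51.5s / 0.018s           ~= 2860 packets

so a few thousand serialized round trips would be enough to account for the
slowdown.)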

> Is this a limitation of the gdb-server (can't handle parallel requests)?
> Or is this just not a common use case, and not a known issue?

This is a known issue, though I did not expect it to have that much of an
impact. In fact, I have trouble reconciling this fact with your earlier
statement that second and subsequent runs are much faster. The SO entry
loading is something that has to happen on every run, so I don't see why
the second run would be faster. This would be more consistent with the
debug-info parsing case, as there we only index the DWARF once (if it
hasn't changed). So, I think we are missing something here.

In any case, this is not a fundamental limitation, and there are ways to
remove it. The most obvious one is to move the rendezvous structure
parsing to the server -- there is even a gdb-remote packet for that, though
I don't remember its name off-hand. Currently we have support for that in
the client (for communicating with stubs that support it), but not in
lldb-server.
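
For reference, I believe the packet in question is qXfer:libraries-svr4:read
(worth double-checking against the gdb-remote spec). The point is that the
whole link-map comes back in a single round trip instead of one memory-read
packet per SO entry, roughly like this (framing/checksums omitted, contents
made up):

    ->  qXfer:libraries-svr4:read::0,fffff
    <-  l<library-list-svr4 version="1.0">
          <library name="/usr/lib/libvulkan.so.1" lm="0x..." l_addr="0x..." l_ld="0x..."/>
          ... one element per loaded module ...
        </library-list-svr4>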

For the register reading part, we usually make sure we send the "important"
registers in a batch, so that the client does not have to request each one
separately. At the moment that's only the PC, because it used to be enough
at some point. I don't know if anything changed in the client to make it
ask for more registers, but this is something that needs to be looked at
more closely.
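
Concretely, the "batch" is the set of expedited registers in the stop-reply
packet; anything not expedited there costs one extra p packet per register,
per thread. A sketch of what that looks like on the wire -- the register
numbers and values are illustrative only, framing/checksums omitted:

    <-  T05thread:1f03;reason:breakpoint;10:807f5ff2ac565200;...  (PC expedited)
    ->  p7                                         (extra round trip for SP)
    <-  00e0ffffff7f0000
    ->  p6                                         (and another one for FP)
    <-  90e0ffffff7f0000

If SP and FP were expedited alongside the PC, those per-thread round trips
would go away.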

> I enabled "gdb-remote all" logging, and searched for all instances of
'0x00005652ACF3F120' (the address of the connection object that is
reporting the timeout?). Seems to be a pretty good correlation between the
timeouts and 'Communication::SyncronizeWithReadThread', unfortunately I
haven't had time to investigate further.

> 1525110345.916504145 0x5652acefcbb0
Broadcaster("lldb.process")::RestoreBroadcaster (about to pop
listener("lldb.PlatformLinux.DebugProcess.hijack")=0x5652acf079a0)
> 1525110345.916502953 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110345.919557333 0x7f0868000940
'Communication::SyncronizeWithReadThread'
Listener::FindNextEventInternal(broadcaster=(nil),
broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
0x7f086c0008c0
> 1525110345.919566154 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110346.123922110 0x7f0868000d10
'Communication::SyncronizeWithReadThread'
Listener::FindNextEventInternal(broadcaster=(nil),
broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
0x7f086c0008c0
> 1525110346.123931408 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110346.152676821 0x7f0868006710
'Communication::SyncronizeWithReadThread'
Listener::FindNextEventInternal(broadcaster=(nil),
broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
0x7f086c0008c0
> 1525110346.152685642 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110346.167683363 0x7f08682b2fe0
'Communication::SyncronizeWithReadThread'
Listener::FindNextEventInternal(broadcaster=(nil),
broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
0x7f086c0008c0
> 1525110346.167692184 this = 0x00005652ACF3F120, timeout = 5000000 us

I think the timeout question is a red herring, tbh. These two are
correlated because they are both things that we need to do when the process
stops (flush its STDIO after it stops, and start the read thread after we
resume it). They run fast, and there are no timeouts involved.

> --
> 1525110351.172777176 error: timed out, status = timed out, uri =
> 1525110351.172847271 this = 0x00005652ACF3F120, timeout = 5000000 us
> 1525110356.173308611 error: timed out, status = timed out, uri =
> 1525110356.173368216 this = 0x00005652ACF3F120, timeout = 5000000 us
> 1525110361.175591230 error: timed out, status = timed out, uri =
> 1525110361.175647497 this = 0x00005652ACF3F120, timeout = 5000000 us
> 1525110366.180710316 error: timed out, status = timed out, uri =
> 1525110366.180769205 this = 0x00005652ACF3F120, timeout = 5000000 us

And I bet these happen while the process is running, so they do not impact
the latency at all. It's just us waiting for the process to stop in a loop.
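
In pseudocode (not the actual source, names made up -- just the shape of the
logic the log suggests), the client side is doing something like:

    // While the inferior runs, wait for the next packet/event with a 5s
    // timeout. Each expiry produces an "error: timed out" log line and we
    // simply retry; no latency is added to the debug session itself.
    while (process_is_running) {
      if (connection.Read(buffer, std::chrono::seconds(5)) == eTimedOut)
        continue;                    // nothing arrived yet, keep waiting
      handle_incoming_packet(buffer);
    }

so those entries just measure how long the inferior ran, not time spent in
the debugger.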

