[lldb-dev] Dlopen extremely slow while LLDB is attached

Scott Funkenhauser via lldb-dev <lldb-dev at lists.llvm.org>
Tue May 1 06:37:06 PDT 2018


 > This is a known issue, though I did not expect it to have that much of an
> impact. In fact, I have trouble reconciling this fact with your earlier
> statement that second and subsequent runs are much faster. The SO entry
> loading is something that has to happen on every run, so I don't see why
> the second run would be faster. This would be more consistent with the
> debug-info parsing case, as there we only index the dwarf once (if it
> hasn't changed). So, I think we are missing something here.

My statement that subsequent runs are much faster was for my simplified
example (only running locally) that was meant to reproduce the problem I
was seeing when initializing Vulkan. When I went back and tested
initializing Vulkan (remotely) with SymbolFileDWARF::Index() commented out, it
was still very slow, and I realized I was on the wrong path.

My latency to the server is normally only a few ms, which is why I
initially ignored that difference between the two test environments. But it
turns out a few ms is enough to make it twice as slow as running locally.
This is what initially kicked off my investigation. I didn't catch this
earlier because I was previously using GDB locally, and recently switched
to using LLDB remotely. I hadn't compared LLDB local vs LLDB remote. My
latest example to a server with an RTT of 18 ms was to confirm that latency
was the contributing factor.
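As a rough sanity check on that conclusion, a back-of-envelope sketch using the measured numbers from this thread (~3.5 s local vs. ~55 s remote at an 18 ms RTT) shows the slowdown is consistent with thousands of strictly sequential round trips:

```python
# Back-of-envelope: how many sequential round trips would account for
# the slowdown discussed in this thread? Measured inputs, rough model.
rtt = 0.018          # 18 ms round-trip time to lldb-server
local_time = 3.5     # seconds for the local run
remote_time = 55.0   # seconds for the remote run

# If the extra wall-clock time is all spent blocking on one
# request/response at a time, each extra second costs 1/rtt requests.
extra = remote_time - local_time
round_trips = extra / rtt
print(f"~{round_trips:.0f} sequential round trips")
```

A few thousand synchronous packets is entirely plausible for SO-entry walking plus per-thread register reads across many stops, which matches the behavior described below.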

On Tue, May 1, 2018 at 6:44 AM Pavel Labath <labath at google.com> wrote:

> On Mon, 30 Apr 2018 at 20:13, Scott Funkenhauser <sfunkenhauser at google.com>
> wrote:
>
> > I messed up my test setup and incorrectly attributed the additional time.
> Parsing the DWARF sections does add additional time, but only increases by
> a small percentage.
>
> > By pure chance I noticed that the latency between the client and server
> had a huge impact. I did some tests against lldb_server running on a
> machine with an RTT of 18 ms to the client. My real-world example
> (initializing Vulkan) loaded around 16x slower (~3.5 s vs. 55 s). I did
> some more digging and it looks like there are certain
> operations that perform a number of synchronous requests to the server
> (DYLDRendezvous::TakeSnapshot - updating SO entries and
> ThreadList::WillResume - grabbing SP and FP for every thread). Since all
> the requests are synchronous they are extremely sensitive to increased
> latency.
>
> > Is this a limitation of the gdb-server (can't handle parallel requests)?
> > Or is this not a common use case, and is not a known issue?
>
> This is a known issue, though I did not expect it to have that much of an
> impact. In fact, I have trouble reconciling this fact with your earlier
> statement that second and subsequent runs are much faster. The SO entry
> loading is something that has to happen on every run, so I don't see why
> the second run would be faster. This would be more consistent with the
> debug-info parsing case, as there we only index the dwarf once (if it
> hasn't changed). So, I think we are missing something here.
>
> In any case, this is not a fundamental limitation, and there are ways to
> remove that. The most obvious one is to move the rendezvous structure
> parsing to the server -- there is even a gdb packet for that; I don't know
> its name off-hand. Currently we have support for that in the client (for
> communicating with stubs that support it), but not in lldb-server.
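To make the cost of client-side rendezvous parsing concrete, here is a minimal sketch. The per-entry read count and library count are illustrative assumptions, not measurements; the point is that walking the link_map chain costs round trips proportional to the number of shared objects, while a server-side list costs one:

```python
# Sketch: client-side link_map walking vs. a server-provided library list.
# reads_per_entry and num_libs are assumed, illustrative values.

def client_side(num_libs, reads_per_entry=3, rtt=0.018):
    # Walking the rendezvous (r_debug) link_map chain from the client:
    # several synchronous memory-read round trips per shared-object entry.
    return num_libs * reads_per_entry * rtt

def server_side(rtt=0.018):
    # One request; the server parses the rendezvous structure locally
    # and returns the whole library list in a single reply.
    return 1 * rtt

print(client_side(200))  # ~10.8 s for 200 libraries at 18 ms RTT
print(server_side())     # one RTT, 0.018 s
```

Even with conservative assumptions, the client-side walk dominates the remote startup time, which is why pushing the parsing into lldb-server pays off.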
>
> For the register reading part, we usually make sure we send the "important"
> registers in a batch, so that the client does not have to handle every one
> separately. At the moment it's only PC, because that used to be enough at
> some point. I don't know if anything changed in the client to make it ask
> for more info, but this is something that needs to be looked at more
> closely.
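The same arithmetic applies to the register reads. A sketch with assumed (not measured) counts, contrasting one synchronous read per register per thread against registers expedited in the stop-reply packet:

```python
# Sketch: per-register read packets vs. registers expedited in the
# stop reply. Thread and register counts are illustrative assumptions.

def per_register(num_threads, regs_per_thread=2, rtt=0.018):
    # One synchronous round trip per register (e.g. SP and FP)
    # for every thread, on every stop.
    return num_threads * regs_per_thread * rtt

def expedited():
    # Registers included in the stop-reply packet itself:
    # no additional round trips beyond the stop notification.
    return 0.0

print(per_register(50))  # ~1.8 s per stop for 50 threads at 18 ms RTT
```

Since a step or breakpoint hit can trigger this on every stop, batching the "important" registers into the stop reply removes a per-stop cost that scales with thread count.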
>
> > I enabled "gdb-remote all" logging, and searched for all instances of
> '0x00005652ACF3F120' (the address of the connection object that is
> reporting the timeout?). Seems to be a pretty good correlation between the
> timeouts and 'Communication::SyncronizeWithReadThread', unfortunately I
> haven't had time to investigate further.
>
> > 1525110345.916504145 0x5652acefcbb0
> Broadcaster("lldb.process")::RestoreBroadcaster (about to pop
> listener("lldb.PlatformLinux.DebugProcess.hijack")=0x5652acf079a0)
> > 1525110345.916502953 this = 0x00005652ACF3F120, timeout = 5000000 us
> > --
> > 1525110345.919557333 0x7f0868000940
> 'Communication::SyncronizeWithReadThread'
> Listener::FindNextEventInternal(broadcaster=(nil),
> broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
> 0x7f086c0008c0
> > 1525110345.919566154 this = 0x00005652ACF3F120, timeout = 5000000 us
> > --
> > 1525110346.123922110 0x7f0868000d10
> 'Communication::SyncronizeWithReadThread'
> Listener::FindNextEventInternal(broadcaster=(nil),
> broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
> 0x7f086c0008c0
> > 1525110346.123931408 this = 0x00005652ACF3F120, timeout = 5000000 us
> > --
> > 1525110346.152676821 0x7f0868006710
> 'Communication::SyncronizeWithReadThread'
> Listener::FindNextEventInternal(broadcaster=(nil),
> broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
> 0x7f086c0008c0
> > 1525110346.152685642 this = 0x00005652ACF3F120, timeout = 5000000 us
> > --
> > 1525110346.167683363 0x7f08682b2fe0
> 'Communication::SyncronizeWithReadThread'
> Listener::FindNextEventInternal(broadcaster=(nil),
> broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event
> 0x7f086c0008c0
> > 1525110346.167692184 this = 0x00005652ACF3F120, timeout = 5000000 us
>
> I think the timeout question is a red herring, tbh. These two are correlated,
> because they are both things that we need to do when the process stops
> (flush its STDIO after it stops, and start the read thread after we resume
> it). They run fast, and there are no timeouts involved.
>
> > --
> > 1525110351.172777176 error: timed out, status = timed out, uri =
> > 1525110351.172847271 this = 0x00005652ACF3F120, timeout = 5000000 us
> > 1525110356.173308611 error: timed out, status = timed out, uri =
> > 1525110356.173368216 this = 0x00005652ACF3F120, timeout = 5000000 us
> > 1525110361.175591230 error: timed out, status = timed out, uri =
> > 1525110361.175647497 this = 0x00005652ACF3F120, timeout = 5000000 us
> > 1525110366.180710316 error: timed out, status = timed out, uri =
> > 1525110366.180769205 this = 0x00005652ACF3F120, timeout = 5000000 us
> And I bet these happen while the process is running, so they do not impact
> the latency at all. It's just us waiting for the process to stop in a loop.
>