[lldb-dev] Breakpoint + callback performance ... Can it be faster?

Jim Ingham via lldb-dev lldb-dev at lists.llvm.org
Tue Aug 16 11:06:58 PDT 2016


> On Aug 16, 2016, at 10:42 AM, Benjamin Dicken <bddicken at datawareventures.com> wrote:
> 
> Thanks for the quick reply.
> 
> > Are you sure the actual handling of the breakpoint & callback in lldb is what is taking most of the time?
> 
> I'm not positive. I did collect some callgrind profiles to take a look at where most of the time is being spent, but I'm not very familiar with lldb internals, so the results were hard to interpret. I did notice a lot of packet/network activity when using lldb to profile a program (which I assumed was communication between my program and lldb-server). I was not sure how this affected the performance, so perhaps this is the real bottleneck.

I would be pretty surprised if it were not.  We had some bugs in breakpoint handling - mostly related to having very, very many breakpoints.  But other than that, the dispatching of the breakpoint StopInfo is a pretty simple, straightforward bit of work.
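
If you want to confirm that, one cheap client-side check is to time a single stop -> resume round trip and compare it against what your callback itself costs.  A minimal sketch, assuming your tool already has the debugger and process in hand, and that the breakpoint is a plain stopping one (not a callback that returns false and auto-continues):

```
#include <chrono>
#include <iostream>

#include "lldb/API/LLDB.h"

// Time one full stop -> lldb-server -> lldb -> resume cycle.  In
// synchronous mode Continue() blocks until the process stops again, so
// the elapsed time covers the whole round trip: packet traffic, context
// switches, and lldb's own stop processing.
void TimeOneRoundTrip(lldb::SBDebugger &debugger, lldb::SBProcess &process) {
  debugger.SetAsync(false);
  auto start = std::chrono::steady_clock::now();
  process.Continue();  // runs until the next breakpoint hit
  auto end = std::chrono::steady_clock::now();
  std::cerr << "round trip: "
            << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count()
            << " us\n";
}
```

If that number dwarfs the time your callback itself takes, the transport and context switching - not the breakpoint machinery - is where your million instructions are going.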

> 
> > Greg just switched to using a unix-domain socket for this communication for platforms that support it.  This speeds up the packet traffic side of things.
> 
> In what version of lldb was this introduced? I'm running 3.7.1. I'm also on Ubuntu 14.04 - is that a supported platform?

It is only in TOT lldb; Greg just added it last week.  It is currently only turned on for OS X.

> 
> > One of the original motivations of having lldb-server be based on lldb classes - as opposed to the MacOS X version of debugserver which is an independent construct - was that you could re-use the server code to create an in-process Process plugin, eliminating a lot of this traffic & context switching when you needed maximum speed.
> 
> That sounds very interesting. Is there an example of this implementation you could point me to?
> 

FreeBSD & Windows still have native Process plugins.  But they aren't used for the lldb-server implementation as far as I can tell (I've mostly worked on the OS X side).  I think this was more of a design intent that hasn't actually been used anywhere yet.  But the Linux/Android folks will know better.
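
If someone wanted to pursue it, the shape of the work is a subclass of lldb_private::Process that answers debug requests directly (ptrace on Linux) rather than forwarding packets to a server.  A very rough sketch only - the method names below are real Process virtuals, but the exact signatures move around between releases, so treat this as an outline of the design rather than compilable source:

```
// Outline of an in-process Process plugin.  Everything a remote stub
// would do over the gdb-remote protocol happens in direct function
// calls instead - no packets, no extra context switches.
// (Signatures approximate; check Process.h in your checkout.)
class InProcessExample : public lldb_private::Process {
public:
  // Say whether this plugin can debug the given target
  // (e.g. a local process on this machine).
  bool CanDebug(lldb::TargetSP target_sp,
                bool plugin_specified_by_name) override;

  // Resume the inferior directly, e.g. ptrace(PTRACE_CONT, ...).
  lldb_private::Error DoResume() override;

  // Read inferior memory without a memory-read packet.
  size_t DoReadMemory(lldb::addr_t vm_addr, void *buf, size_t size,
                      lldb_private::Error &error) override;

  // Build stop reasons for the threads after the inferior halts.
  void RefreshStateAfterStop() override;

  bool IsAlive() override;
  lldb_private::Error DoDestroy() override;
  // ... plus launch/attach, thread-list updates, breakpoint opcodes, etc.
};
```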

Jim


> 
> 
> On Tue, Aug 16, 2016 at 10:20 AM, Jim Ingham <jingham at apple.com> wrote:
> Are you sure the actual handling of the breakpoint & callback in lldb is what is taking most of the time?  The last time we looked at this, the majority of the work was in communicating with debugserver to get the stop notification and restart.  Note that besides all the packet code, this involves context switches from process->lldb-server->lldb and back, which is also pretty expensive.
> 
> Greg just switched to using a unix-domain socket for this communication for platforms that support it.  This speeds up the packet traffic side of things.
> 
> One of the original motivations for having lldb-server be based on lldb classes - as opposed to the Mac OS X version of debugserver, which is an independent construct - was that you could re-use the server code to create an in-process Process plugin, eliminating a lot of this traffic & context switching when you needed maximum speed.  The original Mac OS X lldb port actually had a process plugin wholly in-process with lldb as well as the debugserver-based one, but there wasn't enough motivation to justify maintaining two different implementations of the same code.  I don't know whether the Linux port takes advantage of this possibility, however.  That would be something to look into.
> 
> Once lldb actually finds out about the stop, figuring out the breakpoint and getting to its callback is pretty simple...  I doubt making "lighter weight breakpoints" in particular will recover the performance you need, though if your sampling turns up inefficient algorithms that have crept in, it would be great to fix that.
> 
> Another option we've toyed with on and off is something like the gdb "tracepoints", where you can upload instructions to the lldb-server instance that perform "experiments" when a breakpoint is hit.  The work to perform the experiment and the results would all be kept in the lldb-server instance till a real breakpoint is hit, at which point lldb can download all the results and present them to the user.  This would eliminate some of the context switches and packet traffic while you were running in the hot parts of your code.  This is a decent chunk of work, however.
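> 
> Just to make the shape concrete, from the client it might look something like this - the experiment API here is entirely hypothetical, nothing like it exists in lldb today:
> 
> ```
> // HYPOTHETICAL tracepoint-style API, for illustration only.  The idea
> // is that the "experiment" runs and accumulates inside lldb-server,
> // so there is no per-hit packet traffic or context switch into lldb.
> lldb::SBBreakpoint tp =
>     target.BreakpointCreateByLocation(file_name, line_no);
> // tp.SetServerSideExperiment("count-hits");     // hypothetical
> // ... run; hits are handled entirely inside lldb-server ...
> // uint64_t hits = tp.GetExperimentResult();     // hypothetical
> ```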
> 
> Jim
> 
> 
> > On Aug 16, 2016, at 9:57 AM, Benjamin Dicken via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> >
> > I recently started using lldb to write a basic instrumentation tool for tracking the values of variables at various code-points in a program. I've been working with lldb for less than two weeks, so I am pretty new, though I have used and written llvm passes in the past, so I'm familiar with the clang/llvm/lldb ecosystem.
> >
> > I have a very early prototype of the tool up and running, using the C++ API. The user can specify either an executable to run or an already-running PID to attach to. The user also supplies a file+line_number at which a breakpoint (with a callback) is placed. For testing/prototyping purposes, the breakpoint callback just increments a counter and then immediately returns false. Eventually, more interesting things will happen in this callback.
> >
> > I've noticed that just the action of hitting a breakpoint and invoking the callback is very expensive. I did some instruction-count collection by running this lldb tool on a simple test program, placing the breakpoint+callback at different points in the program so that it would get triggered different numbers of times. I used `perf stat -e instructions ...` to gather instruction execution counts for each run. After doing a little math, it appears that I'm incurring 1.0 - 1.1 million instruction executions per breakpoint hit.
> >
> > This amount of slowdown is prohibitively expensive for my needs, because I want to place callbacks in hot portions of the "inferior" program.
> >
> > Is there a way to make this faster? Is it possible to create "lighter-weight" breakpoints? I really like the lldb API (though the documentation is lacking in some places), but if this performance hit can't be mitigated, it may be unusable for me.
> >
> > For reference, this is the callback function:
> >
> > ```
> > #include "lldb/API/LLDB.h"  // SB API umbrella header
> >
> > static int cb_count = 0;
> >
> > // Matches the signature SBBreakpoint::SetCallback expects.  Returning
> > // false tells lldb not to report the stop, so the process resumes
> > // automatically after each hit.
> > bool SimpleCallback (
> >     void *baton,
> >     lldb::SBProcess &process,
> >     lldb::SBThread &thread,
> >     lldb::SBBreakpointLocation &location) {
> >   //TODO: Eventually do more interesting things...
> >   cb_count++;
> >   return false;
> > }
> > ```
> >
> > And here is how I set it up to be called back:
> >
> > ```
> > // "debugger_data->target" is the tool's lldb::SBTarget.
> > lldb::SBBreakpoint bp1 =
> >     debugger_data->target.BreakpointCreateByLocation(file_name, line_no);
> > if (!bp1.IsValid())
> >   std::cerr << "invalid breakpoint\n";
> > bp1.SetCallback(SimpleCallback, nullptr);  // nullptr baton: no user data yet
> > ```
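> >
> > For context, the surrounding setup is roughly the following (a condensed sketch of my driver; error handling and the attach-by-PID path trimmed):
> >
> > ```
> > #include "lldb/API/LLDB.h"
> >
> > void RunTool(const char *exe_path, const char *file_name, int line_no) {
> >   lldb::SBDebugger::Initialize();
> >   lldb::SBDebugger debugger = lldb::SBDebugger::Create();
> >   debugger.SetAsync(false);  // blocking Launch/Continue
> >
> >   lldb::SBTarget target = debugger.CreateTarget(exe_path);
> >   lldb::SBBreakpoint bp =
> >       target.BreakpointCreateByLocation(file_name, line_no);
> >   bp.SetCallback(SimpleCallback, nullptr);
> >
> >   // Default args/env, "." as the working directory.
> >   lldb::SBProcess process = target.LaunchSimple(nullptr, nullptr, ".");
> >   while (process.GetState() == lldb::eStateStopped)
> >     process.Continue();  // run until the inferior exits
> >
> >   lldb::SBDebugger::Destroy(debugger);
> >   lldb::SBDebugger::Terminate();
> > }
> > ```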
> >
> > -Benjamin
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> 
> 
> 
> 
> -- 
> Ben


