[lldb-dev] LLDB performance drop from 3.9 to 4.0
Pavel Labath via lldb-dev
lldb-dev at lists.llvm.org
Thu Apr 13 05:37:02 PDT 2017
Bisecting the performance regression would be extremely valuable. If you
want to do that, it would be much appreciated.
On 12 April 2017 at 20:39, Scott Smith via lldb-dev <lldb-dev at lists.llvm.org
> wrote:
> For my app I think it's largely parsing debug symbols tables for shared
> libraries. My main performance improvement was to increase the parallelism
> of parsing that information.
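A minimal sketch of the kind of parallelism described above; `parse_symbols` and the module list are hypothetical stand-ins for lldb's per-module work, not real lldb APIs:

```python
# Sketch: parse per-module symbol tables in parallel instead of serially.
# parse_symbols() is a hypothetical stand-in for lldb's per-module parsing.
from concurrent.futures import ThreadPoolExecutor

def parse_symbols(module):
    # Placeholder: pretend each module yields a small list of symbol names.
    return [f"{module}::sym{i}" for i in range(3)]

def parse_all(modules, workers=8):
    # Each module's debug info is independent, so the work fans out cleanly.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(parse_symbols, modules)
    return {m: syms for m, syms in zip(modules, results)}

tables = parse_all(["libc.so", "libstdc++.so"])
```

The payoff grows with the number of shared libraries, since each module's table is parsed independently.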
>
> Funny, gdb/gold has a similar accelerator table (created when you link
> with gold's --gdb-index flag). I assume lldb doesn't know how to parse it.
>
> I'll work on bisecting the change.
>
> On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda <jason at molenda.com> wrote:
>
>> I don't know exactly when the 3.9 / 4.0 branches were cut, or what was
>> done between those two points, but in general we don't expect/want to see
>> performance regressions like that. I'm more familiar with the perf
>> characteristics on macOS; Linux is different in some important regards, so
>> I can only speak in general terms here.
>>
>> In your example, you're measuring three things, assuming you have debug
>> information for MY_PROGRAM. The first is "Do the initial read of the main
>> binary and its debug information". The second is "Find all symbol names
>> 'main'". The third is "Scan a newly loaded solib's symbols" (assuming you
>> don't have debug information for solibs from /usr/lib etc). Technically
>> there's some additional stuff here -- launching the process, detecting
>> solibs as they're loaded, looking up the symbol context when we hit the
>> breakpoint, backtracing a frame or two, etc, but that stuff is rarely where
>> you'll see perf issues on a local debug session.
>>
>> Which of these is likely to be important will depend on your MY_PROGRAM.
>> If your program is just 'int main(){}', it's not going to be DWARF parsing. If your
>> binary only pulls in three solibs by the time it is running, it's not
>> going to be new module scanning. A popular place to spend startup time is
>> in C++ name demangling if you have a lot of solibs with C++ symbols.
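One common mitigation for the demangling cost (not necessarily what lldb does) is to memoize results so each unique mangled name is processed once; a sketch, with a toy transformation standing in for a real demangler such as `__cxa_demangle`:

```python
# Sketch: cache demangling results so repeated mangled names across many
# solibs are only demangled once. demangle() here is a toy stand-in for a
# real C++ demangler; the prefix-stripping is NOT real Itanium demangling.
from functools import lru_cache

calls = 0  # track how often the slow path actually runs

@lru_cache(maxsize=None)
def demangle(mangled):
    global calls
    calls += 1                         # only incremented on cache misses
    return mangled.removeprefix("_Z")  # toy transformation for illustration

names = [demangle(n) for n in ["_Z4mainv", "_Z3foov", "_Z4mainv"]]
```

With the cache, the duplicated `_Z4mainv` hits the slow path once rather than twice; across thousands of repeated symbols in many solibs, that difference dominates.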
>>
>>
>> On Darwin systems, we have a nonstandard accelerator table in our DWARF
>> emitted by clang that lldb reads. The "apple_types", "apple_names" etc
>> tables. So when we need to find a symbol named "main", for Modules that
>> have a SymbolFile, we can look in the accelerator table. If that
>> SymbolFile has a 'main', the accelerator table gives us a reference into
>> the DWARF for the definition, and we can consume the DWARF lazily. We
>> should never need to do a full scan over the DWARF, that's considered a
>> failure.
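The lookup flow described above can be sketched roughly like this; the table layout and `parse_die` are illustrative, not the actual apple_names hash-table format:

```python
# Sketch: a name-keyed accelerator table mapping symbol names to DWARF DIE
# offsets, so only the matching DIEs are decoded -- never a full scan.
# The offsets and parse_die() are invented for illustration.
accel_names = {
    "main": [0x1C40],            # name -> list of DIE offsets in .debug_info
    "helper": [0x2A10, 0x2B88],
}

parsed = []  # record which DIE offsets we actually touched

def parse_die(offset):
    parsed.append(offset)        # stand-in for lazily decoding one DIE
    return {"offset": offset}

def lookup(name):
    # Hash lookup first; consume DWARF only for the hits.
    return [parse_die(off) for off in accel_names.get(name, [])]

dies = lookup("main")
```

The key property is that a miss ("lookup of a name not in the table") costs nothing in DWARF parsing, and a hit costs exactly the DIEs it references.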
>>
>> (in fact, I'm working on a branch of the llvm.org sources from
>> mid-October, and I suspect Darwin lldb is often consuming a LOT more DWARF
>> than it should be when I'm debugging; I need to figure out what is causing
>> that -- it's a big problem.)
>>
>>
>> In general, I've been wanting to add a new "perf counters" infrastructure
>> & testsuite to lldb, but haven't had time. One thing I work on a lot is
>> debugging over a bluetooth connection; it turns out that BT is very slow,
>> and any extra packets we send between lldb and debugserver are very
>> costly. The communication is so fast over localhost, or over a USB
>> cable, that it's easy for regressions to sneak in without anyone noticing.
>> So the original idea was hey, we can have something that counts packets for
>> distinct operations. Like, this "next" command should take no more than 40
>> packets, that kind of thing. And it could be expanded -- "b main should
>> fully parse the DWARF for only 1 symbol", or "p *this should only look up 5
>> types", etc.
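The packet-budget idea might look something like this; the command names, packet contents, and budgets are invented for illustration:

```python
# Sketch: count remote-protocol packets per operation and check the total
# against a per-command budget, so packet-count regressions fail loudly.
class PacketCounter:
    def __init__(self):
        self.count = 0

    def send(self, packet):
        self.count += 1          # every packet over a slow link costs latency

    def within_budget(self, budget):
        # True if the operation stayed within its packet budget.
        return self.count <= budget

pc = PacketCounter()
for _ in range(38):              # pretend a "next" command sent 38 packets
    pc.send("$vCont;s#b8")       # illustrative gdb-remote-style packet
ok = pc.within_budget(40)
```

A test built on this would assert the budget per operation, making a "next now takes 60 packets" regression visible even on fast local connections where the wall-clock cost is invisible.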
>>
>>
>>
>>
>> > On Apr 12, 2017, at 11:26 AM, Scott Smith via lldb-dev <
>> lldb-dev at lists.llvm.org> wrote:
>> >
>> > I worked on some performance improvements for lldb 3.9, and was about
>> to forward port them so I can submit them for inclusion, but I realized
>> there has been a major performance drop from 3.9 to 4.0. I am using the
>> official builds on an Ubuntu 16.04 machine with 16 cores / 32 hyperthreads.
>> >
>> > Running: time lldb-4.0 -b -o 'b main' -o 'run' MY_PROGRAM > /dev/null
>> >
>> > With 3.9, I get:
>> > real 0m31.782s
>> > user 0m50.024s
>> > sys 0m4.348s
>> >
>> > With 4.0, I get:
>> > real 0m51.652s
>> > user 1m19.780s
>> > sys 0m10.388s
>> >
>> > (with my changes + 3.9, I got real down to 4.8 seconds! But I'm not
>> convinced you'll like all the changes.)
>> >
>> > Is this expected? I get roughly the same results when compiling
>> llvm+lldb from source.
>> >
>> > I guess I can spend some time trying to bisect what happened. 5.0
>> looks to be another 8% slower.
>> >
>> > _______________________________________________
>> > lldb-dev mailing list
>> > lldb-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev