[lldb-dev] LLDB performance drop from 3.9 to 4.0

Scott Smith via lldb-dev lldb-dev at lists.llvm.org
Thu Apr 20 11:01:40 PDT 2017


What's the preferred way to post changes?  In the past I tried emailing
here but it was pointed out I should send to lldb-commits instead.  But
there's also Phabricator for web-based code reviews.

So,

1. just email lldb-commits?
2. post on http://reviews.llvm.org/?

On Thu, Apr 20, 2017 at 3:16 AM, Pavel Labath <labath at google.com> wrote:

> Thank you very much for tracking this down.
>
> +1 for making UniqueCStringMap speak ConstString -- I think it just makes
> sense given that it already has "unique" in the name.
>
> ConstString already has a GetStringRef accessor. Also adding a conversion
> operator may be a good idea, although it probably won't help in all
> situations (you'll still have to write StringRef(X).drop_front() etc. if
> you want to do StringRef operations on the string).
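>
> A minimal sketch of that conversion operator (a hypothetical stand-in
> class, not the actual lldb ConstString, which derives the length from the
> global string pool):
>
>   #include "llvm/ADT/StringRef.h"
>   #include <cstddef>
>
>   class ConstString {
>   public:
>     // Existing accessor.
>     llvm::StringRef GetStringRef() const {
>       return llvm::StringRef(m_string, m_length);
>     }
>     // Proposed: let a ConstString go anywhere a StringRef is accepted.
>     operator llvm::StringRef() const { return GetStringRef(); }
>
>   private:
>     const char *m_string = nullptr; // uniqued, pool-owned pointer
>     size_t m_length = 0;            // stand-in; the real class looks this up
>   };
>
> Even with the implicit conversion, member-style calls like X.drop_front()
> still need an explicit StringRef(X) or X.GetStringRef() first, which is the
> caveat above.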
>
> pl
>
> On 20 April 2017 at 01:46, Zachary Turner <zturner at google.com> wrote:
>
>> It doesn't entirely defeat the purpose, it's just not *as good* as making
>> the interfaces take ConstStrings.  StringRef already has a lot of safety
>> and usability improvements over raw char pointers, and those improvements
>> don't disappear just because you aren't using ConstString.  Although I
>> agree that if you can make it work where the interface only accepts and
>> returns ConstStrings, and make conversion from ConstString to StringRef
>> more seamless, that would be an even better improvement.
>>
>> On Wed, Apr 19, 2017 at 5:33 PM Scott Smith <scott.smith at purestorage.com>
>> wrote:
>>
>>> If I just assume the pointers are from ConstString, then doesn't that
>>> defeat the purpose of making the interface safer?  Why not use an actual
>>> ConstString and provide conversion operators from ConstString to
>>> StringRef?  Seems we should be able to rely on the type system to get us
>>> safety and performance.
>>>
>>> I'll try putting something together tomorrow.
>>>
>>> On Wed, Apr 19, 2017 at 4:48 PM, Zachary Turner <zturner at google.com>
>>> wrote:
>>>
>>>> The change was made to make the interface safer and allow propagation
>>>> of StringRef through other layers.  The previous code was already taking a
>>>> const char *, and so it was working under the assumption that the const
>>>> char* passed in came from a ConstString.  As such, continuing to make that
>>>> same assumption seems completely reasonable.
>>>>
>>>> So perhaps you can just change the operator to compare the pointers, as
>>>> was being done before.
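>>>>
>>>> Roughly (a sketch that leans entirely on the assumption that every stored
>>>> StringRef points into the ConstString pool, so equal strings share one
>>>> data pointer):
>>>>
>>>>   #include "llvm/ADT/StringRef.h"
>>>>
>>>>   struct Entry {
>>>>     llvm::StringRef cstring; // must come from ConstString
>>>>     // ... value ...
>>>>     bool operator<(const Entry &rhs) const {
>>>>       // Uniqued storage makes pointer order a valid strict weak ordering,
>>>>       // so this is one compare instead of a character-by-character one.
>>>>       return cstring.data() < rhs.cstring.data();
>>>>     }
>>>>   };
>>>>
>>>> The catch is the one raised below: lookups then also have to be done with
>>>> pooled strings, and nothing in the type forces that.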
>>>>
>>>> On Wed, Apr 19, 2017 at 4:24 PM Scott Smith <
>>>> scott.smith at purestorage.com> wrote:
>>>>
>>>>> It looks like it was this change:
>>>>>
>>>>> commit 45fb8d00309586c3f7027f66f9f8a0b56bf1cc4a
>>>>> Author: Zachary Turner <zturner at google.com>
>>>>> Date:   Thu Oct 6 21:22:44 2016 +0000
>>>>>
>>>>>     Convert UniqueCStringMap to use StringRef.
>>>>>
>>>>>     git-svn-id: https://llvm.org/svn/llvm-project/lldb/trunk@283494 91177308-0d34-0410-b5e6-96231b3b80d8
>>>>>
>>>>>
>>>>> I'm guessing it's because the old code assumed the strings came from
>>>>> ConstString, which meant that uniqueness comparisons could be done by
>>>>> simply comparing the pointers.  Now it needs to use an actual string
>>>>> comparison routine.  This code:
>>>>>
>>>>>     bool operator<(const Entry &rhs) const { return cstring < rhs.cstring; }
>>>>>
>>>>> didn't actually change in the revision, but cstring went from 'const
>>>>> char *' to 'StringRef'.  If you know for sure that all the StringRefs come
>>>>> from ConstString, then it'd be easy enough to change the comparison, but I
>>>>> don't know how you guarantee that.
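>>>>>
>>>>> (One way to get that guarantee would be to make the map's Entry hold a
>>>>> ConstString directly, so the type system enforces that keys are pooled --
>>>>> a rough sketch, not checked against the current headers:
>>>>>
>>>>>   #include <vector>
>>>>>
>>>>>   template <typename T> class UniqueCStringMap {
>>>>>     struct Entry {
>>>>>       ConstString cstring; // pooled by construction
>>>>>       T value;
>>>>>       bool operator<(const Entry &rhs) const {
>>>>>         // Uniqued storage means pointer order is a valid strict weak
>>>>>         // ordering; no byte comparison needed.
>>>>>         return cstring.GetCString() < rhs.cstring.GetCString();
>>>>>       }
>>>>>     };
>>>>>     std::vector<Entry> m_map; // kept sorted
>>>>>     // ...
>>>>>   };
>>>>>
>>>>> That's essentially the direction discussed above.)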
>>>>>
>>>>> I assume the change was made to allow proper memory cleanup when the
>>>>> symbols are discarded?
>>>>>
>>>>> On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath <labath at google.com>
>>>>> wrote:
>>>>>
>>>>>> Bisecting the performance regression would be extremely valuable. If
>>>>>> you want to do that, it would be very appreciated.
>>>>>>
>>>>>> On 12 April 2017 at 20:39, Scott Smith via lldb-dev <
>>>>>> lldb-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> For my app I think it's largely parsing debug symbol tables for
>>>>>>> shared libraries.  My main performance improvement was to increase the
>>>>>>> parallelism of parsing that information.
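>>>>>>>
>>>>>>> (Shape-wise it's roughly the following -- a generic illustration with
>>>>>>> std::async, not the actual patch:
>>>>>>>
>>>>>>>   #include <future>
>>>>>>>   #include <vector>
>>>>>>>
>>>>>>>   // Stand-in for whatever owns one solib's symbol information.
>>>>>>>   struct Module {
>>>>>>>     void ParseSymbolTable() { /* expensive symtab/DWARF work */ }
>>>>>>>   };
>>>>>>>
>>>>>>>   // Parse each module on its own thread instead of serially.
>>>>>>>   void ParseAllSymbolTables(std::vector<Module> &modules) {
>>>>>>>     std::vector<std::future<void>> futures;
>>>>>>>     futures.reserve(modules.size());
>>>>>>>     for (Module &m : modules)
>>>>>>>       futures.push_back(
>>>>>>>           std::async(std::launch::async, [&m] { m.ParseSymbolTable(); }));
>>>>>>>     for (auto &f : futures)
>>>>>>>       f.wait(); // join before anything queries the results
>>>>>>>   }
>>>>>>>
>>>>>>> A real implementation would want to bound the number of worker threads;
>>>>>>> spawning one per solib as above is just the simplest way to show the
>>>>>>> idea.)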
>>>>>>>
>>>>>>> Funny, gdb/gold has a similar accelerator table (created when you
>>>>>>> link with --gdb-index).  I assume lldb doesn't know how to parse it.
>>>>>>>
>>>>>>> I'll work on bisecting the change.
>>>>>>>
>>>>>>> On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda <jason at molenda.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I don't know exactly when the 3.9 / 4.0 branches were cut, and what
>>>>>>>> was done between those two points, but in general we don't expect/want to
>>>>>>>> see performance regressions like that.  I'm more familiar with the perf
>>>>>>>> characteristics on macOS; Linux is different in some important regards, so
>>>>>>>> I can only speak in general terms here.
>>>>>>>>
>>>>>>>> In your example, you're measuring three things, assuming you have
>>>>>>>> debug information for MY_PROGRAM.  The first is "Do the initial read of the
>>>>>>>> main binary and its debug information".  The second is "Find all symbol
>>>>>>>> names 'main'".  The third is "Scan a newly loaded solib's symbols"
>>>>>>>> (assuming you don't have debug information from solibs from /usr/lib etc).
>>>>>>>> Technically there's some additional stuff here -- launching the process,
>>>>>>>> detecting solibs as they're loaded, looking up the symbol context when we
>>>>>>>> hit the breakpoint, backtracing a frame or two, etc, but that stuff is
>>>>>>>> rarely where you'll see perf issues on a local debug session.
>>>>>>>>
>>>>>>>> Which of these is likely to be important will depend on your
>>>>>>>> MY_PROGRAM.  If you have an 'int main(){}', it's not going to be DWARF
>>>>>>>> parsing.  If your binary only pulls in three solibs by the time it is
>>>>>>>> running, it's not going to be new module scanning. A popular place to spend
>>>>>>>> startup time is in C++ name demangling if you have a lot of solibs with C++
>>>>>>>> symbols.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Darwin systems, we have a nonstandard accelerator table in our
>>>>>>>> DWARF emitted by clang that lldb reads.  The "apple_types", "apple_names"
>>>>>>>> etc tables.  So when we need to find a symbol named "main", for Modules
>>>>>>>> that have a SymbolFile, we can look in the accelerator table.  If that
>>>>>>>> SymbolFile has a 'main', the accelerator table gives us a reference into
>>>>>>>> the DWARF for the definition, and we can consume the DWARF lazily.  We
>>>>>>>> should never need to do a full scan over the DWARF, that's considered a
>>>>>>>> failure.
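>>>>>>>>
>>>>>>>> In purely illustrative pseudocode (invented names, not the real lldb
>>>>>>>> interfaces), the lookup is shaped like this:
>>>>>>>>
>>>>>>>>   // Consult the .apple_names hash table first; only the DIEs it points
>>>>>>>>   // at get parsed, so there is never a full DWARF scan.
>>>>>>>>   for (Module &module : target_images) {
>>>>>>>>     if (SymbolFile *symbols = module.GetSymbolFile()) {
>>>>>>>>       for (DIERef ref : symbols->LookupInAppleNames("main"))
>>>>>>>>         symbols->ParseFunctionLazily(ref); // touch just this definition
>>>>>>>>     }
>>>>>>>>   }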
>>>>>>>>
>>>>>>>> (In fact, I'm working on a branch of the llvm.org sources from
>>>>>>>> mid-October, and I suspect Darwin lldb is often consuming a LOT more
>>>>>>>> DWARF than it should be when I'm debugging.  I need to figure out what
>>>>>>>> is causing that; it's a big problem.)
>>>>>>>>
>>>>>>>>
>>>>>>>> In general, I've been wanting to add a new "perf counters"
>>>>>>>> infrastructure & testsuite to lldb, but haven't had time.  One thing I work
>>>>>>>> on a lot is debugging over a bluetooth connection; it turns out that BT is
>>>>>>>> very slow, and any extra packets we send between lldb and debugserver are
>>>>>>>> very costly.  The communication is so fast over a local host, or over a usb
>>>>>>>> cable, that it's easy for regressions to sneak in without anyone noticing.
>>>>>>>> So the original idea was hey, we can have something that counts packets for
>>>>>>>> distinct operations.  Like, this "next" command should take no more than 40
>>>>>>>> packets, that kind of thing.  And it could be expanded -- "b main should
>>>>>>>> fully parse the DWARF for only 1 symbol", or "p *this should only look up 5
>>>>>>>> types", etc.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> > On Apr 12, 2017, at 11:26 AM, Scott Smith via lldb-dev <
>>>>>>>> lldb-dev at lists.llvm.org> wrote:
>>>>>>>> >
>>>>>>>> > I worked on some performance improvements for lldb 3.9, and was
>>>>>>>> about to forward port them so I can submit them for inclusion, but I
>>>>>>>> realized there has been a major performance drop from 3.9 to 4.0.  I am
>>>>>>>> using the official builds on an Ubuntu 16.04 machine with 16 cores / 32
>>>>>>>> hyperthreads.
>>>>>>>> >
>>>>>>>> > Running: time lldb-4.0 -b -o 'b main' -o 'run' MY_PROGRAM > /dev/null
>>>>>>>> >
>>>>>>>> > With 3.9, I get:
>>>>>>>> > real    0m31.782s
>>>>>>>> > user    0m50.024s
>>>>>>>> > sys    0m4.348s
>>>>>>>> >
>>>>>>>> > With 4.0, I get:
>>>>>>>> > real    0m51.652s
>>>>>>>> > user    1m19.780s
>>>>>>>> > sys    0m10.388s
>>>>>>>> >
>>>>>>>> > (with my changes + 3.9, I got real down to 4.8 seconds!  But I'm
>>>>>>>> not convinced you'll like all the changes.)
>>>>>>>> >
>>>>>>>> > Is this expected?  I get roughly the same results when compiling
>>>>>>>> llvm+lldb from source.
>>>>>>>> >
>>>>>>>> > I guess I can spend some time trying to bisect what happened.
>>>>>>>> 5.0 looks to be another 8% slower.
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>