[Lldb-commits] [PATCH] D48393: Make DWARFParsing more thread-safe

Thu Jun 21 10:32:15 PDT 2018

Part of gdb's slow startup used to be because it built its indexes manually.  

But also, while gdb does do lazy ingestion of debug information, its laziness is (or was, last time I looked at it) on a compile-unit boundary, so when you stop in a file that includes many complex types you can end up doing a lot of unnecessary work.

It used to be that if you wanted to see how bad things might be, you could fire up gdb on a big program and do "maint print symtabs".  That gets it to parse all the debug information available.  gdb will go away for quite some time when you do that.  Some of the time is spent in printing, but a whole lot of it is still spent in DWARF parsing.

Plus, in pretty much every case I've seen or had reported in bugs where somebody went to run an expression or print the locals and lldb went away and spun for a minute or so, sampling it showed most of that time going into ingesting DWARF types, because our laziness strategy failed and we ended up realizing a whole raft of types we didn't need.

Keeping this working well (actually better than it does now) is a crucial part of getting the performance of the debugger under control.

Jim

> On Jun 21, 2018, at 7:58 AM, Zachary Turner <zturner at google.com> wrote:
> 
> Performance I get. Gdb is almost unusable for large programs because of how long it takes to load debug info.
> 
> Do you have specific numbers on memory usage? How much memory (absolute and %) is saved by loading debug info lazily on a relatively large project?
> On Thu, Jun 21, 2018 at 7:54 AM Greg Clayton <clayborg at gmail.com> wrote:
> 
> 
>> On Jun 21, 2018, at 7:47 AM, Zachary Turner <zturner at google.com> wrote:
>> 
>> Related question: Is the laziness done to save memory, startup time, or both?
> 
> Both. It allows us to fetch only what we need when we need it. Time to break at main.cpp:123 is much quicker. Using LLDB for symbolication is much quicker as symbolication only needs to know about function definitions and function bounds. Many uses of LLDB are made better by partially parsing.
> 
>> On Thu, Jun 21, 2018 at 7:36 AM Greg Clayton via Phabricator <reviews at reviews.llvm.org> wrote:
>> clayborg added a comment.
>> 
>> In https://reviews.llvm.org/D48393#1138989, @labath wrote:
>> 
>> > I am not sure this will actually solve the problems you are seeing. This may avoid corrupting the internal DenseMap data structures, but it will not make the algorithm using them actually correct.
>> >  For example the pattern in `ParseTypeFromDWARF` is:
>> >
>> > 1. check the "already parsed map". If the DIE is already parsed then you're done.
>> > 2. if the map contains the magic "DIE_IS_BEING_PARSED" key, abort (recursive dwarf references)
>> > 3. otherwise, insert the "DIE_IS_BEING_PARSED" key into the map
>> > 4. do the parsing, which potentially involves recursive `ParseTypeFromDWARF` calls
>> > 5. insert the parsed type into the map
>> >
>> >   What you do is make each of the steps (1), (3), (5) atomic individually. However, the algorithm is not correct unless the whole sequence is atomic. Otherwise, if you have two threads trying to parse the same DIE (directly or indirectly), one of them could see the intermediate DIE_IS_BEING_PARSED and incorrectly assume that it encountered recursive types.
>> 
>> 
>> We need to make #1 atomic.
>> #2 would need to somehow know whether the type is already being parsed recursively by the current thread. If so, then do what we do now. If not, we need a way to wait on the completion of this type so the other parsing thread can complete it and put it into the map, at which time we grab the right value from the map.
>> A step #6 would then need to be added: after we put the type into the map, we notify any other threads that might be waiting.
>> 
>> > So, I think that locking at a higher level would be better. Doing that will certainly be tricky though...
>> 
>> 
>> 
>> 
>> https://reviews.llvm.org/D48393
>> 
>> 
>> 
>