[lldb-dev] Parallelizing loading of shared libraries

Wed Apr 26 17:00:00 PDT 2017

We started out with the philosophy that lldb wouldn't touch any more information in a shared library than we actually needed.  So when a library gets loaded we might need to read in and resolve its section list, but we won't read in any symbols if we don't need to look at them.  The idea was that if you just did "load a binary and run it", then up to the point where the binary stops for some reason we haven't done any unnecessary work.  Similarly, if all the breakpoints the user sets are scoped to a shared library, then there's no need for us to read any symbols for any other shared libraries.  I think that is a good goal: it allows the debugger to be used in special-purpose analysis tools w/o forcing it to pay costs that a more general-purpose debug session might require.

I think it would be hard to convert all the usages of modules from "do something with a shared library" mode to "tell me you are interested in a shared library and give me a callback" mode, so that module reading could be parallelized on demand.  But at the very least we need to allow a mode where symbol reading is done lazily.
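
The lazy mode described above can be sketched as a module that reads its section list eagerly at load time but defers symbol parsing until something first asks for symbols. This is a minimal sketch assuming C++11 `std::call_once` for thread safety; `LazyModule` and its methods are hypothetical names, not lldb's `Module` API, and the section and symbol contents are placeholders.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded shared library (not lldb's Module).
class LazyModule {
public:
  explicit LazyModule(std::string path) : path_(std::move(path)) {
    // Eager: the section list is read as soon as the library is loaded.
    sections_ = {".text", ".data"};  // placeholder for real section parsing
  }

  const std::string &Path() const { return path_; }
  const std::vector<std::string> &Sections() const { return sections_; }

  // Lazy: symbols are parsed on the first request only. std::call_once
  // keeps this safe if several threads ask at the same time.
  const std::vector<std::string> &Symbols() {
    std::call_once(symbols_once_, [this] {
      ++parse_count_;
      symbols_ = {"main", "helper"};  // placeholder for real symbol parsing
    });
    assert(parse_count_.load() == 1);  // call_once guarantees a single parse
    return symbols_;
  }

  int ParseCount() const { return parse_count_.load(); }

private:
  std::string path_;
  std::vector<std::string> sections_;
  std::vector<std::string> symbols_;
  std::once_flag symbols_once_;
  std::atomic<int> parse_count_{0};
};
```

With this shape, "load a binary and run it" never calls `Symbols()`, so no symbol parsing happens at all, and a breakpoint scoped to one library triggers parsing only for that library.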

The other concern is that lldb keeps the modules it reads in a global cache, shared by all debuggers & targets.  It is very possible that you could have two targets, or two debuggers with one target each, reading in shared libraries simultaneously and adding them to the global cache.  In some of the uses that lldb has under Xcode this is actually very common.  So the task pool will have to be built up as things are added to the global shared module cache, not at the level of individual targets noticing the read-in of a shared library.
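
Because the cache is process-wide, the one place that sees every request for a given library is the cache itself, not an individual target. A minimal sketch, assuming a hypothetical `SharedModuleCache` keyed only by file path (a real cache would likely key on more, such as UUID and architecture, which this ignores): the first requester of a path does the parse, and concurrent requesters from other targets or debuggers block on the same `std::shared_future` instead of parsing again. `LoadedModule` and `LoadFromManyTargets` are also hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a parsed module.
struct LoadedModule {
  std::string path;
};

// Process-wide cache shared by all debuggers and targets. Each unique path
// is parsed exactly once, however many targets ask for it concurrently.
class SharedModuleCache {
public:
  std::shared_ptr<LoadedModule> GetOrLoad(const std::string &path) {
    std::promise<std::shared_ptr<LoadedModule>> promise;
    std::shared_future<std::shared_ptr<LoadedModule>> result;
    bool is_owner = false;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = entries_.find(path);
      if (it != entries_.end()) {
        result = it->second;  // load already started (or finished) elsewhere
      } else {
        result = promise.get_future().share();
        entries_.emplace(path, result);
        is_owner = true;
      }
    }
    if (is_owner) {
      // The parse runs outside the lock, so different libraries can load in
      // parallel. Under these assumptions, this is where work would be
      // handed to a task pool.
      ++parse_count_;
      promise.set_value(std::make_shared<LoadedModule>(LoadedModule{path}));
    }
    assert(result.valid());
    return result.get();  // non-owners block here until the owner finishes
  }

  int ParseCount() const { return parse_count_.load(); }

private:
  std::mutex mutex_;
  std::unordered_map<std::string,
                     std::shared_future<std::shared_ptr<LoadedModule>>>
      entries_;
  std::atomic<int> parse_count_{0};
};

// Simulates several targets (or debuggers) loading the same library at once.
std::vector<std::shared_ptr<LoadedModule>>
LoadFromManyTargets(SharedModuleCache &cache, const std::string &path,
                    int num_targets) {
  std::vector<std::shared_ptr<LoadedModule>> results(num_targets);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_targets; ++i)
    threads.emplace_back([&cache, &path, &results, i] {
      results[i] = cache.GetOrLoad(path);
    });
  for (std::thread &t : threads)
    t.join();
  return results;
}
```

All callers end up sharing one `LoadedModule`, and the parse counter moves only once per unique path, which is the property that makes the cache, rather than each target, the natural owner of the parallel work.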

Jim

> On Apr 26, 2017, at 4:12 PM, Scott Smith via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> 
> After dealing with a bunch of micro-optimizations, I'm back to parallelizing loading of shared modules.  My naive approach was to just create a new thread per shared library.  I have a feeling some users may not like that; I think I read an email from someone who has thousands of shared libraries.  That's a lot of threads :-)
> 
> The problem is that loading a shared library can itself cause downstream parallelization through TaskPool.  I can't then also have the loading of a shared library go through TaskPool, as that could cause a deadlock: if all the worker threads are waiting on work that TaskPool needs to run on a worker thread, then nothing will ever run.
> 
> Three possible solutions:
> 
> 1. Remove the notion of a single global TaskPool and instead have a static pool at each call site that wants one.  That way multiple paths into the same code would share the same pool, but different places in the code would have their own pool.
> 
> 2. Change the wait code for TaskRunner to note whether it is already on a TaskPool thread, and if so, spawn another one.  However, I don't think that fully solves the issue of having too many threads loading shared libraries, as there is no guarantee the new worker would work on the "deepest" work.  I suppose each task could be annotated with its depth, and TaskPool could sort its work by depth, though...
> 
> 3. Leave a separate thread per shared library.
> 
> Thoughts?
> 
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
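
The deadlock described in the quoted message is the classic nested-wait problem: a worker blocks waiting on subtasks queued to its own pool. The sketch below keeps the detection step of option 2 (a `thread_local` check for "am I already on one of this pool's threads?") but, instead of spawning another thread, has the waiting worker run queued tasks itself, a technique usually called help-while-waiting. This is a swapped-in alternative, not a description of lldb's TaskPool; `HelpingPool`, `Group`, and `RunNestedLoad` are hypothetical names. `RunNestedLoad` models one outer "load a library" task fanning out into inner tasks on the same pool, which with a plain blocking wait and a single worker would deadlock.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a pool whose Wait() lets a worker run queued tasks instead of blocking.
class HelpingPool {
public:
  // Counts a caller's outstanding tasks; guarded by the pool's mutex.
  struct Group {
    int pending = 0;
    ~Group() { assert(pending == 0 && "Group destroyed before Wait()"); }
  };

  explicit HelpingPool(unsigned num_workers) {
    for (unsigned i = 0; i < num_workers; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }

  ~HelpingPool() {
    { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
    cv_.notify_all();
    for (std::thread &t : workers_)
      t.join();
  }

  void Async(Group &group, std::function<void()> fn) {
    std::unique_lock<std::mutex> lock(mu_);
    ++group.pending;
    queue_.push_back([this, &group, fn = std::move(fn)] {
      fn();
      std::lock_guard<std::mutex> done_lock(mu_);
      --group.pending;
      cv_.notify_all();  // waiters re-check their group under the mutex
    });
    lock.unlock();
    cv_.notify_all();
  }

  void Wait(Group &group) {
    // Option 2's detection step: is the caller one of this pool's workers?
    const bool on_worker = (current_pool_ == this);
    std::unique_lock<std::mutex> lock(mu_);
    while (group.pending != 0) {
      if (on_worker && !queue_.empty()) {
        // Help instead of blocking: run the oldest queued task right here.
        std::function<void()> task = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        task();
        lock.lock();
      } else {
        cv_.wait(lock);  // the loop re-checks pending after every wakeup
      }
    }
  }

private:
  void WorkerLoop() {
    current_pool_ = this;
    for (;;) {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty())
        return;  // stop_ was set and every queued task has run
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      task();
    }
  }

  static thread_local HelpingPool *current_pool_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool stop_ = false;
};

thread_local HelpingPool *HelpingPool::current_pool_ = nullptr;

int RunNestedLoad(unsigned num_workers, int num_inner) {
  std::atomic<int> finished{0};
  HelpingPool pool(num_workers);
  HelpingPool::Group outer;
  pool.Async(outer, [&] {
    HelpingPool::Group inner;
    for (int i = 0; i < num_inner; ++i)
      pool.Async(inner, [&finished] { finished.fetch_add(1); });
    pool.Wait(inner);
  });
  pool.Wait(outer);
  return finished.load();
}
```

This keeps the thread count fixed no matter how deeply work nests. The trade-off is that a helping worker may pick up unrelated work (for example another library's load) and grow its stack; because `Wait()` pops from the front of the queue, it also makes no attempt at the "deepest work first" ordering raised in option 2. Popping from the back instead would favor the most recently queued tasks, which is one possible way to approach that.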