[lldb-dev] Parallelizing loading of shared libraries

Sun Apr 30 21:41:06 PDT 2017

The overall concept is similar; the differences come down to implementation
details:
1. llvm doesn't have a global pool; it's probably instantiated on demand
2. llvm keeps threads around until the pool is destroyed, rather than
letting the threads exit when they have nothing to do
3. llvm starts up all the threads immediately, rather than on demand.

Overall I like the current lldb version better than the llvm version, but I
haven't examined any of the use cases of the llvm version to know whether
it could be dropped in without issue.  However, neither does what I want,
so I'll move forward prototyping what I think it should do, and then see
how applicable it is to llvm.

On Sun, Apr 30, 2017 at 9:02 PM, Zachary Turner <zturner at google.com> wrote:

> Have we examined llvm::ThreadPool to see if it can work for our needs?
> And if not, what kind of changes would be needed to llvm::ThreadPool to
> make it suitable?
>
> On Fri, Apr 28, 2017 at 8:04 AM Scott Smith via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
>
>> Hmmm ok, I don't like hard coding pools.  Your idea about limiting the
>> number of high level threads gave me an idea:
>>
>> 1. System has one high level TaskPool.
>> 2. TaskPools have up to one child and one parent (the parent for the high
>> level TaskPool = nullptr).
>> 3. When a worker starts up for a given TaskPool, it ensures a single
>> child exists.
>> 4. There is a thread local variable that indicates which TaskPool that
>> thread enqueues into (via AddTask).  If that variable is nullptr, then it
>> is the high level TaskPool.  Threads that are not workers enqueue into this
>> TaskPool.  If the thread is a worker thread, then the variable points to
>> the worker's child.
>> 5. When creating a thread in a TaskPool, its thread count AND the thread
>> count of the parent, grandparent, etc are incremented.
>> 6. In the main worker loop, if there is no more work to do, OR the thread
>> count is too high, the worker "promotes" itself.  Promotion means:
>> a. decrement the thread count for the current task pool
>> b. if there is no parent, exit; otherwise, become a worker for the parent
>> task pool (and update the thread local TaskPool enqueue pointer).
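
Steps 1-6 above could be sketched roughly as follows. Only TaskPool and
AddTask are names from this thread; Worker, JoinAll, LoadAll, g_limit and the
other globals are hypothetical, and the single global mutex and the use of
futures for waiting are simplifications assumed for brevity, not a proposal
for the real code.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch: only TaskPool and AddTask are names from the thread.
struct TaskPool {
  TaskPool *parent = nullptr;
  std::unique_ptr<TaskPool> child;         // rule 2: at most one child
  std::queue<std::function<void()>> tasks;
  unsigned threads = 0;                    // rule 5: workers here and below
};

static std::mutex g_mutex;                 // one big lock, for brevity only
static TaskPool g_top;                     // rule 1: the high level pool
static std::vector<std::thread> g_threads; // kept so JoinAll() can join them
static unsigned g_limit = 2;               // allowed threads per level
thread_local TaskPool *tls_enqueue = nullptr; // rule 4: nullptr means g_top

static void Worker(TaskPool *pool) {
  std::unique_lock<std::mutex> lock(g_mutex);
  for (;;) {
    if (!pool->child) {                    // rule 3: ensure a child exists
      pool->child.reset(new TaskPool);
      pool->child->parent = pool;
    }
    tls_enqueue = pool->child.get();       // this worker's tasks go one level down
    // rule 6: keep working while there is work and the count is not too high
    while (!pool->tasks.empty() && pool->threads <= g_limit) {
      std::function<void()> task = std::move(pool->tasks.front());
      pool->tasks.pop();
      lock.unlock();
      task();
      lock.lock();
    }
    --pool->threads;                       // rule 6a
    if (!pool->parent)
      return;                              // rule 6b: no parent, so exit
    pool = pool->parent;                   // rule 6b: work for the parent
  }
}

std::future<void> AddTask(std::function<void()> fn) {
  auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
  std::future<void> done = task->get_future();
  std::lock_guard<std::mutex> lock(g_mutex);
  TaskPool *pool = tls_enqueue ? tls_enqueue : &g_top;
  pool->tasks.push([task] { (*task)(); });
  if (pool->threads < g_limit) {           // start workers on demand
    for (TaskPool *p = pool; p; p = p->parent)
      ++p->threads;                        // rule 5: count in every ancestor
    g_threads.emplace_back(Worker, pool);
  }
  return done;
}

// Join every worker, including ones started while we were joining.
void JoinAll() {
  for (;;) {
    std::vector<std::thread> batch;
    {
      std::lock_guard<std::mutex> lock(g_mutex);
      batch.swap(g_threads);
    }
    if (batch.empty())
      return;
    for (std::thread &t : batch)
      t.join();
  }
}

// Each "load" fans out and waits, the pattern that can deadlock one pool.
int LoadAll(int modules, int parts) {
  std::atomic<int> parsed{0};
  std::vector<std::future<void>> loads;
  for (int m = 0; m < modules; ++m)
    loads.push_back(AddTask([&parsed, parts] {
      std::vector<std::future<void>> subs;
      for (int p = 0; p < parts; ++p)
        subs.push_back(AddTask([&parsed] { ++parsed; }));
      for (auto &s : subs)
        s.wait();
    }));
  for (auto &l : loads)
    l.wait();
  JoinAll();
  return parsed;
}
```

Calling LoadAll(6, 5) from a non-worker thread enqueues six tasks into the
top pool; each of them enqueues five more into the child pool and waits.
Because a pool's count includes its descendants, a top-level worker that
finishes a task while the child pool is busy sees a count above g_limit and
leaves, which is the oversubscription control described in step 6.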
>>
>> The main points are:
>> 1. We don't hard code the number of task pools; the code automatically
>> uses the fewest number of taskpools needed regardless of the number of
>> places in the code that want task pools.
>> 2. When the child taskpools are busy, parent taskpools reduce their
>> number of workers over time to reduce oversubscription.
>>
>> You can fiddle with the # of allowed threads per level; for example, if
>> you take into account the height of the pool and the number of child
>> threads, then you could allocate each level half as many threads as the
>> level below it, unless the level below wasn't using all the threads; then
>> the steady state would be 2 * cores, rather than height * cores.  I think
>> that is probably overkill though.
>>
>>
>> On Fri, Apr 28, 2017 at 4:37 AM, Pavel Labath <labath at google.com> wrote:
>>
>>> On 27 April 2017 at 00:12, Scott Smith via lldb-dev
>>> <lldb-dev at lists.llvm.org> wrote:
>>> > After dealing with a bunch of microoptimizations, I'm back to
>>> > parallelizing loading of shared modules.  My naive approach was to just
>>> > create a new thread per shared library.  I have a feeling some users may
>>> > not like that; I think I read an email from someone who has thousands of
>>> > shared libraries.  That's a lot of threads :-)
>>> >
>>> > The problem is loading a shared library can cause downstream
>>> > parallelization through TaskPool.  I can't then also have the loading of
>>> > a shared library itself go through TaskPool, as that could cause a
>>> > deadlock - if all the worker threads are waiting on work that TaskPool
>>> > needs to run on a worker thread.... then nothing will happen.
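
The deadlock described above can be reproduced in miniature. This is a
sketch under assumptions: FixedPool and NestedTaskStarves are hypothetical
names, the pool is a generic fixed-size pool rather than lldb's TaskPool, and
a timed wait stands in for the blocking wait so that the example terminates
instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool, for illustration only (not lldb's TaskPool).
class FixedPool {
public:
  explicit FixedPool(unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      workers_.emplace_back([this] { Run(); });
  }
  ~FixedPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread &t : workers_)
      t.join();
  }
  std::future<void> AddTask(std::function<void()> fn) {
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
    std::future<void> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

private:
  void Run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty())
          return; // shutting down and fully drained
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::vector<std::thread> workers_;
};

// Returns true if a nested task could not start while its parent waited.
bool NestedTaskStarves(unsigned workers) {
  FixedPool pool(workers);
  bool starved = false;
  pool.AddTask([&] {
        // The "library load" runs on a worker and fans out into the same pool.
        std::future<void> inner = pool.AddTask([] {});
        // A plain wait() would hang forever with one worker; use a timeout.
        starved = inner.wait_for(std::chrono::milliseconds(200)) ==
                  std::future_status::timeout;
      })
      .wait();
  return starved;
}
```

With one worker the nested task never gets a thread while its parent is
waiting, so the timed wait expires. With a spare worker it normally completes
well within the timeout. Scaling the same shape up to N workers and N or more
waiting parents gives the "nothing will happen" case.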
>>> >
>>> > Three possible solutions:
>>> >
>>> > 1. Remove the notion of a single global TaskPool, but instead have a
>>> > static pool at each callsite that wants it.  That way multiple paths
>>> > into the same code would share the same pool, but different places in
>>> > the code would have their own pool.
>>> >
>>>
>>> I looked at this option in the past and this was my preferred
>>> solution. My suggestion would be to have two task pools. One for
>>> low-level parallelism, which spawns
>>> std::thread::hardware_concurrency() threads, and another one for
>>> higher level tasks, which can only spawn a smaller number of threads
>>> (the algorithm for the exact number TBD). The high-level threads can
>>> access the low-level ones, but not the other way around, which
>>> guarantees progress.
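
One way to picture the two-pool arrangement described above, as a sketch
only: LeveledPool, tls_level and LoadModules are hypothetical names, and the
"half the cores" size for the high-level pool is a placeholder, since the
actual number is left TBD here. The one-way rule is enforced by giving each
thread a level and rejecting any enqueue that does not go strictly downward.

```cpp
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Level of the calling thread. Threads that are not pool workers rank above
// both pools, so they may enqueue anywhere.
thread_local int tls_level = 2;

// Hypothetical fixed-size pool tagged with a level (0 = low, 1 = high).
class LeveledPool {
public:
  LeveledPool(int level, unsigned threads) : level_(level) {
    for (unsigned i = 0; i < threads; ++i)
      workers_.emplace_back([this] {
        tls_level = level_;
        Run();
      });
  }
  ~LeveledPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread &t : workers_)
      t.join();
  }
  // Only strictly higher-level threads may enqueue, so waits cannot cycle.
  std::future<void> AddTask(std::function<void()> fn) {
    if (tls_level <= level_)
      throw std::logic_error("enqueue only into a lower-level pool");
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
    std::future<void> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

private:
  void Run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty())
          return; // shutting down and fully drained
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }
  const int level_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::vector<std::thread> workers_;
};

// High-level "module loads" fan out into the low-level pool and wait there.
int LoadModules(int modules, int chunks_per_module) {
  unsigned cores = std::max(1u, std::thread::hardware_concurrency());
  LeveledPool low(0, cores);
  LeveledPool high(1, std::max(1u, cores / 2)); // placeholder size
  std::atomic<int> parsed{0};
  std::vector<std::future<void>> loads;
  for (int m = 0; m < modules; ++m)
    loads.push_back(high.AddTask([&] {
      std::vector<std::future<void>> parts;
      for (int c = 0; c < chunks_per_module; ++c)
        parts.push_back(low.AddTask([&] { ++parsed; }));
      for (auto &p : parts)
        p.wait();
    }));
  for (auto &l : loads)
    l.wait();
  return parsed;
}
```

Low-level tasks never wait on anything in a pool, so they always finish, and
high-level tasks only ever wait on low-level ones. A low-level worker that
tries to enqueue into the high-level pool gets a std::logic_error in this
sketch, which is one possible way to keep the dependency one-directional.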
>>>
>>> I propose to hardcode 2 pools, as I don't want to make it easy for
>>> people to create additional ones -- I think we should be having this
>>> discussion every time someone tries to add one, and have a very good
>>> justification for it (FWIW, I think your justification is good in this
>>> case, and I am grateful that you are pursuing this).
>>>
>>> pl
>>>