[lldb-dev] Parallelizing loading of shared libraries

Mon May 1 14:42:13 PDT 2017

On 28 April 2017 at 16:04, Scott Smith <scott.smith at purestorage.com> wrote:
> Hmmm ok, I don't like hard coding pools.  Your idea about limiting the
> number of high level threads gave me an idea:
>
> 1. System has one high level TaskPool.
> 2. TaskPools have up to one child and one parent (the parent for the high
> level TaskPool = nullptr).
> 3. When a worker starts up for a given TaskPool, it ensures a single child
> exists.
> 4. There is a thread local variable that indicates which TaskPool that
> thread enqueues into (via AddTask).  If that variable is nullptr, then it is
> the high level TaskPool. Threads that are not workers enqueue into this
> TaskPool.  If the thread is a worker thread, then the variable points to the
> worker's child.
> 5. When creating a thread in a TaskPool, its thread count AND the thread
> count of the parent, grandparent, etc are incremented.
> 6. In the main worker loop, if there is no more work to do, OR the thread
> count is too high, the worker "promotes" itself.  Promotion means:
> a. decrement the thread count for the current task pool
> b. if there is no parent, exit; otherwise, become a worker for the parent
> task pool (and update the thread local TaskPool enqueue pointer).
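The bookkeeping in steps 2 to 6 can be modelled without any real threads. The following is an illustrative sketch only: `TaskPool` here is a hypothetical struct, not LLDB's actual TaskPool class, and `OnThreadCreated`, `OnWorkerStart` and `Promote` are made-up names. The policy for *when* to promote (no work left, or thread count too high) is deliberately left out.

```cpp
#include <cassert>
#include <memory>

// Hypothetical model of the proposed nested TaskPool scheme. Only the
// bookkeeping is modelled; there are no real threads or task queues.
struct TaskPool {
  TaskPool *parent = nullptr;       // nullptr for the high-level pool (rule 2)
  std::unique_ptr<TaskPool> child;  // at most one child (rule 2)
  int thread_count = 0;             // workers in this pool and its descendants

  // Rule 3: a worker starting in this pool ensures a single child exists.
  TaskPool *EnsureChild() {
    if (!child) {
      child = std::make_unique<TaskPool>();
      child->parent = this;
    }
    return child.get();
  }
};

// Rule 4: the pool this thread enqueues into. nullptr means "the
// high-level pool", which is what non-worker threads use.
thread_local TaskPool *g_enqueue_pool = nullptr;

// Rule 5: creating a thread in `pool` increments its count and the counts
// of all of its ancestors.
void OnThreadCreated(TaskPool *pool) {
  for (TaskPool *p = pool; p != nullptr; p = p->parent)
    ++p->thread_count;
}

// Rules 3 and 4: called on a worker thread when it starts serving `pool`;
// the worker then enqueues into the pool's (single) child.
void OnWorkerStart(TaskPool *pool) { g_enqueue_pool = pool->EnsureChild(); }

// Rule 6: promotion. Returns the pool the worker serves next, or nullptr
// if the worker should exit.
TaskPool *Promote(TaskPool *pool) {
  assert(pool->thread_count > 0);
  --pool->thread_count;          // 6a: only this pool's count drops
  TaskPool *parent = pool->parent;
  if (parent != nullptr)
    OnWorkerStart(parent);       // 6b: enqueue into the parent's child, i.e. `pool`
  else
    g_enqueue_pool = nullptr;    // 6b: no parent, so the worker exits
  return parent;
}
```

In this model a worker created in the child pool counts against both the child and the high-level pool; when it promotes, only the child's count drops, because it remains a worker in the parent.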
>
> The main points are:
> 1. We don't hard code the number of task pools; the code automatically uses
> the fewest task pools needed, regardless of the number of places in
> the code that want task pools.
> 2. When the child task pools are busy, parent task pools reduce their number
> of workers over time to reduce oversubscription.

The algorithm sounds reasonable to me. I'm just not sold on the
"automatic" part. My feeling is that if you cannot tell statically
what "depth" in the pool your code runs in, there is something wrong
with your code and you should fix that first.

Besides, hardcoding the nesting logic into "add" is kinda wrong.
Adding a task is not the problematic operation; waiting for the result
of one is. Granted, generally these happen on the same thread, but
they don't have to be -- you can write a continuation-style
computation, where you do a bit of work, and then enqueue a task to do
the rest. Since every such task is enqueued from a worker, each step
would land one level deeper, so the pool depth would grow without bound.
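To make the continuation concern concrete: under rule 4 as quoted, a task enqueued by a worker lands in that worker's child pool, so each hop of a continuation chain moves one level down. The single-threaded simulation below only tracks pool depth; `AddTaskFrom`, `RunAll`, `ProcessChunks` and `kNotAWorker` are hypothetical names, not LLDB code.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Depth of the "worker" that enqueues from a non-worker thread, so that
// its tasks land in the high-level pool (depth 0).
constexpr int kNotAWorker = -1;

// A queued task remembers which pool depth it was enqueued into.
struct QueuedTask {
  int depth;
  std::function<void(int)> fn;  // receives the depth it runs at
};

std::deque<QueuedTask> g_queue;

// Rule 4 of the proposal: a worker at depth d enqueues into depth d + 1.
void AddTaskFrom(int worker_depth, std::function<void(int)> fn) {
  g_queue.push_back(QueuedTask{worker_depth + 1, std::move(fn)});
}

// Runs tasks until none are left; returns the deepest pool level used.
int RunAll() {
  int max_depth = 0;
  while (!g_queue.empty()) {
    QueuedTask task = std::move(g_queue.front());
    g_queue.pop_front();
    if (task.depth > max_depth) max_depth = task.depth;
    task.fn(task.depth);
  }
  return max_depth;
}

// Continuation-style loop: do one chunk of work, then enqueue the rest.
void ProcessChunks(int remaining, int depth) {
  if (remaining == 0) return;
  // ... process one chunk on this thread ...
  AddTaskFrom(depth, [remaining](int d) { ProcessChunks(remaining - 1, d); });
}
```

Starting with `AddTaskFrom(kNotAWorker, [](int d) { ProcessChunks(10, d); })`, `RunAll()` returns 10: the depth grows with the number of continuation steps, and nothing in the rules bounds it.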

Btw, are we sure it's not possible to solve this with just one thread
pool? What would happen if we changed the implementation of "wait" so
that if the target task is not scheduled yet, we just go ahead and
compute it on our thread? I haven't thought through all the details,
but it sounds like this could actually give better performance in some
scenarios...
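One possible shape for the "compute it on our thread" variant of wait is sketched below (C++17). It rests on assumptions: `InlineableTask`, `TryRun` and `Wait` are hypothetical names, not LLDB's API, and the pool's queue is assumed to keep a reference to the task so that a worker dequeuing it later calls `TryRun`, which becomes a no-op once the waiter has claimed it. An atomic flag ensures the function runs exactly once, on whichever thread claims it first.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical task handle for a single shared pool. Whoever claims the
// task first (a pool worker or a thread blocked in Wait) runs it; the
// other side only waits for the result.
template <typename T>
class InlineableTask {
public:
  explicit InlineableTask(std::function<T()> fn) : fn_(std::move(fn)) {}

  // Called by a pool worker when it dequeues the task. Returns false if
  // another thread already claimed (and is running or has run) the task.
  bool TryRun() {
    if (claimed_.exchange(true)) return false;
    T value = fn_();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      result_ = std::move(value);
    }
    cv_.notify_all();
    return true;
  }

  // If nobody has started the task yet, run it on the calling thread
  // instead of blocking; otherwise wait for whoever is running it.
  T Wait() {
    TryRun();
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return result_.has_value(); });
    return *result_;
  }

private:
  std::function<T()> fn_;
  std::atomic<bool> claimed_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
  std::optional<T> result_;
};
```

With this scheme a waiter blocks only when the task is already running on some other thread, which is then making progress; a task still sitting in the queue no longer ties up the waiting worker. The trade-off is that the awaited task now runs on the waiter's stack.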