[PATCH] D18811: Fix a race condition in support library ThreadPool

Justin Lebar via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 5 18:20:21 PDT 2016


jlebar added a comment.

Apologies for scope creep here.  If you want to save the bigger comments for another patch, that's fine.  But I think most of these need to be addressed one way or another (not necessarily by you, I suppose).

The atomic ActiveThreads niggles me; it may be worth cc'ing the original author here to make sure it's not a Chesterton's Fence.  I think this code is much more correct, though.  :)


================
Comment at: include/llvm/Support/ThreadPool.h:130
@@ -132,3 +129,3 @@
   bool EnableFlag;
 #endif
 };
----------------
Maybe worth adding comments here indicating which variables are guarded by the mutex, so we're less likely to reintroduce the bug we previously had (rough sketch below)?  I'd also move the definitions of Mutex and the condvars above whatever they protect.

It would also be nice to have a testcase which illustrates the bug (even if only under tsan), but I don't know how hard that would be.
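
For the first point, roughly the kind of annotation I have in mind -- the member
names and the PackagedTaskTy type below are guesses from the diff context, not
copied from the actual header:

  /// Mutex guarding everything below: Tasks, ActiveThreads, and EnableFlag.
  std::mutex Mutex;
  /// Signaled when a task is submitted or when EnableFlag is cleared.
  std::condition_variable SubmissionCondition;
  /// Signaled when a task finishes, so wait() can re-check its predicate.
  std::condition_variable CompletionCondition;

  /// Tasks waiting for execution; guarded by Mutex.
  std::queue<PackagedTaskTy> Tasks;
  /// Number of tasks currently running; guarded by Mutex.
  unsigned ActiveThreads = 0;
  /// Cleared by the destructor to tell the workers to exit; guarded by Mutex.
  bool EnableFlag;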

================
Comment at: lib/Support/ThreadPool.cpp:39
@@ -40,1 +38,3 @@
+          SubmissionCondition.wait(
+              LockGuard, [&] { return !EnableFlag || !Tasks.empty(); });
           // Exit condition
----------------
Hmm, this means that if there are tasks in the queue when we run the destructor, we skip them.  That is, we only block until we complete the currently-running tasks.

At least this should be documented in the destructor comment; it's not clear to me from that comment that this is our behavior.  But I'm also not sure it's desirable, as it can leave us with futures that will never be completed.  Specifically, if you have threads waiting on futures, and then you destroy the threadpool without calling wait() on the threadpool first, those waiting threads will never wake up.

Since the destructor has to wait for the currently-running tasks to complete *anyway*, draining the whole task queue (i.e. running the remaining queued tasks before shutting down) seems pretty reasonable to me; see the sketch below.
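
Concretely, something like this in the worker loop -- a rough sketch built on
the predicate quoted above, not a drop-in change -- would let already-queued
tasks run (and their futures complete) even while the pool is being torn down:

          // Wake up when there is work to do or shutdown has started.
          SubmissionCondition.wait(
              LockGuard, [&] { return !EnableFlag || !Tasks.empty(); });
          // Exit only once the queue has been drained, so tasks that were
          // already queued still run and their futures get completed.
          if (!EnableFlag && Tasks.empty())
            return;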

================
Comment at: lib/Support/ThreadPool.cpp:47
@@ -46,2 +46,3 @@
           // in order for wait() to properly detect that even if the queue is
           // empty, there is still a task in flight.
+          ++ActiveThreads;
----------------
The order is no longer relevant, as everything is guarded by a single mutex.  I think we can nix this whole comment?

================
Comment at: lib/Support/ThreadPool.cpp:60
@@ -62,2 +59,3 @@
         {
           // Adjust `ActiveThreads`, in case someone waits on ThreadPool::wait()
+          std::unique_lock<std::mutex> LockGuard(Mutex);
----------------
Not sure this comment is necessary; seems pretty clear what's going on.

================
Comment at: lib/Support/ThreadPool.cpp:88
@@ -90,3 +87,3 @@
     // Don't allow enqueueing after disabling the pool
     assert(EnableFlag && "Queuing a thread during ThreadPool destruction");
 
----------------
This is interesting, as it means you're not allowed to enqueue new tasks from within a task T if it's possible that T is still running when we destroy the threadpool.  Seems like it's worth a comment at least.  I think the right behavior is probably to explicitly disallow new tasks once we've begun shutting down -- maybe we return a null future (rough sketch below)?  Can we do that?

Threadpool shutdown is always so complicated...
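
To make the "null future" idea concrete, something along these lines -- purely a
sketch; asyncImpl, TaskTy, and PackagedTaskTy are my guesses about the
surrounding code, and callers would have to check valid() on the result:

  std::shared_future<void> ThreadPool::asyncImpl(TaskTy Task) {
    std::unique_lock<std::mutex> LockGuard(Mutex);
    // Refuse new work once the destructor has started tearing the pool
    // down, instead of asserting.
    if (!EnableFlag)
      return std::shared_future<void>(); // "null" future: valid() is false
    Tasks.push(PackagedTaskTy(std::move(Task)));
    std::shared_future<void> Future = Tasks.back().get_future().share();
    LockGuard.unlock();
    SubmissionCondition.notify_one();
    return Future;
  }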

================
Comment at: lib/Support/ThreadPool.cpp:104
@@ -106,3 +103,3 @@
   for (auto &Worker : Threads)
     Worker.join();
 }
----------------
May be worth asserting here, under #ifndef NDEBUG, that ActiveThreads is 0 (rough sketch below).
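
E.g. (assuming ActiveThreads is still in scope here; once every worker has been
joined, no locking should be needed to read it):

  for (auto &Worker : Threads)
    Worker.join();
#ifndef NDEBUG
  // All workers have exited, so nothing can still be marked as in flight.
  assert(ActiveThreads == 0 &&
         "ThreadPool destroyed while tasks were still active");
#endif
}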


http://reviews.llvm.org/D18811




