[libcxx-commits] [PATCH] D105758: Hold mutex lock while notify_all is called at notify_all_at_thread_exit

Fri Aug 27 20:36:51 PDT 2021

lewissbaker added inline comments.

================
Comment at: libcxx/src/thread.cpp:162
         i->first->notify_all();
+        i->second->unlock();
     }
----------------
Quuxplusone wrote:
> Btw, update: I've emailed @lewissbaker to ask why the proof-of-concept wandbox from https://cplusplus.github.io/LWG/issue3343 (namely, https://wandbox.org/permlink/eUu3eiQbLl7JQKMm ) doesn't reproduce anymore. It's trivial to turn that code snippet into a libc++ unit test; but since it's already green on trunk, it wouldn't be very useful.
> He hasn't gotten back to me yet. But this PR is still on my radar!
One approach you could take to try to verify that this fix works is to try putting a sleep between the `unlock()` and `notify_all()` calls in both the before and after code to try to increase the chances of the other thread acquires the lock, sees the state that says the thread is done and then destroys the CV before the exiting thread calls `notify_all()`.

It's obviously not something you'd leave in the code, however, but could serve to give a higher confidence in the fix.

================
Comment at: libcxx/test/std/thread/thread.condition/notify_all_at_thread_exit_lwg3343.pass.cpp:62
+    {
+        std::unique_lock<std::mutex> lk(m);
+        x.cv_.wait(lk, [&]() { return threads_active == 0; });
----------------
We are trying to make the cv destructor race with the call to notify_all().
This is generally only going to happen if the main thread happens to see the threads_active as 0 between the lock being released and the cv being notified.

You could possibly increase the likelihood of a race by taking out a lock on the mutex before launching the threads.
The idea being that the other threads that launch quickly will wait to acquire the lock and then once the main thread releases the lock inside the call to cv.wait() then the other threads start at once.

I'd also consider removing the sleep_for() in the threads to encourage some more raciness of all of the threads exiting at the same time.
Really, it's only ever going to be the last thread to execute that will potentially run into the lifetime issue.

Another approach could be to have the main thread simply spin, acquiring the lock, looking at the threads_active and then releasing the lock and having it continue on and destroy the CV once it sees that no more threads are active.
i.e. just do a spin-loop version of cv.wait() without actually calling cv.wait()

If you do all of this in conjunction with inserting a sleep_for() call in the implementation of notify_all_at_thread_exit() between the unlock and notify_all calls (just for testing) then you should be able to reproduce the issue.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105758/new/

https://reviews.llvm.org/D105758