[Openmp-commits] [PATCH] D59793: [OpenMP] Fix hang on Windows
Jonathan Peyton via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Mar 25 12:21:46 PDT 2019
jlpeyton created this revision.
jlpeyton added reviewers: tlwilmar, hbae.
jlpeyton added a project: OpenMP.
Herald added subscribers: jdoerfert, guansong.
A debug dump from a large machine shows that when many OpenMP threads
(401 in total) sleep on a barrier, a thread at one of the innermost
nesting levels sleeps on a child's b_arrived flag whose value is 4,
which equals the checker value. That is, (1) the sleep bit is 0, and
(2) done_check() would return true if called.
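To make the inconsistency concrete, here is a minimal sketch of the two
checks on the observed value. It assumes the sleep state is the lowest bit
of the flag word and that done_check compares the flag against the checker
for equality; kSleepState, sleep_bit_set and done_check_val below are
hypothetical stand-ins, not the runtime's definitions.

```cpp
#include <cstdint>

// Assumption: the sleep state occupies the lowest bit of the flag word.
// Hypothetical stand-in for KMP_BARRIER_SLEEP_STATE.
constexpr std::uint64_t kSleepState = 1u;

// True if the flag says a thread is (or is about to be) asleep on it.
constexpr bool sleep_bit_set(std::uint64_t flag) {
  return (flag & kSleepState) != 0;
}

// Assumed completion test: flag with the sleep bit masked off equals checker.
constexpr bool done_check_val(std::uint64_t flag, std::uint64_t checker) {
  return (flag & ~kSleepState) == checker;
}

// The state seen in the dump: flag == 4 and checker == 4.
static_assert(!sleep_bit_set(4), "sleep bit is 0 although a thread sleeps");
static_assert(done_check_val(4, 4), "wait condition is already satisfied");
```

Under these assumptions a thread is asleep on a flag that neither records a
sleeper nor requires further waiting, so no releasing thread has a reason to
wake it.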
It is unclear how this can happen. It could be an error in Windows
Server 2016's EnterCriticalSection / LeaveCriticalSection, an error in
WaitForSingleObject / SetEvent / ResetEvent, or an error in the library
that is very difficult to find.
As a workaround, change the INFINITE wait to a timed wait, so that each
thread wakes every 5 seconds (the timeout was chosen arbitrarily, long
enough not to disturb other threads much), checks the flag condition
under the lock, and either goes back to sleep or stops sleeping
depending on the result of the check.
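The idea can be sketched in portable C++ with std::condition_variable
instead of Win32 events. This is only an illustration of the timed-wait
pattern, not the code in the patch; SleepFlag, timed_wait and
demo_lost_wakeup are hypothetical names, and the short timeouts are chosen
just to keep the demo quick.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the runtime's flag; not the real kmp flag class.
struct SleepFlag {
  std::atomic<unsigned> value{0};
  unsigned checker = 4;
  bool done_check() const { return value.load() == checker; }
};

// Sleep until the flag condition holds. A timeout is not an error here:
// it only prompts another check of the flag under the lock.
// Returns the number of timeouts observed while waiting.
int timed_wait(std::condition_variable &cv, std::mutex &mx,
               const SleepFlag &flag, std::chrono::milliseconds timeout) {
  assert(timeout.count() > 0);
  std::unique_lock<std::mutex> lock(mx);
  int timeouts = 0;
  while (!flag.done_check()) {
    if (cv.wait_for(lock, timeout) == std::cv_status::timeout)
      ++timeouts; // no signal arrived; the loop re-checks the flag
  }
  return timeouts;
}

// Simulates a lost wakeup: the flag is updated but nobody signals cv.
// With an untimed wait the waiter would hang; here it returns on a timeout.
bool demo_lost_wakeup() {
  std::condition_variable cv;
  std::mutex mx;
  SleepFlag flag;
  int timeouts = -1;
  std::thread waiter([&] {
    timeouts = timed_wait(cv, mx, flag, std::chrono::milliseconds(20));
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  {
    std::lock_guard<std::mutex> lock(mx);
    flag.value.store(4); // deliberately no cv.notify_one()
  }
  waiter.join();
  return flag.done_check() && timeouts >= 0;
}
```

The cost of this approach is one spurious wake per timeout interval for
every sleeping thread, which is why a long interval is preferable when the
timeout is only a safety net.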
Patch by Andrey Churbanov
Repository:
rOMP OpenMP
https://reviews.llvm.org/D59793
Files:
runtime/src/z_Windows_NT_util.cpp
Index: runtime/src/z_Windows_NT_util.cpp
===================================================================
--- runtime/src/z_Windows_NT_util.cpp
+++ runtime/src/z_Windows_NT_util.cpp
@@ -193,8 +193,9 @@
/* TODO associate cv with a team instead of a thread so as to optimize
the case where we wake up a whole team */
-void __kmp_win32_cond_wait(kmp_win32_cond_t *cv, kmp_win32_mutex_t *mx,
- kmp_info_t *th, int need_decrease_load) {
+template <class C>
+static void __kmp_win32_cond_wait(kmp_win32_cond_t *cv, kmp_win32_mutex_t *mx,
+ kmp_info_t *th, C *flag) {
int my_generation;
int last_waiter;
@@ -211,21 +212,46 @@
__kmp_win32_mutex_unlock(mx);
for (;;) {
- int wait_done;
-
+ int wait_done = 0;
+ DWORD res, timeout = 5000; // just tried to guess an appropriate number
/* Wait until the event is signaled */
- WaitForSingleObject(cv->event_, INFINITE);
-
- __kmp_win32_mutex_lock(&cv->waiters_count_lock_);
-
- /* Exit the loop when the <cv->event_> is signaled and there are still
- waiting threads from this <wait_generation> that haven't been released
- from this wait yet. */
- wait_done = (cv->release_count_ > 0) &&
- (cv->wait_generation_count_ != my_generation);
-
- __kmp_win32_mutex_unlock(&cv->waiters_count_lock_);
-
+ res = WaitForSingleObject(cv->event_, timeout);
+
+ if (res == WAIT_OBJECT_0) {
+ // event signaled
+ __kmp_win32_mutex_lock(&cv->waiters_count_lock_);
+ /* Exit the loop when the <cv->event_> is signaled and there are still
+ waiting threads from this <wait_generation> that haven't been released
+ from this wait yet. */
+ wait_done = (cv->release_count_ > 0) &&
+ (cv->wait_generation_count_ != my_generation);
+ __kmp_win32_mutex_unlock(&cv->waiters_count_lock_);
+ } else if (res == WAIT_TIMEOUT || res == WAIT_FAILED) {
+ // check if the flag and cv counters are in consistent state
+ // as MS sent us debug dump with inconsistent state of data
+ __kmp_win32_mutex_lock(mx);
+ typename C::flag_t old_f = flag->set_sleeping();
+ if (!flag->done_check_val(old_f & ~KMP_BARRIER_SLEEP_STATE)) {
+ __kmp_win32_mutex_unlock(mx);
+ continue;
+ }
+ // condition fulfilled, exiting
+ old_f = flag->unset_sleeping();
+ KMP_DEBUG_ASSERT(old_f & KMP_BARRIER_SLEEP_STATE);
+ TCW_PTR(th->th.th_sleep_loc, NULL);
+ KF_TRACE(50, ("__kmp_win32_cond_wait: exiting, condition "
+ "fulfilled: flag's loc(%p): %u => %u\n",
+ flag->get(), old_f, *(flag->get())));
+
+ __kmp_win32_mutex_lock(&cv->waiters_count_lock_);
+ KMP_DEBUG_ASSERT(cv->waiters_count_ > 0);
+ cv->release_count_ = cv->waiters_count_;
+ cv->wait_generation_count_++;
+ wait_done = 1;
+ __kmp_win32_mutex_unlock(&cv->waiters_count_lock_);
+
+ __kmp_win32_mutex_unlock(mx);
+ }
/* there used to be a semicolon after the if statement, it looked like a
bug, so i removed it */
if (wait_done)
@@ -377,12 +403,11 @@
KMP_DEBUG_ASSERT(TCR_4(__kmp_thread_pool_active_nth) >= 0);
}
deactivated = TRUE;
-
- __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, 0,
- 0);
+ __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, th,
+ flag);
} else {
- __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, 0,
- 0);
+ __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, th,
+ flag);
}
#ifdef KMP_DEBUG