[Openmp-commits] [PATCH] D59793: [OpenMP] Fix hang on Windows

Jonathan Peyton via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Mar 25 12:21:46 PDT 2019


jlpeyton created this revision.
jlpeyton added reviewers: tlwilmar, hbae.
jlpeyton added a project: OpenMP.
Herald added subscribers: jdoerfert, guansong.

A debug dump from a large machine shows that when many OpenMP threads
(401 in total) sleep on a barrier, a thread at one of the innermost
nesting levels sleeps on a child's b_arrived flag whose value is 4,
which equals the checker value. That is, (1) the sleep bit is 0, and
(2) done_check() would return true if called.
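
The inconsistent state described above can be written as a small predicate. This is only an illustrative sketch, not code from the runtime: `FlagSnapshot`, `kSleepBit`, and `is_stuck_state` are hypothetical names, and the sketch assumes the sleep bit is the lowest bit of the flag word, which is consistent with a flag value of 4 having its sleep bit clear.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: the sleep bit is bit 0 of the flag word.
// This stands in for KMP_BARRIER_SLEEP_STATE; the real value lives in the runtime.
constexpr std::uint64_t kSleepBit = 1u << 0;

// Hypothetical snapshot of a barrier flag as seen in a debug dump.
struct FlagSnapshot {
  std::uint64_t value;   // current contents of the b_arrived flag
  std::uint64_t checker; // value the sleeping thread is waiting for

  // True if the flag still advertises a sleeping waiter.
  bool sleep_bit_set() const { return (value & kSleepBit) != 0; }

  // Simplified stand-in for done_check(): compare the flag, with the
  // sleep bit masked off, against the checker value.
  bool done_check() const { return (value & ~kSleepBit) == checker; }
};

// A thread that is still asleep while this returns true has missed its
// wakeup: nobody is marked as sleeping, yet the wait condition is satisfied.
bool is_stuck_state(const FlagSnapshot &f) {
  return !f.sleep_bit_set() && f.done_check();
}
```

With the values from the dump, `is_stuck_state(FlagSnapshot{4, 4})` is true, while a flag of 5 (sleep bit still set) or a checker of 8 (condition not yet met) is not reported as stuck.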

It is unclear how this can happen. It could be an error in Windows
Server 2016's EnterCriticalSection / LeaveCriticalSection, an error in
WaitForSingleObject / SetEvent / ResetEvent, or an error in the library
that is very difficult to find.

As a workaround, change the INFINITE wait to a timed wait, so that each
thread wakes up every 5 seconds (the timeout was chosen arbitrarily, so
as not to disturb other threads much), checks the flag condition under
the lock, and either goes back to sleep or stops sleeping depending on
the result of the check.
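
The workaround pattern can be sketched portably with the C++ standard library. This is a minimal analogue under stated assumptions, not the patch itself: the patch uses WaitForSingleObject on a Win32 event, whereas this sketch substitutes std::condition_variable::wait_for, and `SleepSlot`, `WakeReason`, and `timed_sleep` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical per-thread sleep state; all fields are guarded by mx.
struct SleepSlot {
  std::mutex mx;
  std::condition_variable cv;
  bool signaled = false; // set by the waker before notifying
  int flag = 0;          // value the sleeper is waiting on
};

enum class WakeReason { Signaled, FlagSatisfied };

// Sleep until either an explicit signal arrives or a periodic re-check
// finds that the flag already holds the expected value.
WakeReason timed_sleep(SleepSlot &s, int expected,
                       std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock(s.mx);
  for (;;) {
    // Timed wait instead of an unbounded one; returns false on timeout.
    bool woke = s.cv.wait_for(lock, timeout, [&] { return s.signaled; });
    if (woke) {
      s.signaled = false; // consume the signal
      return WakeReason::Signaled;
    }
    // Timed out: re-check the condition under the lock. If the wakeup
    // was lost but the flag is already satisfied, stop sleeping.
    if (s.flag == expected)
      return WakeReason::FlagSatisfied;
    // Otherwise loop and go back to sleep for another interval.
  }
}
```

If the flag is set to the expected value without any notification (a simulated lost wakeup), `timed_sleep` returns `WakeReason::FlagSatisfied` after the first timeout instead of blocking forever. The cost is one spurious wakeup per interval for threads whose condition is not yet met, which is why a long interval such as 5 seconds is a reasonable choice.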

Patch by Andrey Churbanov


Repository:
  rOMP OpenMP

https://reviews.llvm.org/D59793

Files:
  runtime/src/z_Windows_NT_util.cpp


Index: runtime/src/z_Windows_NT_util.cpp
===================================================================
--- runtime/src/z_Windows_NT_util.cpp
+++ runtime/src/z_Windows_NT_util.cpp
@@ -193,8 +193,9 @@
 /* TODO associate cv with a team instead of a thread so as to optimize
    the case where we wake up a whole team */
 
-void __kmp_win32_cond_wait(kmp_win32_cond_t *cv, kmp_win32_mutex_t *mx,
-                           kmp_info_t *th, int need_decrease_load) {
+template <class C>
+static void __kmp_win32_cond_wait(kmp_win32_cond_t *cv, kmp_win32_mutex_t *mx,
+                                  kmp_info_t *th, C *flag) {
   int my_generation;
   int last_waiter;
 
@@ -211,21 +212,46 @@
   __kmp_win32_mutex_unlock(mx);
 
   for (;;) {
-    int wait_done;
-
+    int wait_done = 0;
+    DWORD res, timeout = 5000; // just tried to guess an appropriate number
     /* Wait until the event is signaled */
-    WaitForSingleObject(cv->event_, INFINITE);
-
-    __kmp_win32_mutex_lock(&cv->waiters_count_lock_);
-
-    /* Exit the loop when the <cv->event_> is signaled and there are still
-       waiting threads from this <wait_generation> that haven't been released
-       from this wait yet. */
-    wait_done = (cv->release_count_ > 0) &&
-                (cv->wait_generation_count_ != my_generation);
-
-    __kmp_win32_mutex_unlock(&cv->waiters_count_lock_);
-
+    res = WaitForSingleObject(cv->event_, timeout);
+
+    if (res == WAIT_OBJECT_0) {
+      // event signaled
+      __kmp_win32_mutex_lock(&cv->waiters_count_lock_);
+      /* Exit the loop when the <cv->event_> is signaled and there are still
+         waiting threads from this <wait_generation> that haven't been released
+         from this wait yet. */
+      wait_done = (cv->release_count_ > 0) &&
+                  (cv->wait_generation_count_ != my_generation);
+      __kmp_win32_mutex_unlock(&cv->waiters_count_lock_);
+    } else if (res == WAIT_TIMEOUT || res == WAIT_FAILED) {
+      // check whether the flag and cv counters are in a consistent state,
+      // as MS sent us a debug dump with an inconsistent state of data
+      __kmp_win32_mutex_lock(mx);
+      typename C::flag_t old_f = flag->set_sleeping();
+      if (!flag->done_check_val(old_f & ~KMP_BARRIER_SLEEP_STATE)) {
+        __kmp_win32_mutex_unlock(mx);
+        continue;
+      }
+      // condition fulfilled, exiting
+      old_f = flag->unset_sleeping();
+      KMP_DEBUG_ASSERT(old_f & KMP_BARRIER_SLEEP_STATE);
+      TCW_PTR(th->th.th_sleep_loc, NULL);
+      KF_TRACE(50, ("__kmp_win32_cond_wait: exiting, condition "
+                    "fulfilled: flag's loc(%p): %u => %u\n",
+                    flag->get(), old_f, *(flag->get())));
+
+      __kmp_win32_mutex_lock(&cv->waiters_count_lock_);
+      KMP_DEBUG_ASSERT(cv->waiters_count_ > 0);
+      cv->release_count_ = cv->waiters_count_;
+      cv->wait_generation_count_++;
+      wait_done = 1;
+      __kmp_win32_mutex_unlock(&cv->waiters_count_lock_);
+
+      __kmp_win32_mutex_unlock(mx);
+    }
     /* there used to be a semicolon after the if statement, it looked like a
        bug, so i removed it */
     if (wait_done)
@@ -377,12 +403,11 @@
           KMP_DEBUG_ASSERT(TCR_4(__kmp_thread_pool_active_nth) >= 0);
         }
         deactivated = TRUE;
-
-        __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, 0,
-                              0);
+        __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, th,
+                              flag);
       } else {
-        __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, 0,
-                              0);
+        __kmp_win32_cond_wait(&th->th.th_suspend_cv, &th->th.th_suspend_mx, th,
+                              flag);
       }
 
 #ifdef KMP_DEBUG




More information about the Openmp-commits mailing list