[Openmp-dev] 100% CPU usage when threads can't steal tasks

Churbanov, Andrey via Openmp-dev openmp-dev at lists.llvm.org
Fri Sep 22 08:56:14 PDT 2017


> having a thread run at 100% is detrimental to other unrelated processes in the same machine.
You could limit the HW resources for your application (e.g. using the KMP_HW_SUBSET environment variable), leaving the rest of the machine for other processes. The OpenMP runtime library's default settings are chosen to give as much performance as possible, and these settings conflict with composability with other applications running in parallel on the same HW.  You may need to adjust the settings manually so that your application behaves in a composable way.

> Is this intended? Are there any other parameters to tweak?
Yes, it is intended. If there are no tasks being executed and no tasks to execute, then the blocktime applies to idle threads.  If there are tasks, then threads never go to sleep while waiting for more tasks to be scheduled.  That is the usual tasking scenario, in which one task can generate many more tasks.  If we let idle threads go to sleep while some thread is generating new tasks, that would slow down a lot of existing codes. So I don't see how your highly unbalanced case can be improved by the library without hurting real applications with better work balance.
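
To illustrate the blocktime idea described above, here is a minimal C++ sketch of a wait that spins (yielding) for a bounded time and only then blocks. It is not the libomp code: `IdleWaiter`, `wait`, and `release` are hypothetical names made up for this example, and a condition variable stands in for whatever suspend mechanism the runtime actually uses.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper, not part of libomp: wait until release() is called,
// spinning for at most `blocktime` before blocking on a condition variable.
struct IdleWaiter {
  std::mutex m;
  std::condition_variable cv;
  std::atomic<bool> ready{false};

  // Returns true if the wait ended during the spin phase,
  // false if the thread had to fall through to the blocking phase.
  bool wait(std::chrono::milliseconds blocktime) {
    assert(blocktime.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + blocktime;
    while (std::chrono::steady_clock::now() < deadline) {
      if (ready.load(std::memory_order_acquire))
        return true;
      std::this_thread::yield();  // still consumes CPU while spinning
    }
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [this] { return ready.load(std::memory_order_acquire); });
    return false;
  }

  // Sets the flag under the mutex so a blocked waiter cannot miss it.
  void release() {
    {
      std::lock_guard<std::mutex> lock(m);
      ready.store(true, std::memory_order_release);
    }
    cv.notify_all();
  }
};
```

In this sketch a blocktime of zero means "block immediately", and a large blocktime means the thread keeps a core busy for that long before it sleeps.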

Regarding the wait policy, it is currently implemented so that threads call sched_yield for the passive policy and spin more actively otherwise.  Unfortunately, modern kernels schedule yielding threads for execution more frequently in order to let all threads consume equal CPU time, regardless of possible work imbalance in the application.  We may think of re-implementing the passive wait policy, or introducing a new "very passive" policy, but this would need careful design and time for implementation.

Your suggested change breaks the task stealing algorithm: threads make multiple attempts to steal tasks from a randomly chosen victim thread until one succeeds, and the change lets a thread go to sleep after the first stealing attempt.  It would be possible to count the number of tasks globally and let threads go to sleep if there are no tasks to execute.  But I suspect this could cause a significant slowdown of existing codes, so a lot of performance testing should be done before applying something like this.
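
The two ideas in the paragraph above (repeated steal attempts from random victims, and a global task count that gates sleeping) can be sketched roughly as follows. This is an illustration under stated assumptions, not the libomp algorithm: `TaskPool`, `steal`, `may_sleep`, and the `max_attempts` parameter are hypothetical, and tasks are plain integers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <random>
#include <vector>

// Hypothetical illustration only; none of these names exist in libomp.
struct TaskPool {
  struct Queue {
    std::mutex m;
    std::deque<int> tasks;  // tasks are plain ints in this sketch
  };

  std::vector<Queue> queues;           // one queue per thread
  std::atomic<std::size_t> queued{0};  // global count of queued tasks

  explicit TaskPool(std::size_t nthreads) : queues(nthreads) {
    assert(nthreads >= 2);  // stealing needs at least one other thread
  }

  void push(std::size_t owner, int task) {
    std::lock_guard<std::mutex> lock(queues[owner].m);
    queues[owner].tasks.push_back(task);
    queued.fetch_add(1, std::memory_order_release);
  }

  // Try up to max_attempts randomly chosen victims before giving up.
  bool steal(std::size_t thief, std::mt19937 &rng, int max_attempts,
             int &out) {
    std::uniform_int_distribution<std::size_t> pick(0, queues.size() - 2);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
      std::size_t victim = pick(rng);
      if (victim >= thief)
        ++victim;  // never pick ourselves
      std::lock_guard<std::mutex> lock(queues[victim].m);
      if (!queues[victim].tasks.empty()) {
        out = queues[victim].tasks.front();
        queues[victim].tasks.pop_front();
        queued.fetch_sub(1, std::memory_order_release);
        return true;
      }
    }
    return false;
  }

  // A thread would only consider sleeping when nothing is queued anywhere.
  bool may_sleep() const {
    return queued.load(std::memory_order_acquire) == 0;
  }
};
```

Note that the counter here only tracks queued tasks. A task that is still running may spawn more work right after a thread has decided to sleep, which is one way the slowdown mentioned above could arise.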

-- Andrey

From: Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] On Behalf Of Adhityaa Chandrasekar via Openmp-dev
Sent: Friday, September 22, 2017 9:22 AM
To: Jeff Hammond <jeff.science at gmail.com>
Cc: openmp-dev at lists.llvm.org
Subject: Re: [Openmp-dev] 100% CPU usage when threads can't steal tasks


On Fri, Sep 22, 2017 at 2:21 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
You might find https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/556869 relevant.

Thanks for that link. I agree with the statement that time to solution is a more important metric than CPU time. However, having a thread run at 100% is detrimental to other unrelated processes in the same machine.

Is there anything else you want me to notice in that thread?


I consider idling in the OpenMP runtime to be a sign of a badly behaved application, not a runtime bug in need of fixing.  But in any case, OMP_WAIT_POLICY/KMP_BLOCKTIME exist to address that.

OMP_WAIT_POLICY and KMP_BLOCKTIME are indeed designed to solve that. However, I don't think the clang OpenMP runtime respects the OMP_WAIT_POLICY value: I set `export OMP_WAIT_POLICY=passive` and compiled the example program with g++ and clang++. The g++ binary used virtually no CPU, while the clang++ binary had three threads at 100% (three because OMP_NUM_THREADS=4 and only one is executing a task; the other three are idle).
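
For readers who want to try this, a test case in the same spirit might look like the sketch below. It is a hypothetical stand-in, not the example program from the bug report: `run_unbalanced` is a made-up name, and the single task sleeps instead of computing, so that (when built with OpenMP enabled) any CPU time shown by `top` should come from the waiting threads rather than from the task itself.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an unbalanced tasking program: one long task,
// all other threads in the team have nothing to do.
// Returns the number of tasks that ran (always 1 here).
int run_unbalanced(std::chrono::milliseconds work) {
  assert(work.count() >= 0);
  int tasks_run = 0;
#pragma omp parallel
#pragma omp single
  {
#pragma omp task shared(tasks_run)
    {
      std::this_thread::sleep_for(work);  // stands in for one long task
      tasks_run += 1;
    }
#pragma omp taskwait
  }
  return tasks_run;
}
```

Calling `run_unbalanced(std::chrono::seconds(30))` from `main` with OMP_NUM_THREADS=4, once with each compiler's OpenMP flag, gives enough time to compare CPU usage under different OMP_WAIT_POLICY and KMP_BLOCKTIME settings.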

Is this intended? Are there any other parameters to tweak?

In any case, I think a default of "temporarily active for a while before switching to passive" instead of "always active unless the user manually overrides" is saner, no?


Best,

Jeff

On Thu, Sep 21, 2017 at 1:11 PM, Adhityaa Chandrasekar via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
Hi,

My name is Adhityaa and I'm an undergrad student. This is my first time getting involved in OpenMP development at LLVM. I have used the library a bunch of times before, but I haven't been involved in the behind-the-scenes work before.

So I was just taking a look at https://bugs.llvm.org/show_bug.cgi?id=33543 and I tried solving it (it looked like a simple, reproducible bug that looked quite important). First I ran strace on the program to see exactly what was happening - I found that the threads without any tasks were issuing a whole bunch of `sched_yield()`, making the CPU go to 100%.

Then I tried going into the code. I came up with a fairly simple patch that almost solved the issue:

--- kmp_wait_release.h (revision 313888)
+++ kmp_wait_release.h (working copy)
@@ -188,6 +188,7 @@
   // Main wait spin loop
   while (flag->notdone_check()) {
     int in_pool;
+    int executed_tasks = 1;
     kmp_task_team_t *task_team = NULL;
     if (__kmp_tasking_mode != tskm_immediate_exec) {
       task_team = this_thr->th.th_task_team;
@@ -200,10 +201,11 @@
          disabled (KMP_TASKING=0).  */
       if (task_team != NULL) {
         if (TCR_SYNC_4(task_team->tt.tt_active)) {
-          if (KMP_TASKING_ENABLED(task_team))
-            flag->execute_tasks(
+          if (KMP_TASKING_ENABLED(task_team)) {
+            executed_tasks = flag->execute_tasks(
                 this_thr, th_gtid, final_spin,
                 &tasks_completed USE_ITT_BUILD_ARG(itt_sync_obj), 0);
+          }
           else
             this_thr->th.th_reap_state = KMP_SAFE_TO_REAP;
         } else {
@@ -269,7 +271,7 @@
       continue;

     // Don't suspend if there is a likelihood of new tasks being spawned.
-    if ((task_team != NULL) && TCR_4(task_team->tt.tt_found_tasks))
+    if ((task_team != NULL) && TCR_4(task_team->tt.tt_found_tasks) && executed_tasks)
       continue;

 #if KMP_USE_MONITOR

Alas, this led to a deadlock: the thread went into a futex wait and was never woken again. So I looked at how GCC does things, and it issues a FUTEX_WAKE_PRIVATE for INT_MAX threads at the end. This should solve the problem, I think?
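
As a rough illustration of the "wake everyone" pattern mentioned above, here is a Linux-only C++ sketch that uses the raw futex system call. It is not taken from GCC or from libomp: `Gate`, `wait`, and `release` are hypothetical names, and the sketch assumes that `std::atomic<int>` has the same layout as a plain 32-bit `int`.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Assumption of this sketch: the atomic can be used as a futex word.
static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "futex needs a 32-bit word");

// Hypothetical one-shot gate: waiters sleep until release() is called.
struct Gate {
  std::atomic<int> open{0};

  void wait() {
    // FUTEX_WAIT returns immediately if the word is no longer 0, so a
    // release() that happens just before the call cannot be missed.
    while (open.load(std::memory_order_acquire) == 0) {
      syscall(SYS_futex, reinterpret_cast<int *>(&open), FUTEX_WAIT_PRIVATE,
              0, nullptr, nullptr, 0);
    }
  }

  // Opens the gate and wakes up to INT_MAX sleepers, i.e. all of them.
  // Returns the number of threads the kernel reports as woken.
  long release() {
    open.store(1, std::memory_order_release);
    long woken = syscall(SYS_futex, reinterpret_cast<int *>(&open),
                         FUTEX_WAKE_PRIVATE, INT_MAX, nullptr, nullptr, 0);
    assert(woken >= 0);
    return woken;
  }
};
```

The deadlock described above is what happens when a thread enters the wait path but no code path ever performs the matching wake.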

Anyway, it's pretty late here and I've been at this for 6-7 hours at a stretch, so I'm really tired. I was just wondering if I could get some help on how to proceed from here (and whether I'm even on the right track).

Thanks,
Adhityaa




--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

