[Openmp-dev] 100% CPU usage when threads can't steal tasks

Jeff Hammond via Openmp-dev openmp-dev at lists.llvm.org
Fri Sep 22 10:46:27 PDT 2017


Adhityaa -

Indeed, I would like to see data for interesting tests, not code that
doesn't do anything and is pathologically load-imbalanced.

https://github.com/ParRes/Kernels contains some interesting tests that are
very easy to run.  See p2p*task*, stencil*task*, and transpose*task* in the
C1z, Cxx11, and/or FORTRAN subdirectories.  The language should not matter,
as they are designed to be as similar as possible.  Configuring the build
is just "ln -sf common/make.defs.${toolchain} common/make.defs" after
appropriate changes to make.defs.${toolchain}.

What impact does "very passive" have on these programs?  Feel free to
contact me privately about experimental setup.

Jeff

On Fri, Sep 22, 2017 at 9:38 AM, Adhityaa Chandrasekar via Openmp-dev <
openmp-dev at lists.llvm.org> wrote:

> On Fri, Sep 22, 2017 at 9:26 PM, Churbanov, Andrey <
> Andrey.Churbanov at intel.com> wrote:
>
>> > having a thread run at 100% is detrimental to other unrelated processes
>> on the same machine.
>>
>> You could limit the HW resources available to your application (e.g. using
>> the KMP_HW_SUBSET environment variable), leaving the rest of the machine
>> for other processes. The OpenMP runtime library's default settings aim for
>> as much performance as possible, and those settings are at odds with
>> composability with other applications running in parallel on the same HW.
>> You may need to adjust the settings manually so that your application
>> behaves in a composable way.
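
A concrete sketch of that kind of setting (the subset values here are made up for illustration; check the runtime documentation for the exact KMP_HW_SUBSET syntax on your hardware):

```shell
# Hypothetical example: confine the OpenMP app to 2 cores, 1 thread per core,
# and make idle threads sleep immediately, leaving CPU for other processes.
export KMP_HW_SUBSET=2c,1t
export KMP_BLOCKTIME=0
echo "KMP_HW_SUBSET=$KMP_HW_SUBSET KMP_BLOCKTIME=$KMP_BLOCKTIME"
```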
>>
>> > Is this intended? Are there any other parameters to tweak?
>>
>> Yes, it is intended. If there are no tasks being executed and no tasks to
>> execute, then the blocktime applies to idle threads.  If there are tasks,
>> then threads never go to sleep while waiting for more tasks to be
>> scheduled.  That is the usual tasking scenario: one task can generate many
>> more tasks.  If we let idle threads go to sleep while some thread is
>> generating new tasks, that would slow down a lot of existing codes. So I
>> don't see how your highly unbalanced case can be improved by the library
>> without hurting real applications with better work balance.
>>
>> Regarding the wait policy: it is currently implemented so that threads
>> call sched_yield under the passive policy and spin more actively
>> otherwise.  Unfortunately, modern kernels schedule yielding threads for
>> execution more frequently in order to let all threads consume equal CPU
>> time, regardless of any work imbalance in the application.  We may
>> consider re-implementing the passive wait policy, or introducing a new
>> "very passive" policy, but this would need careful design and time for
>> implementation.
>>
>
> Just to clarify: would this new "very passive" policy ("dormant" would be a
> better term IMO) be akin to GCC's passive policy, where idle threads use
> essentially zero CPU because they block in a futex wait?
>
>
>>
>>
>> Your suggested change breaks the task-stealing algorithm: threads make
>> multiple attempts to steal tasks from a randomly chosen victim thread
>> until they succeed, and your change lets a thread go to sleep after its
>> first failed stealing attempt.  It would be possible to count the number
>> of tasks globally and let threads go to sleep when there are no tasks to
>> execute.  But I suspect this could cause a significant slowdown of
>> existing codes, so a lot of performance testing should be done before
>> applying something like this.
>>
>
> Does it have to be global? Making it global would mean a lot of cache
> misses, right? How about a per-thread counter that is decremented every
> time the thread fails to steal a task: when it reaches zero, reset it to
> the default value and go to sleep. Any time a task is stolen successfully,
> reset the counter.
>
>
>>
>>
>> -- Andrey
>>
>> From: Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] On Behalf
>> Of Adhityaa Chandrasekar via Openmp-dev
>> Sent: Friday, September 22, 2017 9:22 AM
>> To: Jeff Hammond <jeff.science at gmail.com>
>> Cc: openmp-dev at lists.llvm.org
>> Subject: Re: [Openmp-dev] 100% CPU usage when threads can't steal tasks
>>
>>
>> On Fri, Sep 22, 2017 at 2:21 AM, Jeff Hammond <jeff.science at gmail.com>
>> wrote:
>>
>> You might find
>> https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/556869
>> relevant.
>>
>> Thanks for that link. I agree that time to solution is a more important
>> metric than CPU time. However, having a thread run at 100% is detrimental
>> to other unrelated processes on the same machine.
>>
>> Is there anything else you want me to notice in that thread?
>>
>> I consider idling in the OpenMP runtime to be a sign of a badly behaved
>> application, not a runtime bug in need of fixing.  But in any case,
>> OMP_WAIT_POLICY/KMP_BLOCKTIME exist to address that.
>>
>> OMP_WAIT_POLICY and KMP_BLOCKTIME are indeed designed to solve that.
>> However, I don't think the clang OpenMP runtime is respecting the
>> OMP_WAIT_POLICY value: I set `export OMP_WAIT_POLICY=passive` and
>> compiled the example program with both g++ and clang++. The g++ binary
>> used virtually no CPU, while the clang++ binary had three threads at 100%
>> (three because OMP_NUM_THREADS=4 and only one thread is executing a task;
>> the other three are idle).
>>
>> Is this intended? Are there any other parameters to tweak?
>>
>> In any case, I think a default of "temporarily active for a while before
>> switching to passive" instead of "always active unless the user manually
>> overrides" is saner, no?
>>
>> Best,
>>
>> Jeff
>>
>> On Thu, Sep 21, 2017 at 1:11 PM, Adhityaa Chandrasekar via Openmp-dev <
>> openmp-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> My name is Adhityaa and I'm an undergrad student. This is my first time
>> getting involved in OpenMP development at LLVM. I have used the library a
>> number of times before, but I haven't been involved in the
>> behind-the-scenes work until now.
>>
>> So I was taking a look at https://bugs.llvm.org/show_bug.cgi?id=33543
>> and tried solving it (it looked like a simple, reproducible bug that
>> seemed quite important). First I ran strace on the program to see exactly
>> what was happening, and found that the threads without any tasks were
>> issuing a whole stream of `sched_yield()` calls, driving the CPU to 100%.
>>
>> Then I dug into the code and came up with a fairly simple patch that
>> almost solved the issue:
>>
>> --- kmp_wait_release.h (revision 313888)
>> +++ kmp_wait_release.h (working copy)
>> @@ -188,6 +188,7 @@
>>    // Main wait spin loop
>>    while (flag->notdone_check()) {
>>      int in_pool;
>> +    int executed_tasks = 1;
>>      kmp_task_team_t *task_team = NULL;
>>      if (__kmp_tasking_mode != tskm_immediate_exec) {
>>        task_team = this_thr->th.th_task_team;
>> @@ -200,10 +201,11 @@
>>           disabled (KMP_TASKING=0).  */
>>        if (task_team != NULL) {
>>          if (TCR_SYNC_4(task_team->tt.tt_active)) {
>> -          if (KMP_TASKING_ENABLED(task_team))
>> -            flag->execute_tasks(
>> +          if (KMP_TASKING_ENABLED(task_team)) {
>> +            executed_tasks = flag->execute_tasks(
>>                  this_thr, th_gtid, final_spin,
>>                  &tasks_completed USE_ITT_BUILD_ARG(itt_sync_obj), 0);
>> +          }
>>            else
>>              this_thr->th.th_reap_state = KMP_SAFE_TO_REAP;
>>          } else {
>> @@ -269,7 +271,7 @@
>>        continue;
>>
>>      // Don't suspend if there is a likelihood of new tasks being spawned.
>> -    if ((task_team != NULL) && TCR_4(task_team->tt.tt_found_tasks))
>> +    if ((task_team != NULL) && TCR_4(task_team->tt.tt_found_tasks) && executed_tasks)
>>        continue;
>>
>>  #if KMP_USE_MONITOR
>>
>> Alas, this led to a deadlock: the thread went into a futex wait and was
>> never woken again. So I looked at how GCC handles this, and it issues a
>> FUTEX_WAKE_PRIVATE for INT_MAX threads at the end. That should solve the
>> problem, I think?
>>
>> Anyway, it's pretty late here and I've been at this for 6-7 hours at a
>> stretch, so I'm really tired. I was wondering if I could get some help on
>> how to proceed from here (and whether I'm even on the right track).
>>
>> Thanks,
>>
>> Adhityaa
>>
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>
>> --
>>
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>> --------------------------------------------------------------------
>> Joint Stock Company Intel A/O
>> Registered legal address: Krylatsky Hills Business Park,
>> 17 Krylatskaya Str., Bldg 4, Moscow 121614,
>> Russian Federation
>>
>>
>
>
> _______________________________________________
> Openmp-dev mailing list
> Openmp-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>
>


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

