[Openmp-commits] [openmp] r252082 - Refactor of task_team code.

Churbanov, Andrey via Openmp-commits openmp-commits at lists.llvm.org
Fri Aug 12 07:02:45 PDT 2016


Jonas,

We will take a look at the problem.

> For resolving this issue I'm not sure about the right way to go: If this can only happen in a fork-barrier, we could disable executing tasks there - which I think is valid anyway because all tasks should be guaranteed to be finished after the prior join-barrier, shouldn't they?

Worker threads usually do not execute tasks on join barrier, they pass through and execute tasks on fork barrier.  This is the design of our barriers - master thread stops on join barrier while other threads pass through and wait on fork barrier.  So the problem is some missed synchronization during tasks execution, - library shutdown should not happen while some threads execute tasks (even if there is no tasks left).

Thanks,
Andrey

-----Original Message-----
From: Openmp-commits [mailto:openmp-commits-bounces at lists.llvm.org] On Behalf Of Hahnfeld, Jonas via Openmp-commits
Sent: Friday, August 12, 2016 10:33 AM
To: Peyton, Jonathan L <jonathan.l.peyton at intel.com>
Cc: Protze, Joachim <protze at itc.rwth-aachen.de>; openmp-commits at lists.llvm.org
Subject: Re: [Openmp-commits] [openmp] r252082 - Refactor of task_team code.

Hi Jonathan,

Sorry for coming back to such an old change, but I think this has introduced a race condition.
Its timeframe is quite small, but it arises from time to time in test/tasking/kmp_taskloop.c, especially when the machine is busy and the OS keeps rescheduling the threads.

> -----Original Message-----
> From: Openmp-commits [mailto:openmp-commits-bounces at lists.llvm.org]
> On Behalf Of Jonathan Peyton via Openmp-commits
> Sent: Wednesday, November 04, 2015 10:38 PM
> To: openmp-commits at lists.llvm.org
> Subject: [Openmp-commits] [openmp] r252082 - Refactor of task_team 
> code.
>
> Author: jlpeyton
> Date: Wed Nov  4 15:37:48 2015
> New Revision: 252082
>
> URL: http://llvm.org/viewvc/llvm-project?rev=252082&view=rev
> Log:
> Refactor of task_team code.
>
> This is a refactoring of the task_team code that more elegantly 
> handles the two task_team case. Two task_teams per team are kept in 
> use for the lifetime of the team. Thus no reference counting is 
> needed.
>
> Differential Revision: http://reviews.llvm.org/D13993
>
> Modified:
>     openmp/trunk/runtime/src/kmp.h
>     openmp/trunk/runtime/src/kmp_barrier.cpp
>     openmp/trunk/runtime/src/kmp_runtime.c
>     openmp/trunk/runtime/src/kmp_tasking.c
>     openmp/trunk/runtime/src/kmp_wait_release.h
>
> [...]
>
> Modified: openmp/trunk/runtime/src/kmp_runtime.c
> URL: http://llvm.org/viewvc/llvm-
> project/openmp/trunk/runtime/src/kmp_runtime.c?rev=252082&r1=252081
> &r2=252082&view=diff
> ==========================================================
> ====================
> --- openmp/trunk/runtime/src/kmp_runtime.c (original)
> +++ openmp/trunk/runtime/src/kmp_runtime.c Wed Nov  4 15:37:48 2015
>
> [...]
>
> @@ -5342,18 +5276,17 @@ __kmp_free_team( kmp_root_t *root, kmp_t
>      /* if we are non-hot team, release our threads */
>      if( ! use_hot_team ) {
>          if ( __kmp_tasking_mode != tskm_immediate_exec ) {
> +            // Delete task teams
>              int tt_idx;
>              for (tt_idx=0; tt_idx<2; ++tt_idx) {
> -                // We don't know which of the two task teams workers are 
> waiting
> on, so deactivate both.
>                  kmp_task_team_t *task_team = team->t.t_task_team[tt_idx];
>                  if ( task_team != NULL ) {
> -                    // Signal the worker threads to stop looking for tasks 
> while spin
> waiting.  The task
> -                    // teams are reference counted and will be deallocated 
> by the last
> worker thread via the
> -                    // thread's pointer to the task team.
> -                    KA_TRACE( 20, ( "__kmp_free_team: deactivating 
> task_team
> %p\n", task_team ) );
> +                    for (f=0; f<team->t.t_nproc; ++f) { // Have all 
> + threads
> unref task
> teams
> +                        team->t.t_threads[f]->th.th_task_team = NULL;
> +                    }
> +                    KA_TRACE( 20, ( "__kmp_free_team: T#%d 
> + deactivating
> task_team %p on team %d\n", __kmp_get_gtid(), task_team, team->t.t_id 
> ) );
>                      KMP_DEBUG_ASSERT( team->t.t_nproc > 1 );
> -                    TCW_SYNC_4( task_team->tt.tt_active, FALSE );
> -                    KMP_MB();
> +                    __kmp_free_task_team( master, task_team );
>                      team->t.t_task_team[tt_idx] = NULL;
>                  }
>              }

I think this gives us a race with __kmp_execute_tasks: Worker threads in a fork-barrier are still trying to steal and execute tasks from other threads that were in the previous team.
When the master thread exits the parallel region and the runtime is shut down immediately afterwards, the task team and its threads are freed.
I'm able to reproduce a fault with the attached small program and a patch that adds a sleep() before stealing to widen the time frame for the race.
I think this can only happen when the library is built in debug mode because otherwise the freed struct is not overwritten with 0xefef...

I see that there are some checks added in __kmp_execute_tasks to deal with this case but I think this is not right to work in all scenarios.
Note that there is some other code in this change that loops over all threads and sets th_task_team to NULL. This might be problematic as well but I haven't deeply investigated this yet. I've so far only produced a crash when a worker thread is trying to wake up the victim thread but I think this is because there happens to be the first dereference of the problematic structures.

For resolving this issue I'm not sure about the right way to go: If this can only happen in a fork-barrier, we could disable executing tasks there - which I think is valid anyway because all tasks should be guaranteed to be finished after the prior join-barrier, shouldn't they?

Regards,
Jonas

--------------------------------------------------------------------
Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park,
17 Krylatskaya Str., Bldg 4, Moscow 121614,
Russian Federation

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


More information about the Openmp-commits mailing list