[Openmp-commits] [openmp] r252082 - Refactor of task_team code.

Hahnfeld, Jonas via Openmp-commits openmp-commits at lists.llvm.org
Fri Aug 12 00:33:23 PDT 2016


Hi Jonathan,

Sorry for coming back to such an old change, but I think this has introduced a 
race condition.
Its timeframe is quite small, but it arises from time to time in 
test/tasking/kmp_taskloop.c, especially when the machine is busy and the OS 
keeps rescheduling the threads.

> -----Original Message-----
> From: Openmp-commits [mailto:openmp-commits-bounces at lists.llvm.org]
> On Behalf Of Jonathan Peyton via Openmp-commits
> Sent: Wednesday, November 04, 2015 10:38 PM
> To: openmp-commits at lists.llvm.org
> Subject: [Openmp-commits] [openmp] r252082 - Refactor of task_team
> code.
>
> Author: jlpeyton
> Date: Wed Nov  4 15:37:48 2015
> New Revision: 252082
>
> URL: http://llvm.org/viewvc/llvm-project?rev=252082&view=rev
> Log:
> Refactor of task_team code.
>
> This is a refactoring of the task_team code that more elegantly handles the
> two
> task_team case. Two task_teams per team are kept in use for the lifetime of
> the
> team. Thus no reference counting is needed.
>
> Differential Revision: http://reviews.llvm.org/D13993
>
> Modified:
>     openmp/trunk/runtime/src/kmp.h
>     openmp/trunk/runtime/src/kmp_barrier.cpp
>     openmp/trunk/runtime/src/kmp_runtime.c
>     openmp/trunk/runtime/src/kmp_tasking.c
>     openmp/trunk/runtime/src/kmp_wait_release.h
>
> [...]
>
> Modified: openmp/trunk/runtime/src/kmp_runtime.c
> URL: http://llvm.org/viewvc/llvm-
> project/openmp/trunk/runtime/src/kmp_runtime.c?rev=252082&r1=252081
> &r2=252082&view=diff
> ==========================================================
> ====================
> --- openmp/trunk/runtime/src/kmp_runtime.c (original)
> +++ openmp/trunk/runtime/src/kmp_runtime.c Wed Nov  4 15:37:48 2015
>
> [...]
>
> @@ -5342,18 +5276,17 @@ __kmp_free_team( kmp_root_t *root, kmp_t
>      /* if we are non-hot team, release our threads */
>      if( ! use_hot_team ) {
>          if ( __kmp_tasking_mode != tskm_immediate_exec ) {
> +            // Delete task teams
>              int tt_idx;
>              for (tt_idx=0; tt_idx<2; ++tt_idx) {
> -                // We don't know which of the two task teams workers are 
> waiting
> on, so deactivate both.
>                  kmp_task_team_t *task_team = team->t.t_task_team[tt_idx];
>                  if ( task_team != NULL ) {
> -                    // Signal the worker threads to stop looking for tasks 
> while spin
> waiting.  The task
> -                    // teams are reference counted and will be deallocated 
> by the last
> worker thread via the
> -                    // thread's pointer to the task team.
> -                    KA_TRACE( 20, ( "__kmp_free_team: deactivating 
> task_team
> %p\n", task_team ) );
> +                    for (f=0; f<team->t.t_nproc; ++f) { // Have all threads 
> unref task
> teams
> +                        team->t.t_threads[f]->th.th_task_team = NULL;
> +                    }
> +                    KA_TRACE( 20, ( "__kmp_free_team: T#%d deactivating
> task_team %p on team %d\n", __kmp_get_gtid(), task_team, team->t.t_id )
> );
>                      KMP_DEBUG_ASSERT( team->t.t_nproc > 1 );
> -                    TCW_SYNC_4( task_team->tt.tt_active, FALSE );
> -                    KMP_MB();
> +                    __kmp_free_task_team( master, task_team );
>                      team->t.t_task_team[tt_idx] = NULL;
>                  }
>              }

I think this gives us a race with __kmp_execute_tasks: Worker threads in a 
fork-barrier are still trying to steal and execute tasks from other threads 
that were in the previous team.
When the master thread exits the parallel region and the runtime is shut down 
immediately afterwards, the task team and its threads are freed.
I'm able to reproduce a fault with the attached small program and a patch that 
adds a sleep() before stealing to widen the time frame for the race.
I think this can only happen when the library is built in debug mode because 
otherwise the freed struct is not overwritten with 0xefef...

I see that there are some checks added in __kmp_execute_tasks to deal with 
this case but I think this is not right to work in all scenarios.
Note that there is some other code in this change that loops over all threads 
and sets th_task_team to NULL. This might be problematic as well but I haven't 
deeply investigated this yet. I've so far only produced a crash when a worker 
thread is trying to wake up the victim thread but I think this is because 
there happens to be the first dereference of the problematic structures.

For resolving this issue I'm not sure about the right way to go: If this can 
only happen in a fork-barrier, we could disable executing tasks there - which 
I think is valid anyway because all tasks should be guaranteed to be finished 
after the prior join-barrier, shouldn't they?

Regards,
Jonas
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crash.c
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: steal_sleep.diff
Type: application/octet-stream
Size: 791 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bisect.log
Type: application/octet-stream
Size: 2220 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5868 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment.bin>


More information about the Openmp-commits mailing list