[Openmp-commits] [openmp] r252082 - Refactor of task_team code.
Hahnfeld, Jonas via Openmp-commits
openmp-commits at lists.llvm.org
Fri Aug 12 00:33:23 PDT 2016
Hi Jonathan,
Sorry for coming back to such an old change, but I think this has introduced a
race condition.
Its timeframe is quite small, but it arises from time to time in
test/tasking/kmp_taskloop.c, especially when the machine is busy and the OS
keeps rescheduling the threads.
> -----Original Message-----
> From: Openmp-commits [mailto:openmp-commits-bounces at lists.llvm.org]
> On Behalf Of Jonathan Peyton via Openmp-commits
> Sent: Wednesday, November 04, 2015 10:38 PM
> To: openmp-commits at lists.llvm.org
> Subject: [Openmp-commits] [openmp] r252082 - Refactor of task_team
> code.
>
> Author: jlpeyton
> Date: Wed Nov 4 15:37:48 2015
> New Revision: 252082
>
> URL: http://llvm.org/viewvc/llvm-project?rev=252082&view=rev
> Log:
> Refactor of task_team code.
>
> This is a refactoring of the task_team code that more elegantly handles the
> two
> task_team case. Two task_teams per team are kept in use for the lifetime of
> the
> team. Thus no reference counting is needed.
>
> Differential Revision: http://reviews.llvm.org/D13993
>
> Modified:
> openmp/trunk/runtime/src/kmp.h
> openmp/trunk/runtime/src/kmp_barrier.cpp
> openmp/trunk/runtime/src/kmp_runtime.c
> openmp/trunk/runtime/src/kmp_tasking.c
> openmp/trunk/runtime/src/kmp_wait_release.h
>
> [...]
>
> Modified: openmp/trunk/runtime/src/kmp_runtime.c
> URL: http://llvm.org/viewvc/llvm-
> project/openmp/trunk/runtime/src/kmp_runtime.c?rev=252082&r1=252081
> &r2=252082&view=diff
> ==========================================================
> ====================
> --- openmp/trunk/runtime/src/kmp_runtime.c (original)
> +++ openmp/trunk/runtime/src/kmp_runtime.c Wed Nov 4 15:37:48 2015
>
> [...]
>
> @@ -5342,18 +5276,17 @@ __kmp_free_team( kmp_root_t *root, kmp_t
> /* if we are non-hot team, release our threads */
> if( ! use_hot_team ) {
> if ( __kmp_tasking_mode != tskm_immediate_exec ) {
> + // Delete task teams
> int tt_idx;
> for (tt_idx=0; tt_idx<2; ++tt_idx) {
> - // We don't know which of the two task teams workers are
> waiting
> on, so deactivate both.
> kmp_task_team_t *task_team = team->t.t_task_team[tt_idx];
> if ( task_team != NULL ) {
> - // Signal the worker threads to stop looking for tasks
> while spin
> waiting. The task
> - // teams are reference counted and will be deallocated
> by the last
> worker thread via the
> - // thread's pointer to the task team.
> - KA_TRACE( 20, ( "__kmp_free_team: deactivating
> task_team
> %p\n", task_team ) );
> + for (f=0; f<team->t.t_nproc; ++f) { // Have all threads
> unref task
> teams
> + team->t.t_threads[f]->th.th_task_team = NULL;
> + }
> + KA_TRACE( 20, ( "__kmp_free_team: T#%d deactivating
> task_team %p on team %d\n", __kmp_get_gtid(), task_team, team->t.t_id )
> );
> KMP_DEBUG_ASSERT( team->t.t_nproc > 1 );
> - TCW_SYNC_4( task_team->tt.tt_active, FALSE );
> - KMP_MB();
> + __kmp_free_task_team( master, task_team );
> team->t.t_task_team[tt_idx] = NULL;
> }
> }
I think this gives us a race with __kmp_execute_tasks: Worker threads in a
fork-barrier are still trying to steal and execute tasks from other threads
that were in the previous team.
When the master thread exits the parallel region and the runtime is shut down
immediately afterwards, the task team and its threads are freed.
I'm able to reproduce a fault with the attached small program and a patch that
adds a sleep() before stealing to widen the time frame for the race.
I think this can only happen when the library is built in debug mode because
otherwise the freed struct is not overwritten with 0xefef...
I see that there are some checks added in __kmp_execute_tasks to deal with
this case but I think this is not right to work in all scenarios.
Note that there is some other code in this change that loops over all threads
and sets th_task_team to NULL. This might be problematic as well but I haven't
deeply investigated this yet. I've so far only produced a crash when a worker
thread is trying to wake up the victim thread but I think this is because
there happens to be the first dereference of the problematic structures.
For resolving this issue I'm not sure about the right way to go: If this can
only happen in a fork-barrier, we could disable executing tasks there - which
I think is valid anyway because all tasks should be guaranteed to be finished
after the prior join-barrier, shouldn't they?
Regards,
Jonas
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crash.c
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: steal_sleep.diff
Type: application/octet-stream
Size: 791 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bisect.log
Type: application/octet-stream
Size: 2220 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5868 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20160812/9e2eec8a/attachment.bin>
More information about the Openmp-commits
mailing list