[Openmp-commits] [openmp] [OpenMP][OMPT] Add missing callbacks for asynchronous target tasks (PR #93472)

Jan André Reuter via Openmp-commits openmp-commits at lists.llvm.org
Mon May 27 07:21:22 PDT 2024


Thyre wrote:

> @Thyre this patch might fix some of your issues.

Thanks for pinging me. This PR should solve https://github.com/llvm/llvm-project/issues/62764.

I've tried the reproducer I provided in the linked issue. For this particular case, all threads are now created correctly, and each one receives a `thread_begin` callback before any other callback. This is the output of my test case:

```
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
[thread_begin_cb] tid = 1 | type = 1
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = 1 | actual_parallelism = 1 | index = 1 | flags = 1
[thread_begin_cb] tid = 2 | type = 1
[implicit_task_cb] tid = 2 | parallel_data = 0 | task_data = 6660002 | endpoint = 1 | actual_parallelism = 1 | index = 1 | flags = 1
[parallel_begin_cb] tid = 2 | parallel_data = 7770001 | encountering_task_data = 6660002 | flags = -2147483646 | requested_parallelism = 8 | codeptr_ra = 0x7131a40ea78e
[thread_begin_cb] tid = 3 | type = 2
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = 1 | actual_parallelism = 8 | index = 0 | flags = 2
[implicit_task_cb] tid = 3 | parallel_data = 7770001 | task_data = 6660004 | endpoint = 1 | actual_parallelism = 8 | index = 1 | flags = 2
[thread_begin_cb] tid = 4 | type = 2
[implicit_task_cb] tid = 4 | parallel_data = 7770001 | task_data = 6660005 | endpoint = 1 | actual_parallelism = 8 | index = 2 | flags = 2
[thread_begin_cb] tid = 5 | type = 2
[thread_begin_cb] tid = 6 | type = 2
[implicit_task_cb] tid = 6 | parallel_data = 7770001 | task_data = 6660006 | endpoint = 1 | actual_parallelism = 8 | index = 3 | flags = 2
[thread_begin_cb] tid = 7 | type = 2
[thread_begin_cb] tid = 8 | type = 2
[thread_begin_cb] tid = 9 | type = 2
[implicit_task_cb] tid = 5 | parallel_data = 7770001 | task_data = 6660008 | endpoint = 1 | actual_parallelism = 8 | index = 7 | flags = 2
[implicit_task_cb] tid = 8 | parallel_data = 7770001 | task_data = 6660007 | endpoint = 1 | actual_parallelism = 8 | index = 4 | flags = 2
[implicit_task_cb] tid = 9 | parallel_data = 7770001 | task_data = 6660010 | endpoint = 1 | actual_parallelism = 8 | index = 6 | flags = 2
[implicit_task_cb] tid = 7 | parallel_data = 7770001 | task_data = 6660009 | endpoint = 1 | actual_parallelism = 8 | index = 5 | flags = 2
[implicit_task_cb] tid = 8 | parallel_data = 7777777 | task_data = 6660007 | endpoint = 2 | actual_parallelism = 0 | index = 4 | flags = 2
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = 2 | actual_parallelism = 8 | index = 0 | flags = 2
[implicit_task_cb] tid = 5 | parallel_data = 7777777 | task_data = 6660008 | endpoint = 2 | actual_parallelism = 0 | index = 7 | flags = 2
[implicit_task_cb] tid = 3 | parallel_data = 7777777 | task_data = 6660004 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 2
[implicit_task_cb] tid = 4 | parallel_data = 7777777 | task_data = 6660005 | endpoint = 2 | actual_parallelism = 0 | index = 2 | flags = 2
[implicit_task_cb] tid = 9 | parallel_data = 7777777 | task_data = 6660010 | endpoint = 2 | actual_parallelism = 0 | index = 6 | flags = 2
[implicit_task_cb] tid = 6 | parallel_data = 7777777 | task_data = 6660006 | endpoint = 2 | actual_parallelism = 0 | index = 3 | flags = 2
[implicit_task_cb] tid = 7 | parallel_data = 7777777 | task_data = 6660009 | endpoint = 2 | actual_parallelism = 0 | index = 5 | flags = 2
[implicit_task_cb] tid = 2 | parallel_data = 0 | task_data = 6660002 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 1
[thread_end_cb] tid = 2
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 1
[thread_end_cb] tid = 1
[thread_end_cb] tid = 3
[thread_end_cb] tid = 4
[thread_end_cb] tid = 6
[thread_end_cb] tid = 8
[thread_end_cb] tid = 7
[thread_end_cb] tid = 9
[thread_end_cb] tid = 5
[my_finalize_tool] tid = 1
```
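The numeric fields in the output above are OMPT enum and flag values that the test tool prints verbatim. The small decoder below is a hedged sketch for reading them. Its constants are copied by hand from the OpenMP 5.x OMPT definitions (`ompt_thread_t`, `ompt_scope_endpoint_t`, `ompt_task_flag_t`, `ompt_parallel_flag_t`) rather than taken from `omp-tools.h`, so it builds without an OpenMP installation. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Constants copied by hand from the OpenMP 5.x OMPT definitions.
 * Helper names are hypothetical. */

/* ompt_thread_t, printed as "type" by thread_begin_cb. */
static const char *thread_type_name(int type) {
    switch (type) {
    case 1: return "ompt_thread_initial";
    case 2: return "ompt_thread_worker";
    case 3: return "ompt_thread_other";
    case 4: return "ompt_thread_unknown";
    default: return "(unknown thread type)";
    }
}

/* ompt_scope_endpoint_t, printed as "endpoint" by implicit_task_cb. */
static const char *endpoint_name(int endpoint) {
    switch (endpoint) {
    case 1: return "ompt_scope_begin";
    case 2: return "ompt_scope_end";
    case 3: return "ompt_scope_beginend";
    default: return "(unknown endpoint)";
    }
}

/* ompt_task_flag_t, printed as "flags" by implicit_task_cb.
 * Only a single task-kind bit is decoded; modifier bits are not. */
static const char *task_flags_name(int flags) {
    switch (flags) {
    case 0x1: return "ompt_task_initial";
    case 0x2: return "ompt_task_implicit";
    case 0x4: return "ompt_task_explicit";
    case 0x8: return "ompt_task_target";
    default:  return "(combined or unknown task flags)";
    }
}

/* ompt_parallel_flag_t, printed as a signed int by parallel_begin_cb,
 * so -2147483646 is really 0x80000002. Writes the names of all set
 * flags, separated by '|', into buf (truncating safely if too small). */
static void parallel_flags_name(int flags, char *buf, size_t len) {
    static const struct { uint32_t bit; const char *name; } table[] = {
        {0x80000000u, "ompt_parallel_team"},
        {0x40000000u, "ompt_parallel_league"},
        {0x00000001u, "ompt_parallel_invoker_program"},
        {0x00000002u, "ompt_parallel_invoker_runtime"},
    };
    uint32_t bits = (uint32_t)flags; /* well-defined modular conversion */

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (!(bits & table[i].bit))
            continue;
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s", used > 0 ? "|" : "",
                 table[i].name);
    }
}
```

Read this way, both `tid = 1` and `tid = 2` start as `ompt_thread_initial` (type = 1), `tid = 3` to `tid = 9` are `ompt_thread_worker` (type = 2), and the parallel region opened by `tid = 2` has `ompt_parallel_team|ompt_parallel_invoker_runtime` as its flags.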

What seems odd to me is that the parallel region is created by `tid = 2`. In addition, something still seems to be broken when the helper threads are created within a parallel region:

```c
int main(int argc, char **argv)
{
    // Each of the two threads in the team encounters its own
    // deferred (nowait) target region.
#pragma omp parallel num_threads( 2 )
#pragma omp target nowait
    for(int j = 0; j < 10; ++j){}
#pragma omp taskwait
    return 0;
} // end main
```

```
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
[thread_begin_cb] tid = 1 | type = 1
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = 1 | actual_parallelism = 1 | index = 1 | flags = 1
[parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = -2147483646 | requested_parallelism = 2 | codeptr_ra = 0x5de448a8296d
[implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = 1 | actual_parallelism = 2 | index = 0 | flags = 2
[parallel_begin_cb] tid = -1 | parallel_data = 7770002 | WARNING encountering_task_data = 0 | flags = -2147483646 | requested_parallelism = 8 | codeptr_ra = 0x76141733730e
[thread_begin_cb] tid = 2 | type = 2
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660003 | endpoint = 1 | actual_parallelism = 2 | index = 1 | flags = 2
[thread_begin_cb] tid = 3 | type = 2
[thread_begin_cb] tid = 4 | type = 2
[thread_begin_cb] tid = 5 | type = 2
[thread_begin_cb] tid = 6 | type = 2
[thread_begin_cb] tid = 7 | type = 2
[thread_begin_cb] tid = 8 | type = 2
[implicit_task_cb] tid = -1 | parallel_data = 7770002 | task_data = 6660004 | endpoint = 1 | actual_parallelism = 8 | index = 0 | flags = 2
[thread_begin_cb] tid = 9 | type = 2
[implicit_task_cb] tid = 4 | parallel_data = 7770002 | task_data = 6660005 | endpoint = 1 | actual_parallelism = 8 | index = 4 | flags = 2
[implicit_task_cb] tid = 6 | parallel_data = 7770002 | task_data = 6660006 | endpoint = 1 | actual_parallelism = 8 | index = 7 | flags = 2
[implicit_task_cb] tid = 7 | parallel_data = 7770002 | task_data = 6660009 | endpoint = 1 | actual_parallelism = 8 | index = 3 | flags = 2
[implicit_task_cb] tid = 3 | parallel_data = 7770002 | task_data = 6660011 | endpoint = 1 | actual_parallelism = 8 | index = 5 | flags = 2
[implicit_task_cb] tid = 8 | parallel_data = 7770002 | task_data = 6660007 | endpoint = 1 | actual_parallelism = 8 | index = 6 | flags = 2
[implicit_task_cb] tid = 5 | parallel_data = 7770002 | task_data = 6660010 | endpoint = 1 | actual_parallelism = 8 | index = 2 | flags = 2
[implicit_task_cb] tid = 9 | parallel_data = 7770002 | task_data = 6660008 | endpoint = 1 | actual_parallelism = 8 | index = 1 | flags = 2
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660003 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 2
[implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = 2 | actual_parallelism = 2 | index = 0 | flags = 2
[implicit_task_cb] tid = 4 | parallel_data = 7777777 | task_data = 6660005 | endpoint = 2 | actual_parallelism = 0 | index = 4 | flags = 2
[implicit_task_cb] tid = -1 | parallel_data = 7777777 | task_data = 6660004 | endpoint = 2 | actual_parallelism = 8 | index = 0 | flags = 2
[implicit_task_cb] tid = 6 | parallel_data = 7777777 | task_data = 6660006 | endpoint = 2 | actual_parallelism = 0 | index = 7 | flags = 2
[implicit_task_cb] tid = 7 | parallel_data = 7777777 | task_data = 6660009 | endpoint = 2 | actual_parallelism = 0 | index = 3 | flags = 2
[implicit_task_cb] tid = 8 | parallel_data = 7777777 | task_data = 6660007 | endpoint = 2 | actual_parallelism = 0 | index = 6 | flags = 2
[implicit_task_cb] tid = 5 | parallel_data = 7777777 | task_data = 6660010 | endpoint = 2 | actual_parallelism = 0 | index = 2 | flags = 2
[implicit_task_cb] tid = 3 | parallel_data = 7777777 | task_data = 6660011 | endpoint = 2 | actual_parallelism = 0 | index = 5 | flags = 2
[implicit_task_cb] tid = 9 | parallel_data = 7777777 | task_data = 6660008 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 2
[implicit_task_cb] tid = -1 | parallel_data = 0 | task_data = 0 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 1
[thread_end_cb] tid = -1; WARNING: thread_begin_cb not dispatched; thread_data->value = 0 (supposed to be >= 1)
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = 2 | actual_parallelism = 0 | index = 1 | flags = 1
[thread_end_cb] tid = 1
[thread_end_cb] tid = 9
[thread_end_cb] tid = 5
[thread_end_cb] tid = 7
[thread_end_cb] tid = 4
[thread_end_cb] tid = 3
[thread_end_cb] tid = 8
[thread_end_cb] tid = 6
[thread_end_cb] tid = 2
[my_finalize_tool] tid = 1
```

Here, once again, we see callbacks coming from an uninitialized thread (`tid = -1`), i.e. a thread for which `thread_begin` was never dispatched.
Next, I will look into the tasking changes to see whether they fix the other issue I've seen.
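The `tid = -1` entries and the WARNING in `thread_end_cb` suggest that the test tool checks, in each callback, whether `thread_begin` has already stamped the thread's `thread_data`. The tool's source is not part of this message, so what follows is only a hedged sketch of such a check. It assumes, as the warning text suggests, that a thread never announced via `thread_begin` still carries a zero `thread_data->value`. `tool_data_t`, `on_thread_begin`, and `checked_tid` are hypothetical names, and `tool_data_t` stands in for OMPT's `ompt_data_t` so the code compiles without `omp-tools.h`. In a real tool, these functions would be called from callbacks registered through `ompt_set_callback` for `ompt_callback_thread_begin` and `ompt_callback_thread_end`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for ompt_data_t, which OMPT defines as a union
 * of a uint64_t value and a void *ptr. */
typedef union {
    uint64_t value;
    void *ptr;
} tool_data_t;

/* Next id to hand out; ids start at 1 so that 0 means "never registered". */
static atomic_uint_fast64_t next_tid = 1;

/* Body of a thread_begin callback: stamp the thread with a fresh id. */
static void on_thread_begin(tool_data_t *thread_data) {
    assert(thread_data->value == 0 && "thread_begin dispatched twice");
    thread_data->value = atomic_fetch_add(&next_tid, 1);
}

/* Check shared by later callbacks (implicit_task, thread_end, ...).
 * Returns the thread's id, or -1 after printing a warning if
 * thread_begin was never dispatched for this thread. */
static long long checked_tid(const char *callback,
                             const tool_data_t *thread_data) {
    if (thread_data->value == 0) {
        printf("[%s] WARNING: thread_begin_cb not dispatched; "
               "thread_data->value = 0 (supposed to be >= 1)\n", callback);
        return -1;
    }
    return (long long)thread_data->value;
}
```

With this scheme, a thread whose first callback is `implicit_task_cb` or `thread_end_cb` reports `tid = -1`, which matches the pattern in the second log.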

https://github.com/llvm/llvm-project/pull/93472

