<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/64556>64556</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[OMPT] `ompt_callback_sync_region_wait` callback does not follow OpenMP specification
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Thyre
</td>
</tr>
</table>
<pre>
## Introduction
The OpenMP specification offers the OMPT interface for tools, which enables profiling and tracing of OpenMP constructs in applications. For barriers, two callbacks exist: `ompt_callback_sync_region` and `ompt_callback_sync_region_wait`. Both callbacks are optional according to the OpenMP specification and offer similar functionality. However, there is an important difference. Taking implicit barriers as an example, the OpenMP specification states [[Link](https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf#page=323), Page 323]:
> A thread dispatches a registered `ompt_callback_sync_region` callback for each implicit barrier _begin_ and _end_ event. Similarly, a thread dispatches a registered `ompt_callback_sync_region_wait` callback for each implicit barrier _wait-begin_ and _wait-end_ event. All callbacks for implicit barrier events execute in the context of the encountering task and have type signature `ompt_callback_sync_region_t`.
Notice the difference between _begin_ / _end_ and _wait-begin_ / _wait-end_. This is the case for all OpenMP constructs dispatching those two callbacks.
The specification is quite vague about OMPT in general, but the events are described in more detail. For implicit barriers, the following events are described:
> The _implicit-barrier-begin_ event occurs in each implicit task at the beginning of an implicit barrier region.
The _implicit-barrier-wait-begin_ event occurs when a task begins an interval of active or passive waiting in an implicit barrier region.
The _implicit-barrier-wait-end_ event occurs when a task ends an interval of active or waiting and resumes execution of an implicit barrier region.
The _implicit-barrier-end_ event occurs in each implicit task after the barrier synchronization on exit from an implicit barrier region.
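
The event descriptions above can be read as a nesting rule: the _wait-begin_ / _wait-end_ pair brackets only the intervals in which the thread is actually idle, while task execution inside the barrier falls outside of it. The following is a minimal sketch of that reading, not runtime code. The type `barrier_phase_t`, the function `expected_callbacks` and the printed labels are hypothetical names chosen for illustration and are not part of OMPT.

```c
#include <stdio.h>

/* Hypothetical model of what a thread does inside a barrier region.
 * These names are illustrative only and are not OMPT identifiers. */
typedef enum { PHASE_IDLE, PHASE_RUN_TASK } barrier_phase_t;

/* Writes the callback sequence expected under this reading into out.
 * Returns 0 on success and -1 if out is too small. */
static int expected_callbacks(const barrier_phase_t *phases, size_t n,
                              char *out, size_t cap)
{
    size_t used = 0;
    int w = snprintf(out, cap, "sync_region(begin)");
    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n; ++i) {
        /* Idle intervals are bracketed by the wait callbacks,
         * task execution is reported through task_schedule only. */
        const char *s = (phases[i] == PHASE_IDLE)
            ? " sync_region_wait(begin) sync_region_wait(end)"
            : " task_schedule(switch) task_schedule(complete)";
        w = snprintf(out + used, cap - used, "%s", s);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(out + used, cap - used, " sync_region(end)");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return 0;
}
```

Calling `expected_callbacks` with the phases idle, task, idle would list two separate `sync_region_wait` pairs with the two `task_schedule` entries between them, all enclosed by a single `sync_region` pair.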
## Bug description
In an example code where a single thread spawns tasks and then computes something (here a `sleep` call), we would expect to see `sync_region_wait` callbacks every time the other threads have to wait for new tasks to arrive. The main source code can be seen below:
```c
#include <omp.h>
#include <unistd.h>

int main(void)
{
#pragma omp parallel default(none)
#pragma omp single nowait
    {
        int num_threads = omp_get_num_threads();
        for (int j = 0; j < 4; ++j)
        {
            for (int i = 0; i < num_threads + 1; ++i)
            {
#pragma omp task default(none)
                {
                    usleep(125);
                }
            }
            sleep(3);
        }
    }
}
```
However, the runtime does not dispatch `sync_region_wait` correctly. The threads correctly enter `sync_region`, but immediately afterwards they also dispatch `sync_region_wait` with `endpoint = ompt_scope_begin`, and they dispatch `sync_region_wait` with `endpoint = ompt_scope_end` only just before exiting `sync_region`. Tasks are executed in between, indicating that the threads are in fact not waiting.
This can be verified by looking at the output of the reproducer, sorted by thread number:
```
==============
Thread ID = 1
[thread_begin_cb] tid = 1 | type = initial
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[parallel_begin_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | requested_parallelism = 2 | codeptr_ra = 0x55bdf590f1cb
[implicit_task_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = implicit
[work_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | work_type = single_executor | count = 1 | codeptr_ra = 0x55bdf590f1f5
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660003 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660004 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660005 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660007 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660008 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660009 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660010 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660011 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660012 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660013 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660014 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[task_create_cb] tid = 1 | encountering_task_data = 6660002 | new_task_data = 6660015 | flags = explicit | has_dependences = 0 | codeptr_ra = 0x55bdf590f266
[work_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = end | work_type = single_executor | count = 1 | codeptr_ra = 0x55bdf590f29b
[sync_region_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = 0x55bdf590f1cb
[sync_region_wait_cb] tid = 1 | parallel_data = 7770001 | task_data = 6660002 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = 0x55bdf590f1cb
[sync_region_wait_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = (nil)
[sync_region_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = (nil)
[implicit_task_cb] tid = 1 | parallel_data = 7777777 | task_data = 6660002 | endpoint = end | actual_parallelism = 2 | index = 0 | flags = implicit
[parallel_end_cb] tid = 1 | parallel_data = 7770001 | encountering_task_data = 6660001 | flags = invoker_runtime_team | codeptr_ra = 0x55bdf590f1cb
[implicit_task_cb] tid = 1 | parallel_data = 0 | task_data = 6660001 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[thread_end_cb] tid = 1
[my_finalize_tool] tid = 1
==============
Thread ID = 2
[thread_begin_cb] tid = 2 | type = worker
[implicit_task_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = implicit
[work_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | work_type = single_other | count = 1 | codeptr_ra = 0x55bdf590f1f5
[work_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660006 | endpoint = end | work_type = single_other | count = 1 | codeptr_ra = 0x55bdf590f1f5
[sync_region_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = (nil)
[sync_region_wait_cb] tid = 2 | parallel_data = 7770001 | task_data = 6660006 | endpoint = begin | kind = barrier_implicit (deprecated) | codeptr_ra = (nil)
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660003
[task_schedule_cb] tid = 2 | prior_task_data = 6660003 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660004
[task_schedule_cb] tid = 2 | prior_task_data = 6660004 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660005
[task_schedule_cb] tid = 2 | prior_task_data = 6660005 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660007
[task_schedule_cb] tid = 2 | prior_task_data = 6660007 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660008
[task_schedule_cb] tid = 2 | prior_task_data = 6660008 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660009
[task_schedule_cb] tid = 2 | prior_task_data = 6660009 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660010
[task_schedule_cb] tid = 2 | prior_task_data = 6660010 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660011
[task_schedule_cb] tid = 2 | prior_task_data = 6660011 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660012
[task_schedule_cb] tid = 2 | prior_task_data = 6660012 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660013
[task_schedule_cb] tid = 2 | prior_task_data = 6660013 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660014
[task_schedule_cb] tid = 2 | prior_task_data = 6660014 | prior_status = complete | next_task_data = 6660006
[task_schedule_cb] tid = 2 | prior_task_data = 6660006 | prior_status = switch | next_task_data = 6660015
[task_schedule_cb] tid = 2 | prior_task_data = 6660015 | prior_status = complete | next_task_data = 6660006
[sync_region_wait_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660006 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = (nil)
[sync_region_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660006 | endpoint = end | kind = barrier_implicit (deprecated) | codeptr_ra = (nil)
[implicit_task_cb] tid = 2 | parallel_data = 7777777 | task_data = 6660006 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = implicit
[thread_end_cb] tid = 2
```
Even though `tid = 2` is inside a `sync_region_wait` for an implicit barrier, we can see that tasks are executed.
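The observation above can also be checked mechanically on a per-thread trace. The following is a minimal sketch, assuming the log has already been parsed into a sequence of event kinds. The enum `trace_event_t` and the function `count_task_switches_while_waiting` are hypothetical helpers for illustration and are neither part of OMPT nor of the attached reproducer.

```c
#include <stddef.h>

/* Hypothetical event kinds for the trace of one thread.
 * The names are illustrative only and are not OMPT identifiers. */
typedef enum {
    EV_SYNC_BEGIN,      /* sync_region_cb,      endpoint = begin */
    EV_SYNC_WAIT_BEGIN, /* sync_region_wait_cb, endpoint = begin */
    EV_TASK_SCHEDULE,   /* task_schedule_cb */
    EV_SYNC_WAIT_END,   /* sync_region_wait_cb, endpoint = end */
    EV_SYNC_END         /* sync_region_cb,      endpoint = end */
} trace_event_t;

/* Counts the task_schedule events that occur while a sync_region_wait
 * interval is open. If waiting intervals exclude task execution,
 * the result should be 0. */
static int count_task_switches_while_waiting(const trace_event_t *trace,
                                             size_t n)
{
    int in_wait = 0;
    int violations = 0;
    for (size_t i = 0; i < n; ++i) {
        switch (trace[i]) {
        case EV_SYNC_WAIT_BEGIN:
            in_wait = 1;
            break;
        case EV_SYNC_WAIT_END:
            in_wait = 0;
            break;
        case EV_TASK_SCHEDULE:
            if (in_wait)
                ++violations;
            break;
        default:
            break;
        }
    }
    return violations;
}
```

For the `tid = 2` trace above, all 24 `task_schedule_cb` lines lie between the `sync_region_wait_cb` begin and end lines, so such a check would count 24 events inside the wait interval.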
## How to reproduce the issue
I was able to reproduce the bug on a system running Ubuntu 22.04 LTS with the following compiler versions:
```console
$ ml LLVM/git
$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 52ac71f92d38f75df5cb88e9c090ac5fd5a71548)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/software/software/LLVM/git/bin
$ ml LLVM/17.0-rc1
$ clang --version
clang version 17.0.0 (https://github.com/llvm/llvm-project.git cff7a7747db02d1214b20e98677e5ddcb402ffe0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/software/software/LLVM/17.0.0-rc1/bin
$ ml LLVM/16.0.6
$ clang --version
clang version 16.0.6 (git@github.com:Thyre/llvm-project.git 7cbf1a2591520c2491aa35339f227775f4d3adf6)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/software/software/LLVM/16.0.6/bin
```
The issue can be reproduced by downloading the attached archive, extracting it and running the following commands:
```console
$ make clean && make CC=clang && make run
```
Link to reproducer: [reproducer.ZIP](https://github.com/llvm/llvm-project/files/12301531/reproducer.ZIP)
</pre>