<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/78425>78425</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [OMPT] Runtime dispatches implicit tasks for OpenMP teams which is against specifications
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Thyre
      </td>
    </tr>
</table>

<pre>
    [OMPT] Runtime dispatches implicit tasks for OpenMP teams which is against specifications
====

# Description

While implementing support for OpenMP teams in our instrumentation infrastructure Score-P, we noticed that the teams directive appears to dispatch more callbacks than expected: additional `parallel_begin` and `implicit_task` callbacks show up after team creation. In addition, these callbacks are dispatched with the wrong value for `actual_parallelism` and with no `codeptr_ra` attached.

We believe that this behavior does not conform to the OpenMP specification. In OpenMP 5.2 and TR12, the following sections describe which callbacks a teams directive should dispatch:

**10.2: teams Construct**

> A thread dispatches a registered **ompt_callback_parallel_begin callback** for each occurrence of a _teams-begin event_ in that thread. The callback occurs in the task that encounters the **teams** construct. This callback has the type signature **ompt_callback_parallel_begin_t**. In the dispatched callback, (flags & **ompt_parallel_league**) evaluates to true. A thread dispatches a registered **ompt_callback_implicit_task** callback with **ompt_scope_begin** as its _endpoint_ argument for each occurrence of an _initial-task-begin_ in that thread. Similarly, a thread dispatches a registered **ompt_callback_implicit_task** callback with **ompt_scope_end** as its endpoint argument for each occurrence of an _initial-task-end event_ in that thread. The callbacks occur in the context of the initial task and have type signature **ompt_callback_implicit_task_t**. In the dispatched callback, (flags & **ompt_task_initial**) evaluates to true. A thread dispatches a registered **ompt_callback_parallel_end** callback for each occurrence of a _teams-end event_ in that thread. The callback occurs in the task that encounters the teams construct. This callback has the type signature **ompt_callback_parallel_end_t**.

**12.8: Initial Task**
> No events are associated with the implicit parallel region in each initial thread.

The teams section does not mention any implicit tasks being dispatched in addition to the initial task. Only an initial task for the initial thread of that region should be dispatched. If a parallel region is encountered, we _should_ see those callbacks, and we do, but there is simply one layer of callbacks too many.
The issue exists in LLVM itself as well as in oneAPI and ROCm releases. We were able to confirm that at least LLVM 17 and trunk, ROCm 5.7.1 and 6.0, and oneAPI 2024.0 are affected.
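
The expectation that follows from the quoted sections can be written down as a small sketch. It assumes one `parallel_begin`/`parallel_end` pair flagged `ompt_parallel_league` per teams construct, a begin and an end `implicit_task` callback flagged `ompt_task_initial` for the initial task of the program and for the initial task of each team, and no callbacks flagged `ompt_task_implicit` as long as the teams region contains no parallel construct. The type and function names are hypothetical and not part of OMPT.

```c
#include <assert.h>

/* Hypothetical container for expected callback counts; not an OMPT type. */
typedef struct
{
    int league_parallel_begin; /* parallel_begin with ompt_parallel_league */
    int league_parallel_end;   /* parallel_end with ompt_parallel_league */
    int initial_task;          /* implicit_task with ompt_task_initial (begin + end) */
    int implicit_task;         /* implicit_task with ompt_task_implicit (begin + end) */
} expected_callbacks_t;

/* Expected counts for a program whose only OpenMP construct is one teams
 * region with num_teams teams and no nested parallel construct.
 * Assumption: the initial task of the program itself is reported as well. */
static expected_callbacks_t
expected_for_single_teams_region( int num_teams )
{
    assert( num_teams >= 1 );
    expected_callbacks_t expected;
    expected.league_parallel_begin = 1;
    expected.league_parallel_end   = 1;
    expected.initial_task          = ( num_teams + 1 ) * 2;
    expected.implicit_task         = 0;
    return expected;
}
```

For a teams region with two teams, this gives six initial-task callbacks and zero implicit-task callbacks.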

# Reproducer

The issue can be reproduced with the following source code. It contains a basic OMPT tool whose callbacks count the number of `initial_task` and `implicit_task` callbacks.
For testing, we create a simple teams region with two teams and count the number of teams created.

```c
#include <assert.h>
#include <omp.h>
#include <omp-tools.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static ompt_finalize_tool_t ompt_finalize_tool;
_Thread_local int           ompt_tool_tid = -1;
atomic_int_least32_t        num_implicit_tasks = 0;
atomic_int_least32_t        num_initial_tasks = 0;
atomic_int_least32_t        teams_created = 0;

static const char*
scope_endpoint2string( ompt_scope_endpoint_t t )
{
    switch ( t )
    {
        case ompt_scope_begin:
            return "begin";
        case ompt_scope_end:
            return "end";
        case ompt_scope_beginend:
            return "beginend";
    }
    return "";
}

static const char*
parallel_flag2string( ompt_parallel_flag_t t )
{
    if ( t & ompt_parallel_invoker_program )
    {
        if ( t & ompt_parallel_league )
        {
            return "program_league";
        }
        if ( t & ompt_parallel_team )
        {
            return "program_team";
        }
    }
    if ( t & ompt_parallel_invoker_runtime )
    {
        if ( t & ompt_parallel_league )
        {
            return "runtime_league";
        }
        if ( t & ompt_parallel_team )
        {
            return "runtime_team";
        }
    }
    return "";
}

static const char*
task_flag2string( ompt_task_flag_t t )
{
    if ( t & ompt_task_initial )
    {
        return "initial";
    }
    if ( t & ompt_task_implicit )
    {
        if ( t & ompt_task_undeferred )
        {
            return "implicit_undeferred";
        }
        if ( t & ompt_task_untied )
        {
            return "implicit_untied";
        }
        if ( t & ompt_task_final )
        {
            return "implicit_final";
        }
        if ( t & ompt_task_mergeable )
        {
            return "implicit_mergeable";
        }
        if ( t & ompt_task_merged )
        {
            return "implicit_merged";
        }
        return "implicit";
    }
    if ( t & ompt_task_explicit )
    {
        if ( t & ompt_task_undeferred )
        {
            return "explicit_undeferred";
        }
        if ( t & ompt_task_untied )
        {
            return "explicit_untied";
        }
        if ( t & ompt_task_final )
        {
            return "explicit_final";
        }
        if ( t & ompt_task_mergeable )
        {
            return "explicit_mergeable";
        }
        if ( t & ompt_task_merged )
        {
            return "explicit_merged";
        }
        return "explicit";
    }
    if ( t & ompt_task_target )
    {
        if ( t & ompt_task_undeferred )
        {
            return "target_undeferred";
        }
        if ( t & ompt_task_untied )
        {
            return "target_untied";
        }
        if ( t & ompt_task_final )
        {
            return "target_final";
        }
        if ( t & ompt_task_mergeable )
        {
            return "target_mergeable";
        }
        if ( t & ompt_task_merged )
        {
            return "target_merged";
        }
        return "target";
    }
    if ( t & ompt_task_taskwait )
    {
        if ( t & ompt_task_undeferred )
        {
            return "taskwait_undeferred";
        }
        if ( t & ompt_task_untied )
        {
            return "taskwait_untied";
        }
        if ( t & ompt_task_final )
        {
            return "taskwait_final";
        }
        if ( t & ompt_task_mergeable )
        {
            return "taskwait_mergeable";
        }
        if ( t & ompt_task_merged )
        {
            return "taskwait_merged";
        }
        return "taskwait";
    }
    return "";
}

void
on_ompt_parallel_begin( ompt_data_t*        encountering_task_data,
                        const ompt_frame_t* encountering_task_frame,
                        ompt_data_t*        parallel_data,
                        unsigned int        requested_parallelism,
                        int                 flags,
                        const void*         codeptr_ra )
{
    printf( "[%s] thread_id = %d | flags = %s\n", __FUNCTION__, ompt_tool_tid, parallel_flag2string( flags ) );
}

void
on_ompt_parallel_end( ompt_data_t* parallel_data,
                      ompt_data_t* encountering_task_data,
                      int          flags,
                      const void*  codeptr_ra )
{
    printf( "[%s] thread_id = %d | flags = %s\n", __FUNCTION__, ompt_tool_tid, parallel_flag2string( flags ) );
}

void
on_ompt_implicit_task( ompt_scope_endpoint_t endpoint,
                       ompt_data_t*          parallel_data,
                       ompt_data_t*          task_data,
                       unsigned int          actual_parallelism,
                       unsigned int          index, /* thread or team num */
                       int                   flags )
{
    printf( "[%s] thread_id = %d | endpoint = %s | actual_parallelism = %u | index = %u | flags = %s\n", __FUNCTION__, ompt_tool_tid, scope_endpoint2string( endpoint ), actual_parallelism, index, task_flag2string( flags ) );
    if ( flags & ompt_task_initial )
    {
        atomic_fetch_add( &num_initial_tasks, 1 );
    }
    else if ( flags & ompt_task_implicit )
    {
        atomic_fetch_add( &num_implicit_tasks, 1 );
    }
}

void
on_ompt_thread_begin( ompt_thread_t thread_type,
                      ompt_data_t*  thread_data )
{
    assert( ompt_tool_tid == -1 );
    static atomic_int_least32_t thread_counter = 1; // ompt_tool_tid >= 1
    ompt_tool_tid = atomic_fetch_add( &thread_counter, 1 );
}

static int
initialize_tool( ompt_function_lookup_t lookup,
                 int                    initialDeviceNum,
                 ompt_data_t*           toolData )
{
    printf( "[%s] initial_device_num=%d\n",
            __FUNCTION__, initialDeviceNum );

    ompt_set_callback_t set_callback =
        ( ompt_set_callback_t )lookup( "ompt_set_callback" );
    assert( set_callback != 0 );
    ompt_finalize_tool =
        ( ompt_finalize_tool_t )lookup( "ompt_finalize_tool" );
    assert( ompt_finalize_tool != 0 );

    set_callback( ompt_callback_parallel_begin, (ompt_callback_t) &on_ompt_parallel_begin );
    set_callback( ompt_callback_parallel_end, (ompt_callback_t) &on_ompt_parallel_end );
    set_callback( ompt_callback_implicit_task, (ompt_callback_t) &on_ompt_implicit_task );
    set_callback( ompt_callback_thread_begin, (ompt_callback_t) &on_ompt_thread_begin );
    return 1; /* non-zero indicates success */
}

static void
finalize_tool( ompt_data_t* toolData )
{
    printf( "[%s]\n", __FUNCTION__ );
}

/* Called by the OpenMP runtime. Everything starts from here. */
ompt_start_tool_result_t*
ompt_start_tool( unsigned int omp_version, /* == _OPENMP */
                 const char*  runtime_version )
{
    /* omp_version is unsigned, so print it with %u */
    printf( "[%s] omp_version=\"%u\"; runtime_version=\"%s\"\n",
            __FUNCTION__, omp_version, runtime_version );
    static ompt_start_tool_result_t tool = { &initialize_tool,
                                             &finalize_tool,
                                             ompt_data_none };
    return &tool;
}

void
summary( void )
{
    int fetched_initial_tasks = atomic_load( &num_initial_tasks );
    int fetched_implicit_tasks = atomic_load( &num_implicit_tasks );
    int fetched_created_teams = atomic_load( &teams_created );
    printf( "------------------------------\n");
    printf( "Expected: initial_tasks encountered = %d | implicit_tasks encountered = %d\n", ( fetched_created_teams + 1 ) * 2, 0 );
    printf( "Actual: initial_tasks encountered = %d | implicit_tasks encountered = %d\n", fetched_initial_tasks, fetched_implicit_tasks );
    printf( "------------------------------\n");
    assert( fetched_initial_tasks == ( fetched_created_teams + 1 ) * 2 );
    assert( fetched_implicit_tasks == 0 );
}

int main( void )
{
    #pragma omp teams num_teams(2)
    {
        atomic_fetch_add( &teams_created, 1 );
    }
    ompt_finalize_tool();
    summary();
}
```

Running the tool produces the following output:
```console
$ clang --version
clang version 18.0.0git (https://github.com/llvm/llvm-project.git a762cc21556bbd90ae7d9ee13c33213501195f64)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/apps/software/Clang/trunk/bin
$ clang -fopenmp test.c
$ ./a.out
[ompt_start_tool] omp_version="201611"; runtime_version="LLVM OMP version: 5.0.20140926"
[initialize_tool] initial_device_num=0
[on_ompt_implicit_task] thread_id = 1 | endpoint = begin | actual_parallelism = 1 | index = 1 | flags = initial
[on_ompt_parallel_begin] thread_id = 1 | flags = runtime_league
[on_ompt_implicit_task] thread_id = 1 | endpoint = begin | actual_parallelism = 2 | index = 0 | flags = initial
[on_ompt_parallel_begin] thread_id = 1 | flags = runtime_team
[on_ompt_implicit_task] thread_id = 2 | endpoint = begin | actual_parallelism = 2 | index = 1 | flags = initial
[on_ompt_parallel_begin] thread_id = 2 | flags = runtime_team
[on_ompt_implicit_task] thread_id = 1 | endpoint = begin | actual_parallelism = 10 | index = 0 | flags = implicit
[on_ompt_implicit_task] thread_id = 1 | endpoint = end | actual_parallelism = 10 | index = 0 | flags = implicit
[on_ompt_parallel_end] thread_id = 1 | flags = runtime_team
[on_ompt_implicit_task] thread_id = 2 | endpoint = begin | actual_parallelism = 10 | index = 0 | flags = implicit
[on_ompt_implicit_task] thread_id = 2 | endpoint = end | actual_parallelism = 10 | index = 0 | flags = implicit
[on_ompt_parallel_end] thread_id = 2 | flags = runtime_team
[on_ompt_implicit_task] thread_id = 1 | endpoint = end | actual_parallelism = 0 | index = 0 | flags = initial
[on_ompt_parallel_end] thread_id = 1 | flags = runtime_league
[on_ompt_implicit_task] thread_id = 1 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[on_ompt_implicit_task] thread_id = 2 | endpoint = end | actual_parallelism = 0 | index = 1 | flags = initial
[finalize_tool]
------------------------------
Expected: initial_tasks encountered = 6 | implicit_tasks encountered = 0
Actual: initial_tasks encountered = 6 | implicit_tasks encountered = 4
------------------------------
a.out: test.c:263: void summary(): Assertion `fetched_implicit_tasks == 0' failed.
[1]    1181685 IOT instruction (core dumped)  ./a.out
```

As described above, implicit tasks are dispatched after the initial tasks have been created, which contradicts the specification. In addition, the `actual_parallelism` argument of the incorrectly dispatched callbacks appears to be wrong, since each team uses only a single thread until a parallel directive is encountered. We would expect to see `actual_parallelism = 1`.
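
To make the mismatch easier to check by hand, the ten `on_ompt_implicit_task` lines from the log above can be transcribed and tallied. This is only a sketch: the struct, the enum and the function name are made up for illustration and are not OMPT types, and the table is a manual transcription of the log.

```c
#include <stddef.h>

/* Made-up classification for this sketch; not an OMPT type. */
typedef enum { KIND_INITIAL, KIND_IMPLICIT } task_kind_t;

/* One on_ompt_implicit_task line as printed in the log. */
typedef struct
{
    int          thread_id;
    int          is_begin; /* 1 = begin, 0 = end */
    unsigned int actual_parallelism;
    task_kind_t  kind;
} logged_task_t;

/* The ten on_ompt_implicit_task lines from the log, in order. */
static const logged_task_t logged_tasks[] = {
    { 1, 1,  1, KIND_INITIAL  },
    { 1, 1,  2, KIND_INITIAL  },
    { 2, 1,  2, KIND_INITIAL  },
    { 1, 1, 10, KIND_IMPLICIT },
    { 1, 0, 10, KIND_IMPLICIT },
    { 2, 1, 10, KIND_IMPLICIT },
    { 2, 0, 10, KIND_IMPLICIT },
    { 1, 0,  0, KIND_INITIAL  },
    { 1, 0,  0, KIND_INITIAL  },
    { 2, 0,  0, KIND_INITIAL  },
};

/* Count how many logged callbacks carry the given kind. */
static size_t
count_logged_tasks( task_kind_t kind )
{
    size_t count = 0;
    for ( size_t i = 0; i < sizeof( logged_tasks ) / sizeof( logged_tasks[ 0 ] ); ++i )
    {
        if ( logged_tasks[ i ].kind == kind )
        {
            ++count;
        }
    }
    return count;
}
```

Counting `KIND_INITIAL` and `KIND_IMPLICIT` entries yields 6 and 4, which matches the `Actual:` line of the summary. All four entries flagged as implicit report `actual_parallelism = 10`.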
</pre>