<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/76998>76998</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [OpenMP] Why is the task deque of a thread processed in reverse order?
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ye-luo
      </td>
    </tr>
</table>

<pre>
    Reproducer main2.cpp
```
#include <cstdio>
#include <omp.h>

int main()
{
  #pragma omp parallel for
  for(int i=0; i<2; i++)
  {
    //printf("process i = %d, tid = %d\n", i, omp_get_thread_num());
    #pragma omp task
    printf("running i = %d, j = %d, tid = %d\n", i, 0, omp_get_thread_num());
    #pragma omp task
    printf("running i = %d, j = %d, tid = %d\n", i, 1, omp_get_thread_num());
    #pragma omp task
    printf("running i = %d, j = %d, tid = %d\n", i, 2, omp_get_thread_num());
    #pragma omp taskwait
  }
}
```
Run:
```
$ OMP_NUM_THREADS=2 ./a.out 
running i = 0, j = 2, tid = 0
running i = 0, j = 1, tid = 0
running i = 0, j = 0, tid = 0
running i = 1, j = 0, tid = 0
running i = 1, j = 1, tid = 0
running i = 1, j = 2, tid = 1
```
Thread 0 first processes its own task deque in reverse order. Then it steals tasks from thread 1's task deque in natural order.
I find it counterintuitive to process tasks in reverse order, and having two different processing orders complicates performance tuning.
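
The two orders seen above can be reproduced with a simple work-stealing model. The sketch below assumes the owner thread takes tasks from the tail of its deque and a stealing thread takes them from the head. That is an assumption that matches the output, not something confirmed here from the runtime source, and the names `owner_order` and `thief_order` are hypothetical helpers.
```
#include <deque>
#include <vector>

// Illustrative model only; this is not the OpenMP runtime implementation.
// Tasks are identified by their j index and were enqueued in order 0, 1, 2.

// Assumed owner behavior: take the most recently enqueued task first (LIFO).
std::vector<int> owner_order(std::deque<int> tasks) {
  std::vector<int> order;
  while (!tasks.empty()) {
    order.push_back(tasks.back());
    tasks.pop_back();
  }
  return order;
}

// Assumed thief behavior: take the oldest enqueued task first (FIFO).
std::vector<int> thief_order(std::deque<int> tasks) {
  std::vector<int> order;
  while (!tasks.empty()) {
    order.push_back(tasks.front());
    tasks.pop_front();
  }
  return order;
}
```
With tasks enqueued as 0, 1, 2, `owner_order` yields 2, 1, 0 and `thief_order` yields 0, 1, 2, which is the pattern in the `i = 0` and `i = 1` lines of the run.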

When writing offload code, it is better to submit heavy kernels before lighter ones when there are no dependencies among them. Doing so hides more of the kernel launch latency (host activity).
</pre>