<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - target nowait doesn't respect taskwait when helper threads are used"
   href="https://bugs.llvm.org/show_bug.cgi?id=49816">49816</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>target nowait doesn't respect taskwait when helper threads are used
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>OpenMP
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Runtime Library
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>xw111luoye@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Use <a href="https://github.com/ye-luo/qmcpack/tree/debug-task-race">https://github.com/ye-luo/qmcpack/tree/debug-task-race</a>

```
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DQMC_MPI=0 \
  -D ENABLE_CUDA=1 -D CUDA_ARCH=sm_70 -D CUDA_HOST_COMPILER=`which gcc` \
  -D ENABLE_OFFLOAD=ON -D USE_OBJECT_TARGET=ON -DOFFLOAD_ARCH=sm_70 ..
make -j32 qmcpack
LIBOMPTARGET_DEBUG=1 ctest -R deterministic-diamondC_1x1x1_pp-param-grad
grep "stride 16 add\|Moving 512 bytes" Testing/Temporary/LastTest.log
```

The run should show two host-to-device (H2D) transfers followed by two
device-to-host (D2H) transfers. With helper threads enabled, however, the H2D
transfers are only issued when the parallel region closes, after the D2H
transfers.
```
stride 16 addr 0x33c5040 size(bytes) 512
stride 16 addr 0x3416d40 size(bytes) 512

Libomptarget --> Moving 512 bytes (tgt:0x00007f53d5005880) ->
(hst:0x00000000033c5040)
Libomptarget --> Moving 512 bytes (tgt:0x00007f53d5006e80) ->
(hst:0x0000000003416d40)
Libomptarget --> Moving 512 bytes (hst:0x0000000003416d40) ->
(tgt:0x00007f53d5006e80)
Libomptarget --> Moving 512 bytes (hst:0x00000000033c5040) ->
(tgt:0x00007f53d5005880)
```

With helper threads turned off, the behavior is as expected.
```
LIBOMPTARGET_DEBUG=1 LIBOMP_USE_HIDDEN_HELPER_TASK=OFF ctest -R deterministic-diamondC_1x1x1_pp-param-grad
grep "stride 16 add\|Moving 512 bytes" Testing/Temporary/LastTest.log
stride 16 addr 0x3baf080 size(bytes) 512
Libomptarget --> Moving 512 bytes (hst:0x0000000003baf080) ->
(tgt:0x00007f15b5005880)
stride 16 addr 0x3b74e00 size(bytes) 512
Libomptarget --> Moving 512 bytes (hst:0x0000000003b74e00) ->
(tgt:0x00007f15b5006e80)

Libomptarget --> Moving 512 bytes (tgt:0x00007f15b5005880) ->
(hst:0x0000000003baf080)
Libomptarget --> Moving 512 bytes (tgt:0x00007f15b5006e80) ->
(hst:0x0000000003b74e00)
```

The H2D transfer is launched here:
<a href="https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L911">https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L911</a>
There are multiple taskwait directives after it, for example:
<a href="https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L916">https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L916</a>
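
For readers without the QMCPACK tree at hand, the pattern behind the two links
above can be sketched roughly as follows. This is a hypothetical stand-alone
reduction written for illustration, not the QMCPACK code: run_pattern, data,
nbuf and n are made-up names, and the exact clauses used in
DiracDeterminantBatched.cpp may differ. Each thread issues its H2D update as a
deferred target task, waits on it with taskwait, and only then launches device
work and the D2H update.

```cpp
// Hypothetical minimal sketch of "target ... nowait" followed by "taskwait".
// All names here are illustrative assumptions, not taken from QMCPACK.
int run_pattern()
{
  constexpr int nbuf = 2;  // two buffers, like the two host addresses in the logs
  constexpr int n    = 64; // 64 doubles = 512 bytes per transfer
  double data[nbuf][n];

  // Fill the host buffers and allocate matching device storage.
  for (int b = 0; b < nbuf; ++b)
  {
    double* p = data[b];
    for (int i = 0; i < n; ++i)
      p[i] = 1.0;
#pragma omp target enter data map(alloc : p[0:n])
  }

  int errors = 0;
#pragma omp parallel for num_threads(nbuf) reduction(+ : errors)
  for (int b = 0; b < nbuf; ++b)
  {
    double* p = data[b];

    // H2D as a deferred target task; taskwait should complete it here.
#pragma omp target update to(p[0:n]) nowait
#pragma omp taskwait

    // Device work that relies on the H2D copy having finished.
#pragma omp target map(tofrom : p[0:n]) nowait
    for (int i = 0; i < n; ++i)
      p[i] *= 2.0;
#pragma omp taskwait

    // D2H, again deferred and then waited on.
#pragma omp target update from(p[0:n]) nowait
#pragma omp taskwait

    // If every taskwait was honored, each element is 1.0 * 2.0.
    for (int i = 0; i < n; ++i)
      if (p[i] != 2.0)
        ++errors;
  }

  // Release the device storage.
  for (int b = 0; b < nbuf; ++b)
  {
    double* p = data[b];
#pragma omp target exit data map(delete : p[0:n])
  }
  return errors;
}
```

Calling run_pattern() from main and returning its result gives a quick check:
when every taskwait is honored, the mismatch count is zero. Whether this
reduced form actually triggers the misordering seen in QMCPACK is untested; it
only shows the ordering the report expects. Building with
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda and running with
LIBOMPTARGET_DEBUG=1 should make the transfer order visible in the same way as
the logs above.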

Note: ignore the failure reported by ctest. I modified the test input only to
make the reproducer easier to run.</pre>
        </div>
      </p>


    </body>
</html>