<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - target nowait doesn't respect taskwait when helper threads are used"
href="https://bugs.llvm.org/show_bug.cgi?id=49816">49816</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>target nowait doesn't respect taskwait when helper threads are used
</td>
</tr>
<tr>
<th>Product</th>
<td>OpenMP
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Runtime Library
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>xw111luoye@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Use <a href="https://github.com/ye-luo/qmcpack/tree/debug-task-race">https://github.com/ye-luo/qmcpack/tree/debug-task-race</a>
```
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DQMC_MPI=0 \
  -D ENABLE_CUDA=1 -D CUDA_ARCH=sm_70 -D CUDA_HOST_COMPILER=`which gcc` \
  -D ENABLE_OFFLOAD=ON -D USE_OBJECT_TARGET=ON -DOFFLOAD_ARCH=sm_70 ..
make -j32 qmcpack
LIBOMPTARGET_DEBUG=1 ctest -R deterministic-diamondC_1x1x1_pp-param-grad
grep "stride 16 add\|Moving 512 bytes" Testing/Temporary/LastTest.log
```
The run should perform two host-to-device (H2D) transfers followed by two
device-to-host (D2H) transfers. However, with helper threads enabled, the H2D
transfers only happen when the parallel region closes, after the D2H transfers.
```
stride 16 addr 0x33c5040 size(bytes) 512
stride 16 addr 0x3416d40 size(bytes) 512
Libomptarget --> Moving 512 bytes (tgt:0x00007f53d5005880) -> (hst:0x00000000033c5040)
Libomptarget --> Moving 512 bytes (tgt:0x00007f53d5006e80) -> (hst:0x0000000003416d40)
Libomptarget --> Moving 512 bytes (hst:0x0000000003416d40) -> (tgt:0x00007f53d5006e80)
Libomptarget --> Moving 512 bytes (hst:0x00000000033c5040) -> (tgt:0x00007f53d5005880)
```
With helper threads turned off, the behavior is as expected.
```
LIBOMPTARGET_DEBUG=1 LIBOMP_USE_HIDDEN_HELPER_TASK=OFF ctest -R deterministic-diamondC_1x1x1_pp-param-grad
grep "stride 16 add\|Moving 512 bytes" Testing/Temporary/LastTest.log
stride 16 addr 0x3baf080 size(bytes) 512
Libomptarget --> Moving 512 bytes (hst:0x0000000003baf080) -> (tgt:0x00007f15b5005880)
stride 16 addr 0x3b74e00 size(bytes) 512
Libomptarget --> Moving 512 bytes (hst:0x0000000003b74e00) -> (tgt:0x00007f15b5006e80)
Libomptarget --> Moving 512 bytes (tgt:0x00007f15b5005880) -> (hst:0x0000000003baf080)
Libomptarget --> Moving 512 bytes (tgt:0x00007f15b5006e80) -> (hst:0x0000000003b74e00)
```
The H2D transfer is launched here:
<a href="https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L911">https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L911</a>
There are multiple taskwait constructs, for example:
<a href="https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L916">https://github.com/ye-luo/qmcpack/blob/09ea73cef59b99b1cfb440b9e453536f300274d9/src/QMCWaveFunctions/Fermion/DiracDeterminantBatched.cpp#L916</a>
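For readers without the QMCPACK tree at hand, the pattern being exercised can
be sketched roughly as below. This is a hypothetical reduction, not the code at
the links above: the function name count_mismatches, the buffer buf, the size N
and the "+= 1" device step are all assumptions made for illustration, and it is
not verified that this small case triggers the same misordering. Each thread
issues an H2D update with nowait, waits on it with taskwait, does some device
work, then issues a D2H update the same way.

```cpp
// Hypothetical reduced pattern, NOT the QMCPACK source; all names are illustrative.
// Without OpenMP (or without an offload device) the code falls back to the host.
constexpr int N = 64; // 64 doubles = 512 bytes, the transfer size seen in the logs

// Returns how many elements come back with an unexpected value
// (0 when each taskwait really waits for the preceding transfer).
int count_mismatches()
{
  int bad = 0;
  #pragma omp parallel num_threads(2) reduction(+ : bad)
  {
    double buf[N]; // one private buffer per thread
    for (int i = 0; i != N; ++i)
      buf[i] = i;

    #pragma omp target enter data map(alloc : buf[0:N])

    // H2D transfer issued as a deferred target task
    #pragma omp target update to(buf[0:N]) nowait
    // expected to block until the H2D transfer above has completed
    #pragma omp taskwait

    // device-side work that depends on the uploaded data
    #pragma omp target map(alloc : buf[0:N])
    for (int i = 0; i != N; ++i)
      buf[i] += 1.0;

    // D2H transfer, again deferred and then waited on
    #pragma omp target update from(buf[0:N]) nowait
    #pragma omp taskwait

    // every element should now hold its original value plus one
    for (int i = 0; i != N; ++i)
      if (buf[i] != i + 1.0)
        ++bad;

    #pragma omp target exit data map(delete : buf[0:N])
  }
  return bad;
}
```

If the H2D update were delayed past the taskwait, the device step would run on
stale data and count_mismatches() would be expected to return a nonzero value.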
Note: ignore the failure reported by ctest. I modified the test input only to
make the reproducer easier to run.</pre>
</div>
</p>
</body>
</html>