<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hal, it seems to me that not everything is protected. I assume
some buffers are reused across different kernels. It would be
better to ask Alex Eichenberger, who knows more about this; I did
not investigate this problem.<br>
</p>
<p>As for clang, we try to reduce the size of the buffers in
global memory for the reduction/lastprivate/etc. variables that may
escape their declaration context. These buffers cannot be shared
in streams mode; a unique buffer needs to be allocated for each
kernel. This is not very hard to do, it is just not
implemented yet.<br>
</p>
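<p>To make the case concrete, here is a minimal sketch of the kind of
code affected: two independent target regions, each with its own
reduction variable and a nowait clause. The function name
sum_two_arrays and its parameters are hypothetical, chosen only for
illustration. Whether the two kernels actually overlap depends on the
compiler and runtime; as reported in this thread, clang 9.0.0 runs
them on a single stream.</p>

```c
/* Illustrative sketch only: sum_two_arrays is a hypothetical helper,
 * not part of clang or libomptarget. Without OpenMP support the
 * pragmas are ignored and the loops simply run on the host. */
void sum_two_arrays(const double *a, const double *b, int n,
                    double *sum_a, double *sum_b)
{
    double sa = 0.0;
    double sb = 0.0;

    /* First kernel: reduction into sa, launched as a deferred target task. */
    #pragma omp target teams distribute parallel for reduction(+:sa) \
            map(to: a[0:n]) map(tofrom: sa) nowait
    for (int i = 0; i < n; ++i)
        sa += a[i];

    /* Second kernel: independent of the first, so it may overlap with it
     * if the runtime uses separate streams. Each kernel would then need
     * its own reduction buffer in global memory. */
    #pragma omp target teams distribute parallel for reduction(+:sb) \
            map(to: b[0:n]) map(tofrom: sb) nowait
    for (int i = 0; i < n; ++i)
        sb += b[i];

    /* Wait for both deferred target tasks before reading the results. */
    #pragma omp taskwait

    *sum_a = sa;
    *sum_b = sb;
}
```

<p>If both kernels shared a single reduction buffer, running them
concurrently could race on it, which is why a unique buffer per kernel
is needed in streams mode.</p>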
<pre class="moz-signature" cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div class="moz-cite-prefix">30.10.2019 3:22 PM, Finkel, Hal J.
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:ee8eb27e-db7d-52a4-9d7b-c58b4f49b5e1@anl.gov">
<pre class="moz-quote-pre" wrap="">On 10/30/19 1:48 PM, GMail wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
I don't think it will be very easy. It requires some additional work
in libomptarget + some fixes in the clang itself. Otherwise there
might be some race conditions.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Can you be more specific? I thought that the mapping table, etc. were
already appropriately protected.
As a general thought, we should probably have a mode in which the
runtime is compiled with ThreadSanitizer to check for these kinds of things.
Thanks again,
Hal
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">-------------
Best regards,
Alexey Bataev
30.10.2019 2:40 PM, Finkel, Hal J. via cfe-dev wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">[+Ye, Johannes]
I recall that we've also observed this behavior. Ye, Johannes, we had a
work-around and a patch, correct?
-Hal
On 10/30/19 12:28 PM, Alessandro Gabbana via cfe-dev wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Dear All,
I'm using clang 9.0.0 to compile code that offloads sections to a GPU
using the OpenMP target construct.
I also use the nowait clause to overlap the execution of certain
kernels and/or host&lt;-&gt;device memory transfers.
However, using the NVIDIA profiler I've noticed that when I compile
the code with clang only one CUDA stream is active,
and therefore the execution gets serialized. On the other hand, when
compiling with XLC I see that kernels are executed
on different streams. I could not tell whether this is the expected
behavior (e.g. the nowait clause is currently not supported),
or whether I'm missing something. I'm using an NVIDIA Tesla P100 GPU and
compiling with the following options:
-target x86_64-pc-linux-gnu -fopenmp
-fopenmp-targets=nvptx64-nvidia-cuda
-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60
best wishes
Alessandro
_______________________________________________
cfe-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>