<div dir="ltr"><div>Could you add documentation for this option?</div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">fopenmp-cuda-parallel-target-region</tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">The name doesn't reflect the actual effect.</tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">The Differential review mentions:</tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">```<br></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">Added support for dynamic memory allocation for globalized variables in<br>
case if execution of target regions in parallel is required.</tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">```</tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">So what are globalized variables, and when do they occur? Since this affects performance, application developers would like to know more details.<br></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced">```<br></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced"><span class="gmail-transaction-comment"><p>It
might use "slow" allocation functions, in general, since it may use
malloc on the device side. Being disabled, it uses statically
preallocated memory, which might be faster, if parallel target regions
are not required.</p><p>```</p>In the slow code path, does it always call malloc on the device, or is malloc called only when a size limit is exceeded? Is the call explicitly generated by Clang, or does the CUDA driver handle it?<br></span></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced"><span class="gmail-transaction-comment">I saw some <span class="gmail-c1">UseSharedMemory flags in the review. Does the slow code path use CUDA shared memory, and might it need global memory when the size limit is exceeded?<br></span></span></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced"><span class="gmail-transaction-comment">In the fast code path, what is the scope of this statically
preallocated memory? Is it owned by the CUDA plugin?</span></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced"><span class="gmail-transaction-comment"><br></span></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced"><span class="gmail-transaction-comment">Thanks,<br></span></tt></span></div><div><span class="gmail-transaction-comment"><tt class="gmail-remarkup-monospaced"><span class="gmail-transaction-comment">Ye<br></span></tt></span></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>
Argonne National Laboratory</div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 9:44 AM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com">a.bataev@outlook.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ye, the patch has been committed; you can check whether the compiler
works for you.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>22.06.2020 1:29 PM, Ye Luo via
Openmp-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">It is working well now. I tested both the miniapp
and the full application.<br clear="all">
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Jun 22, 2020 at 11:56
AM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ye, could you try your code with the attached patch
for Clang?<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>19.06.2020 6:45 PM, Ye Luo via Openmp-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Yes, that makes the code work.</div>
<div>Do you have any idea why this code pattern triggers the failure?<br>
</div>
<div>Until the compiler fix lands, I can put the
workaround in the real code, which has multiple places
similar to the miniapp.</div>
<div>Best,<br>
</div>
<div>Ye<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Jun 19, 2020
at 4:35 PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Try using `omp parallel firstprivate(first,
last, offload_scratch_iVP_ptr, psiinv_ptr, i)` as
the inner parallel directive.<br>
</p>
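<div>A minimal sketch of the suggested workaround, assuming a hierarchical-parallelism loop shaped roughly like the miniapp's. The function <tt>scale_blocks</tt>, the pointer <tt>block_ptr</tt>, and the data layout are hypothetical; only <tt>first</tt> and <tt>last</tt> come from the clause above. Our reading, which the thread does not state, is that variables computed in the sequential, team-level part of a target region and shared by reference with an inner parallel region must be moved off the team master's stack ("globalized") so the worker threads can see them, whereas <tt>firstprivate</tt> hands each thread its own copy. Compiled without <tt>-fopenmp</tt>, the pragmas are ignored and the loops run serially on the host.</div>

```cpp
// Hypothetical sketch: firstprivate on the inner parallel directive.
// Without -fopenmp the pragmas are ignored and the loops run serially.
#include <cassert>
#include <vector>

// Scales `nblocks` contiguous blocks of `block_size` doubles by `factor`:
// one team per block (outer loop), a parallel loop inside each block.
void scale_blocks(double* data, int nblocks, int block_size, double factor) {
  int n = nblocks * block_size;
#pragma omp target teams distribute map(tofrom: data[0:n])
  for (int i = 0; i < nblocks; ++i) {
    // Computed in the sequential, team-level part of the target region.
    int first = i * block_size;
    int last = first + block_size;
    double* block_ptr = data + first;  // stand-in for the miniapp's pointers
    // firstprivate gives each thread its own copy of these values instead of
    // sharing the team-level variables by reference.
#pragma omp parallel for firstprivate(first, last, block_ptr, factor)
    for (int j = 0; j < last - first; ++j)
      block_ptr[j] *= factor;
  }
}
```

<div>For example, calling <tt>scale_blocks(v.data(), 3, 2, 1.5)</tt> on a vector of six 2.0 values leaves every element equal to 3.0.</div>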
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>16.06.2020 9:26 PM, Ye Luo via Openmp-dev
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi all,</div>
<div>I got a libomptarget failure when offloading
the same kernel function from multiple host
threads. I reported the bug as <a href="https://bugs.llvm.org/show_bug.cgi?id=46257" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=46257</a>.</div>
<div><br>
</div>
<div>After a bit of investigation, I was able to
narrow down the scope of the issue.</div>
<div>The error from the CUDA runtime is "an illegal
memory access was encountered".</div>
<div><br>
</div>
<div>
<div>1) On the libomptarget side<br>
</div>
</div>
<div>In libomptarget/plugins/cuda/src/rtl.cpp, I
added synchronization calls before and after
cuLaunchKernel and a mutex to protect the
kernel execution.<br>
</div>
<div><br>
</div>
<div><pre>// Synchronize first so that errors from earlier asynchronous work are
// not attributed to this launch.
Err = cuCtxSynchronize();
if (!checkResult(Err, "Error before cuLaunchKernel\n"))
  return OFFLOAD_FAIL;

//kernelmtx.lock();  // enabling this lock and the unlock below serializes kernel execution
Err = cuLaunchKernel(KernelInfo->Func, CudaBlocksPerGrid, /* gridDimY */ 1,
                     /* gridDimZ */ 1, CudaThreadsPerBlock,
                     /* blockDimY */ 1, /* blockDimZ */ 1,
                     /* sharedMemBytes */ 0, Stream, &amp;Args[0], nullptr);
if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))
  return OFFLOAD_FAIL;

// cuLaunchKernel is asynchronous; synchronizing here makes errors raised
// during kernel execution show up at this check.
Err = cuCtxSynchronize();
//kernelmtx.unlock();
if (!checkResult(Err, "Error after cuLaunchKernel\n"))
  return OFFLOAD_FAIL;</pre></div>
<div><br>
</div>
<div>Without the mutex, the first error reported by all
the threads is "Error after cuLaunchKernel".
With the mutex enabled, the code runs correctly.
If I move the mutex unlock() before the second
cuCtxSynchronize, the code still fails.
So I think the error comes from the kernel
execution itself, not from something else.</div>
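<div>A host-only sketch of the lock scope this experiment relies on. The callbacks <tt>launch</tt> and <tt>sync</tt> are hypothetical stand-ins for <tt>cuLaunchKernel</tt> and <tt>cuCtxSynchronize</tt>, and <tt>launch_serialized</tt> and <tt>kernel_mutex</tt> are illustrative names, not part of the plugin. Because a kernel launch is asynchronous, a lock released right after the launch serializes only the submissions; holding it until the synchronization returns also serializes the kernel executions, which is consistent with the error returning when unlock() is moved earlier.</div>

```cpp
// Hypothetical sketch of the lock scope used in the experiment.
// `launch` and `sync` stand in for cuLaunchKernel and cuCtxSynchronize.
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

std::mutex kernel_mutex;  // hypothetical counterpart of `kernelmtx`

// Runs launch() and then sync() while holding kernel_mutex, so only one host
// thread can have a kernel in flight at a time. Returns false if either fails.
bool launch_serialized(const std::function<bool()>& launch,
                       const std::function<bool()>& sync) {
  std::lock_guard<std::mutex> guard(kernel_mutex);  // unlocked at scope exit
  if (!launch())
    return false;
  // A real launch is asynchronous: releasing the lock before this call would
  // serialize only the submissions, not the kernel executions.
  return sync();
}
```

<div>If <tt>launch</tt> reports failure, <tt>sync</tt> is skipped and <tt>std::lock_guard</tt> still releases the mutex on return.</div>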
<div><br>
</div>
<div>2) On my application side, I tried to find
what triggers the error. Once I commented out the
inner "omp parallel" of the hierarchical
parallelism (<a href="https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258" target="_blank">https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258</a>),
offloading from multiple host threads started to run
without error.</div>
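<div>A minimal sketch of the structure described above, assuming a shape similar to the miniapp; the function <tt>offload_from_host_threads</tt> and its buffer layout are hypothetical. Several host threads each offload the same target region, and that region's body contains an inner <tt>omp parallel</tt> reading team-level variables. Without <tt>-fopenmp</tt> it builds and runs serially, so it shows only the pattern, not the failure.</div>

```cpp
// Hypothetical sketch of multi-host-thread offloading with hierarchical
// parallelism. Without -fopenmp the pragmas are ignored (serial execution).
#include <cassert>
#include <vector>

// Each host thread offloads the same target region on its own buffer.
// Every buffer must hold at least nblocks * block_size elements.
void offload_from_host_threads(std::vector<std::vector<double>>& buffers,
                               int nblocks, int block_size) {
  int nbuf = static_cast<int>(buffers.size());
  int n = nblocks * block_size;
#pragma omp parallel for
  for (int t = 0; t < nbuf; ++t) {
    double* data = buffers[t].data();
#pragma omp target teams distribute map(tofrom: data[0:n])
    for (int i = 0; i < nblocks; ++i) {
      int first = i * block_size;
      int last = first + block_size;
      // Inner parallel region reading the team-level `first` and `last`;
      // this corresponds to the directive whose removal hid the error.
#pragma omp parallel for
      for (int j = first; j < last; ++j)
        data[j] += 1.0;
    }
  }
}
```

<div>With four buffers of six zeros, <tt>offload_from_host_threads(bufs, 3, 2)</tt> leaves every element equal to 1.0.</div>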
<div><br>
</div>
<div>I'd like to ask the wider community about
potential causes of this bug.<br>
</div>
<div>Is there a thread-unsafe state machine in
a kernel generated from a target region, or
some other potential reason for the error?</div>
<div><br>
</div>
<div>Best,</div>
<div>Ye<br>
</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
Openmp-dev mailing list
<a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote></div>