<div dir="ltr"><div>I cannot use -fopenmp-cuda-mode. I don't remember now exactly where (reduction?), but it breaks other things in my code.<br></div><div>Ye<br></div><div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>
Argonne National Laboratory</div></div></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 2:02 PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com">a.bataev@outlook.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>You can use shared memory directly by using the #pragma omp allocate
directive and allocate clauses. Plus, shared memory is actively used by
the runtime. But I agree that it would be good to allow controlling
the size. <br>
</p>
<p>BTW, you can disable this globalization mechanism by using the
-fopenmp-cuda-mode option. In this case, you need to control
sharing of the variables yourself.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>25.06.2020 2:51 PM, Ye Luo wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>I think the hard-coded size of 128 bytes is suboptimal. Can we
allow users to select it?</div>
<div>Since we cannot use shared memory directly, I believe there
is plenty of shared memory available.</div>
<div>It seems that my test code exceeds 128 bytes and I'd
like to set a higher limit.<br>
</div>
<div>Ye<br>
</div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science Division & Leadership
Computing Facility<br>
Argonne National Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 1:02
PM Alexey Bataev <<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="auto">
Globalized variables are variables that must be shared
between the threads but are initially allocated by the master
thread. On a GPU, local memory is private to a thread, so
local variables cannot be shared between the threads. But
OpenMP requires that such variables be shared between the
threads. So, if we see that a local variable is allocated in
the context of the master thread and later is shared in a
parallel region (or potentially can be shared, if it may
escape its declaration context by reference or by address),
such variables are allocated either in shared or in global
memory. If the total size of the globalized variables is
less than 128 bytes, they are allocated in shared memory;
otherwise, we fall back to malloc. If
-fno-openmp-cuda-parallel-target-region is used, then again,
if the total size is &lt;= 128 bytes, shared memory is used;
otherwise, a statically allocated global memory buffer is used.<br>
<br>
<div dir="ltr">Best regards,
<div>Alexey Bataev</div>
</div>
<div dir="ltr"><br>
<blockquote type="cite">On June 25, 2020, at 13:45, Ye Luo
<<a href="mailto:xw111luoye@gmail.com" target="_blank">xw111luoye@gmail.com</a>>
wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div>Could you add documentation for this option?</div>
<div><span><tt>fopenmp-cuda-parallel-target-region</tt></span></div>
<div><span><tt>The name doesn't reflect the actual
effect.</tt></span></div>
<div><span><tt>The differential review mentions</tt></span></div>
<div><span><tt>```<br>
</tt></span></div>
<div><span><tt>Added support for dynamic memory
allocation for globalized variables in<br>
case if execution of target regions in parallel
is required.</tt></span></div>
<div><span><tt>```</tt></span></div>
<div><span><tt>So what are globalized variables?
When do they occur? Since this impacts
performance, application developers
would like to know more details.<br>
</tt></span></div>
<div><span><tt>```<br>
</tt></span></div>
<div><span><tt><span>
<p>It might use "slow" allocation functions,
in general, since it may use malloc on the
device side. Being disabled, it uses
statically preallocated memory, which might
be faster, if parallel target regions are
not required.</p>
<p>```</p>
In the slow code path, does it always
call malloc on the device, or does
malloc get called only when a size
limit is exceeded? Is the call
explicitly generated by clang, or
does the CUDA driver handle it?<br>
</span></tt></span></div>
<div><span><tt><span>I saw some
<span>UseSharedMemory flags in the review.
Does the slow code path use CUDA
shared memory, falling back to global
memory when the size limit is exceeded?<br>
</span></span></tt></span></div>
<div><span><tt><span>In the fast code path, what is
the scope of this statically preallocated
memory? Is it owned by the CUDA plugin?</span></tt></span></div>
<div><span><tt><span><br>
</span></tt></span></div>
<div><span><tt><span>Thanks,<br>
</span></tt></span></div>
<div><span><tt><span>Ye<br>
</span></tt></span></div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science Division &
Leadership Computing Facility<br>
Argonne National Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020
at 9:44 AM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ye, the patch is committed, you can check if
the compiler works for you.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>22.06.2020 1:29 PM, Ye Luo via Openmp-dev
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">It is working well now. I tested
both the miniapp and the full application.<br clear="all">
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science Division
& Leadership Computing
Facility<br>
Argonne National Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Jun
22, 2020 at 11:56 AM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ye, could you try your code with the
attached patch for the clang? <br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>19.06.2020 6:45 PM, Ye Luo via
Openmp-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Yes. It makes the code working.</div>
<div>Do you have any idea about the
code pattern?<br>
</div>
<div>Before the compiler fix gets in,
I can put the workaround in the real
code which has multiple places
similar to the miniapp.</div>
<div>Best,<br>
</div>
<div>Ye<br>
</div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science
Division & Leadership
Computing Facility<br>
Argonne National
Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Fri, Jun 19, 2020 at 4:35 PM
Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Try to use `omp parallel
firstprivate(first, last,
offload_scratch_iVP_ptr,
psiinv_ptr, i)` as the inner
parallel directive.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>16.06.2020 9:26 PM, Ye Luo
via Openmp-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi all,</div>
<div>I got libomptarget
failure when offloading the
same kernel function from
multiple host threads. I
reported the bug as
<a href="https://bugs.llvm.org/show_bug.cgi?id=46257" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=46257</a>.</div>
<div><br>
</div>
<div>After a bit of
investigation, I was able to
narrow down the scope
of the issue.</div>
<div>The error from CUDA
runtime is "an illegal
memory access was
encountered".</div>
<div><br>
</div>
<div>
<div>A) On the libomptarget
side<br>
</div>
</div>
<div>In
libomptarget/plugins/cuda/src/rtl.cpp,
I added synchronization calls
before and after
cuLaunchKernel and a mutex to
protect the kernel
execution.<br>
</div>
<div><br>
</div>
<div>  Err = cuCtxSynchronize();<br>
  if (!checkResult(Err, "Error before cuLaunchKernel\n"))<br>
    return OFFLOAD_FAIL;<br>
<br>
  //kernelmtx.lock();<br>
  Err = cuLaunchKernel(KernelInfo->Func, CudaBlocksPerGrid, /* gridDimY */ 1,<br>
                       /* gridDimZ */ 1, CudaThreadsPerBlock,<br>
                       /* blockDimY */ 1, /* blockDimZ */ 1,<br>
                       /* sharedMemBytes */ 0, Stream, &Args[0], nullptr);<br>
  if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))<br>
    return OFFLOAD_FAIL;<br>
<br>
  Err = cuCtxSynchronize();<br>
  //kernelmtx.unlock();<br>
  if (!checkResult(Err, "Error after cuLaunchKernel\n"))<br>
    return OFFLOAD_FAIL;</div>
<div><br>
</div>
<div>Without the mutex, the
first error from all the
threads is "Error after
cuLaunchKernel". After
enabling the mutex, the code
runs well. If I move the
mutex unlock() before the
cuCtxSynchronize, the code
still runs into the error. So
I think the error comes from
the kernel execution, not
something else.</div>
<div><br>
</div>
<div>B) On my application
side, I tried to see what
triggers the error. Once I
commented out "omp parallel"
in the hierarchical
parallelism:
<a href="https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258" target="_blank">
https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258</a><br>
</div>
<div>Offloading from multiple
host threads starts to run
without error.</div>
<div><br>
</div>
<div>I'd like to ask the wider
community to help identify
potential causes of this bug.<br>
</div>
<div>Is there a thread-unsafe
state machine associated with
a kernel generated from a
target region? Or any other
potential reason for the
error?</div>
<div><br>
</div>
<div>Best,</div>
<div>Ye<br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational
Science Division
& Leadership
Computing Facility<br>
Argonne National
Laboratory</div>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
Openmp-dev mailing list
<a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>