<div dir="ltr"><div>Thank you. I will try that.</div><div>Ye<br></div><div><div><div dir="ltr" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>
Argonne National Laboratory</div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 2:16 PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>int a;<br>
</p>
<p>#pragma omp allocate(a) allocator(omp_pteam_mem_alloc)</p>
<p>or, for privatized variables: #pragma omp ... private(a)
allocate(omp_pteam_mem_alloc: a)</p>
<p>But you should be very careful here too, as it might lead to a data
race. The variables are still statically allocated, so to avoid a
data race you will probably need to create an array and allocate
memory for the array of the variables.<br>
</p>
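<p>A minimal sketch of the privatized form, assuming a C++ compiler that supports the OpenMP 5.0 allocate clause. The function name square_all and the variable scratch are hypothetical, chosen only for illustration. Whether omp_pteam_mem_alloc maps to CUDA shared memory inside a target region depends on the compiler, and the data-race caveat above still applies.</p>

```cpp
#include <cassert>  // used by the usage checks
#include <vector>
#ifdef _OPENMP
#include <omp.h>    // declares omp_pteam_mem_alloc
#endif

// Hypothetical helper: squares every element of `in`.
// Each thread gets a private copy of `scratch`; the allocate clause asks
// the runtime to place that copy in parallel-team memory.
// Built without OpenMP, the pragma is ignored and the loop runs serially.
std::vector<int> square_all(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  const int* in_ptr = in.data();
  int* out_ptr = out.data();
  const int n = static_cast<int>(in.size());
  int scratch = 0;
  #pragma omp parallel for private(scratch) allocate(omp_pteam_mem_alloc: scratch)
  for (int i = 0; i < n; ++i) {
    scratch = in_ptr[i];            // private copy, no sharing between threads
    out_ptr[i] = scratch * scratch;
  }
  return out;
}
```

<p>Calling square_all({1, 2, 3, 4}) returns a vector of the same length holding the squares of the inputs, with or without OpenMP enabled.</p>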
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>25.06.2020 3:10 PM, Ye Luo wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Do you have an example using shared memory via #pragma omp
allocate?</div>
<div>Ye<br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science Division & Leadership
Computing Facility<br>
Argonne National Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 2:09
PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Yes, reduction variables can also be candidates for
globalization if the reduction variable must be shared in a
parallel region. You can try to control it manually using the
allocate clause.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>25.06.2020 3:05 PM, Ye Luo wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>I cannot use openmp-cuda-mode. I don't remember now
exactly where (reduction?) but it breaks other things
in my code.<br>
</div>
<div>Ye<br>
</div>
<div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science Division &
Leadership Computing Facility<br>
Argonne National Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020
at 2:02 PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>You can use shared memory directly through the
#pragma omp allocate directive and allocate clauses. Plus,
shared memory is actively used by the runtime. But
I agree that it would be good to let users control
the size. <br>
</p>
<p>BTW, you can disable this globalization mechanism
by using -fopenmp-cuda-mode option. In this case,
you need to control sharing the variables
yourself.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>25.06.2020 2:51 PM, Ye Luo wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>I think the hard-coded size of 128 bytes is suboptimal.
Can we allow users to select it?</div>
<div>Since we cannot use shared memory directly,
I believe there is plenty of shared memory
available.</div>
<div>It seems that my test code is exceeding 128
bytes and I'd like to set a higher limit.<br>
</div>
<div>Ye<br>
</div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science Division &
Leadership Computing Facility<br>
Argonne National Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jun
25, 2020 at 1:02 PM Alexey Bataev <<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="auto"> Globalized variables are
variables that must be shared between the
threads but are initially allocated by the
master thread. On a GPU, local memory is
private to each thread, so local variables
cannot be shared between threads. But
OpenMP requires that such variables be
shared between the threads. So, if we see
that a local variable is allocated in the
context of the master thread and is later
shared in a parallel region (or can potentially
be shared, because it may escape its
declaration context by reference or by
address), that variable is allocated
either in shared or in global memory. If the
total size of the globalized variables is less
than 128 bytes, they are allocated in shared
memory. Otherwise, the compiler falls back to malloc. If
-fno-openmp-cuda-parallel-target-region is
used, then again, if the total size is <= 128
bytes, shared memory is used; otherwise a
statically allocated global memory buffer is
used.<br>
<br>
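<p>The placement rules described above can be summarized as a small decision function. This is only a model of the explanation in this message, not code from clang: the names Storage, kSharedLimitBytes and choose_storage are hypothetical, and treating the 128-byte limit as inclusive in both modes is an assumption, since the message gives "less than" in one case and "<=" in the other.</p>

```cpp
#include <cassert>  // used by the usage checks
#include <cstddef>

// Where the globalized variables of a kernel end up (hypothetical names).
enum class Storage { Shared, DeviceMalloc, StaticGlobalBuffer };

// Hard-coded limit discussed in this thread.
constexpr std::size_t kSharedLimitBytes = 128;

// total_bytes: combined size of all globalized variables.
// parallel_target_regions: true models the default behaviour described
// above, false models -fno-openmp-cuda-parallel-target-region.
Storage choose_storage(std::size_t total_bytes, bool parallel_target_regions) {
  if (total_bytes <= kSharedLimitBytes)  // assumption: limit is inclusive
    return Storage::Shared;
  return parallel_target_regions ? Storage::DeviceMalloc
                                 : Storage::StaticGlobalBuffer;
}
```

<p>For example, choose_storage(256, true) models the slow path that calls malloc on the device, while choose_storage(256, false) models the statically preallocated global buffer.</p>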
<div dir="ltr">Best regards,
<div>Alexey Bataev</div>
</div>
<div dir="ltr"><br>
<blockquote type="cite">On June 25, 2020, at
13:45, Ye Luo <<a href="mailto:xw111luoye@gmail.com" target="_blank">xw111luoye@gmail.com</a>>
wrote:<br>
<br>
</blockquote>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div>Could you add documentation for
this option?</div>
<div><span><tt>fopenmp-cuda-parallel-target-region</tt></span></div>
<div><span><tt>The name doesn't
reflect the actual effect.</tt></span></div>
<div><span><tt>The differential review
mentions</tt></span></div>
<div><span><tt>```<br>
</tt></span></div>
<div><span><tt>Added support for
dynamic memory allocation for
globalized variables in<br>
case if execution of target
regions in parallel is required.</tt></span></div>
<div><span><tt>```</tt></span></div>
<div><span><tt>So what are the
globalized variables? When do
they occur? Since this impacts
performance, application
developers would like to know more
details.<br>
</tt></span></div>
<div><span><tt>```<br>
</tt></span></div>
<div><span><tt><span>
<p>It might use "slow"
allocation functions, in
general, since it may use
malloc on the device side.
Being disabled, it uses
statically preallocated
memory, which might be
faster, if parallel target
regions are not required.</p>
<p>```</p>
In the slow code path, does it
always call malloc on the
device, or is malloc called only
when a size limit is exceeded?
Is the call explicitly
generated by clang, or does the CUDA
driver handle it?<br>
</span></tt></span></div>
<div><span><tt><span>I saw some <span>UseSharedMemory
flags in the review. Does the
slow code path use CUDA
shared memory, falling back
to global memory when the
size limit is exceeded?<br>
</span></span></tt></span></div>
<div><span><tt><span>In the fast code
path, what is the scope of
this statically preallocated
memory? Is it owned by the
CUDA plugin?</span></tt></span></div>
<div><span><tt><span><br>
</span></tt></span></div>
<div><span><tt><span>Thanks,<br>
</span></tt></span></div>
<div><span><tt><span>Ye<br>
</span></tt></span></div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational Science
Division & Leadership
Computing Facility<br>
Argonne National
Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Thu, Jun 25, 2020 at 9:44 AM
Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ye, the patch is committed,
you can check if the compiler
works for you.<br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>22.06.2020 1:29 PM, Ye Luo
via Openmp-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">It is working
well now. I tested both the
miniapp and the full
application.<br clear="all">
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational
Science Division
& Leadership
Computing Facility<br>
Argonne National
Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon,
Jun 22, 2020 at 11:56 AM
Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ye, could you try
your code with the
attached patch for the
clang? <br>
</p>
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>19.06.2020 6:45 PM,
Ye Luo via Openmp-dev
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Yes. It makes the
code work.</div>
<div>Do you have any
idea which code
pattern triggers this?<br>
</div>
<div>Before the
compiler fix gets
in, I can put the
workaround in the
real code, which has
multiple places
similar to the
miniapp.</div>
<div>Best,<br>
</div>
<div>Ye<br>
</div>
<div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational
Science
Division &
Leadership
Computing
Facility<br>
Argonne
National
Laboratory</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Fri, Jun 19, 2020 at
4:35 PM
Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Try to use `omp
parallel
firstprivate(first,
last,
offload_scratch_iVP_ptr,
psiinv_ptr, i)`
as the inner
parallel
directive.<br>
</p>
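<p>A minimal sketch of the suggested pattern, assuming a simplified kernel. The function fill_blocks and its arithmetic are hypothetical and are not taken from miniqmc. The variables first and last are declared by the team's master thread and used inside the inner parallel region; listing them in firstprivate gives every thread its own copy, so they no longer need to be shared between threads.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: block i fills out[first..last) with i + 0.5 * offset.
// Built without OpenMP, the pragmas are ignored and the loops run serially.
std::vector<double> fill_blocks(int num_blocks, int block_size) {
  assert(num_blocks >= 0 && block_size >= 0);
  std::vector<double> out(static_cast<std::size_t>(num_blocks) * block_size, 0.0);
  double* out_ptr = out.data();
  const int total = num_blocks * block_size;

  #pragma omp target teams distribute map(tofrom: out_ptr[0:total])
  for (int i = 0; i < num_blocks; ++i) {
    // Declared in the context of the team's master thread.
    int first = i * block_size;
    int last = first + block_size;

    // firstprivate copies the values into each thread instead of sharing them.
    #pragma omp parallel for firstprivate(first, last, out_ptr, i)
    for (int j = first; j < last; ++j)
      out_ptr[j] = i + 0.5 * (j - first);
  }
  return out;
}
```

<p>The result does not depend on whether the region is offloaded, run on the host, or compiled without OpenMP, which makes the sketch easy to compare against a serial build.</p>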
<pre cols="72">-------------
Best regards,
Alexey Bataev</pre>
<div>16.06.2020
9:26 PM, Ye Luo
via Openmp-dev
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi all,</div>
<div>I got a
libomptarget
failure when
offloading the
same kernel
function from
multiple host
threads. I
reported the
bug as <a href="https://bugs.llvm.org/show_bug.cgi?id=46257" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=46257</a>.</div>
<div><br>
</div>
<div>After a
bit of
investigation,
I was able to
narrow down
the scope
of the issue.</div>
<div>The error
from CUDA
runtime is "an
illegal memory
access was
encountered".</div>
<div><br>
</div>
<div>
<div>A) On the
libomptarget
side<br>
</div>
</div>
<div>In
libomptarget/plugins/cuda/src/rtl.cpp,
I added
synchronization calls before and after cuLaunchKernel, plus a mutex to protect
the kernel
execution.<br>
</div>
<div><br>
</div>
<div>  Err = cuCtxSynchronize();<br>
  if (!checkResult(Err, "Error before cuLaunchKernel\n"))<br>
    return OFFLOAD_FAIL;<br>
<br>
  //kernelmtx.lock();<br>
  Err = cuLaunchKernel(KernelInfo->Func, CudaBlocksPerGrid, /* gridDimY */ 1,<br>
                       /* gridDimZ */ 1, CudaThreadsPerBlock,<br>
                       /* blockDimY */ 1, /* blockDimZ */ 1,<br>
                       /* sharedMemBytes */ 0, Stream, &Args[0], nullptr);<br>
  if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))<br>
    return OFFLOAD_FAIL;<br>
<br>
  Err = cuCtxSynchronize();<br>
  //kernelmtx.unlock();<br>
  if (!checkResult(Err, "Error after cuLaunchKernel\n"))<br>
    return OFFLOAD_FAIL;</div>
<div><br>
</div>
<div>Without
the mutex, the
first error
from all the
threads is
"Error after
cuLaunchKernel".
After enabling
the mutex, the
code runs
well. If I
move the mutex
unlock()
before the
second cuCtxSynchronize,
the code still
runs into the
error. So I
think the
error comes
from the kernel
execution, not
from something
else.</div>
<div><br>
</div>
<div>B) On my
application
side, I tried
to see what
triggers the
error. Once I
commented out
"omp parallel"
in the
hierarchical
parallelism (<a href="https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258" target="_blank">https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258</a>),
offloading from
multiple host
threads started
to run
without error.</div>
<div><br>
</div>
<div>I'd like
to ask the
wider
community
about potential
causes of this
bug.<br>
</div>
<div>Is there
a
thread-unsafe state
machine in a
kernel
generated from
a target
region? Or is
there any
other
potential
reason for the
error?</div>
<div><br>
</div>
<div>Best,</div>
<div>Ye<br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>
Computational
Science
Division &
Leadership
Computing
Facility<br>
Argonne
National
Laboratory</div>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
Openmp-dev mailing list
<a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>