<div dir="ltr"><div>Thank you. I will try that.</div><div>Ye<br></div><div><div><div dir="ltr" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>

Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>

Argonne National Laboratory</div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 2:16 PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>int a;<br>

    </p>

    <p>#pragma omp allocate(a) allocator(omp_pteam_mem_alloc)</p>

    <p>or for privatized variables #pragma omp ... private(a)

      allocate(omp_pteam_mem_alloc: a)</p>

    <p>But you should be very careful here too, because it might lead to a data race. The variables are still statically allocated, so to avoid a data race you will probably need to create an array and allocate memory for the array of the variables.<br>

    </p>
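    <p>A minimal sketch of the second form, the allocate clause on a privatized variable. The function name <tt>sum_range</tt> and the loop are illustrative assumptions and do not come from the code discussed in this thread. When compiled without OpenMP, the pragmas are ignored and the function runs serially.</p>

```cpp
#ifdef _OPENMP
#include "omp.h"  // declares omp_pteam_mem_alloc
#endif

// Illustrative helper (not from the thread): sums 0 .. n-1.
// Each thread accumulates into its own private copy of `partial`; the allocate
// clause requests that those private copies come from the team-local allocator.
int sum_range(int n) {
  int total = 0;
  int partial = 0;
#pragma omp parallel private(partial) allocate(omp_pteam_mem_alloc : partial) reduction(+ : total)
  {
    partial = 0;  // private copies start uninitialized
#pragma omp for
    for (int i = 0; i < n; ++i)
      partial += i;
    total += partial;  // combined across threads by the reduction
  }
  return total;
}
```

    <p>For example, <tt>sum_range(10)</tt> returns 45 regardless of the number of threads, because every thread writes only to its own copy of <tt>partial</tt>.</p>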

    <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

    <div>On 25.06.2020 3:10 PM, Ye Luo wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div>Do you have an example using shared memory via #pragma omp

          allocate?</div>

        <div>Ye<br>

        </div>


        <br>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020 at 2:09

          PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>> wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div>

            <p>Yes, reduction variables can also be candidates for globalization if the reduction variable must be shared in a parallel region. You can try to control it manually using the allocate clause.<br>

            </p>

            <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

            <div>On 25.06.2020 3:05 PM, Ye Luo wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">

                <div>I cannot use -fopenmp-cuda-mode. I don't remember exactly where (reduction?), but it breaks other things in my code.<br>

                </div>

                <div>Ye<br>

                </div>


              </div>

              <br>

              <div class="gmail_quote">

                <div dir="ltr" class="gmail_attr">On Thu, Jun 25, 2020

                  at 2:02 PM Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>

                  wrote:<br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                  <div>

                    <p>You can use shared memory directly via #pragma omp allocate and allocate clauses. In addition, shared memory is actively used by the runtime. But I agree that it would be good to let users control the size.<br>

                    </p>

                    <p>BTW, you can disable this globalization mechanism with the -fopenmp-cuda-mode option. In that case, you need to control the sharing of variables yourself.<br>

                    </p>

                    <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

                    <div>On 25.06.2020 2:51 PM, Ye Luo wrote:<br>

                    </div>

                    <blockquote type="cite">

                      <div dir="ltr">

                        <div>I think the hard-coded size of 128 is suboptimal. Can we allow users to select it?</div>
                        <div>Since we cannot use shared memory directly, I believe there is plenty of shared memory available.</div>
                        <div>It seems that my test code exceeds 128 bytes, and I'd like to set a higher limit.<br>

                        </div>

                        <div>Ye<br>

                        </div>


                      </div>

                      <br>

                      <div class="gmail_quote">

                        <div dir="ltr" class="gmail_attr">On Thu, Jun

                          25, 2020 at 1:02 PM Alexey Bataev <<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>>

                          wrote:<br>

                        </div>

                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                          <div dir="auto">Globalized variables are variables that must be shared between the threads but are initially allocated by the master thread. On a GPU, local memory is private to each thread, so local variables cannot be shared between threads, yet OpenMP requires such variables to be shared. So, if a local variable is allocated in the context of the master thread and is later shared in a parallel region (or could be shared, because it may escape its declaration context by reference or by address), it is allocated either in shared or in global memory. If the total size of the globalized variables is less than 128 bytes, they are allocated in shared memory; otherwise, the allocation falls back to malloc. If -fno-openmp-cuda-parallel-target-region is used, then again, if the total size is &lt;= 128 bytes, shared memory is used; otherwise, a statically allocated global memory buffer is used.<br>
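                            <p>A minimal sketch of the pattern described above, under stated assumptions: the function names, the variable <tt>scale</tt>, and the loop are hypothetical and only illustrate where a local variable becomes a candidate for globalization. The second function uses firstprivate on the inner parallel directive, which, as far as I understand, gives every thread its own copy so that the master thread's local variable no longer has to be shared.</p>

```cpp
// Illustrative only: `scale` is declared by the thread that starts the target
// region and is then read by every thread of the inner parallel region, so it
// is the kind of local variable that has to be shared between threads.
int scaled_sum_shared(int n, int factor) {
  int total = 0;
#pragma omp target map(tofrom : total)
  {
    int scale = factor;  // local to the master thread of the target region
#pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i)
      total += scale * i;  // `scale` is shared by default here
  }
  return total;
}

// Same computation, but each thread receives its own copy of `scale`
// through firstprivate on the inner parallel directive.
int scaled_sum_private(int n, int factor) {
  int total = 0;
#pragma omp target map(tofrom : total)
  {
    int scale = factor;
#pragma omp parallel for firstprivate(scale) reduction(+ : total)
    for (int i = 0; i < n; ++i)
      total += scale * i;  // reads the thread's private copy
  }
  return total;
}
```

                            <p>Both functions compute the same value; they differ only in how <tt>scale</tt> reaches the threads of the inner parallel region.</p>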

                            <br>

                            <div dir="ltr">Best regards,

                              <div>Alexey Bataev</div>

                            </div>

                            <div dir="ltr"><br>

                              <blockquote type="cite">On June 25, 2020, at 13:45, Ye Luo <<a href="mailto:xw111luoye@gmail.com" target="_blank">xw111luoye@gmail.com</a>> wrote:<br>

                                <br>

                              </blockquote>

                            </div>

                            <blockquote type="cite">

                              <div dir="ltr">

                                <div dir="ltr">

                                  <div>Could you add documentation for

                                    this option?</div>

                                  <div><span><tt>-fopenmp-cuda-parallel-target-region</tt></span></div>

                                  <div><span><tt>The name doesn't

                                        reflect the actual effect.</tt></span></div>

                                  <div><span><tt>The differential review

                                        mentions</tt></span></div>

                                  <div><span><tt>```<br>

                                      </tt></span></div>

                                  <div><span><tt>Added support for

                                        dynamic memory allocation for

                                        globalized variables in<br>

                                        case if execution of target

                                        regions in parallel is required.</tt></span></div>

                                  <div><span><tt>```</tt></span></div>

                                  <div><span><tt>So what are the globalized variables? When do they occur? Since this impacts performance, application developers would like to know more details.<br>

                                      </tt></span></div>

                                  <div><span><tt>```<br>

                                      </tt></span></div>

                                  <div><span><tt><span>

                                          <p>It might use "slow"

                                            allocation functions, in

                                            general, since it may use

                                            malloc on the device side.

                                            Being disabled, it uses

                                            statically preallocated

                                            memory, which might be

                                            faster, if parallel target

                                            regions are not required.</p>

                                          <p>```</p>

                                          In the slow code path, does it always call malloc on the device, or is malloc called only when a size limit is exceeded? Is the call explicitly generated by clang, or does the CUDA driver handle it?<br>

                                        </span></tt></span></div>

                                  <div><span><tt><span>I saw some <span>UseSharedMemory flags in the review. Does the slow code path use CUDA shared memory, falling back to global memory when the size limit is exceeded?<br>

                                          </span></span></tt></span></div>

                                  <div><span><tt><span>In the fast code

                                          path, what is the scope of

                                          this statically preallocated

                                          memory? Is it owned by the

                                          CUDA plugin?</span></tt></span></div>

                                  <div><span><tt><span><br>

                                        </span></tt></span></div>

                                  <div><span><tt><span>Thanks,<br>

                                        </span></tt></span></div>

                                  <div><span><tt><span>Ye<br>

                                        </span></tt></span></div>


                                </div>

                                <br>

                                <div class="gmail_quote">

                                  <div dir="ltr" class="gmail_attr">On

                                    Thu, Jun 25, 2020 at 9:44 AM

                                    Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>

                                    wrote:<br>

                                  </div>

                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                    <div>

                                      <p>Hi Ye, the patch is committed,

                                        you can check if the compiler

                                        works for you.<br>

                                      </p>

                                      <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

                                      <div>On 22.06.2020 1:29 PM, Ye Luo via Openmp-dev wrote:<br>

                                      </div>

                                      <blockquote type="cite">

                                        <div dir="ltr">It is working

                                          well now. I tested both the

                                          miniapp and the full

                                          application.<br clear="all">


                                        </div>

                                        <br>

                                        <div class="gmail_quote">

                                          <div dir="ltr" class="gmail_attr">On Mon,

                                            Jun 22, 2020 at 11:56 AM

                                            Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>

                                            wrote:<br>

                                          </div>

                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                            <div>

                                              <p>Hi Ye, could you try

                                                your code with the

                                                attached patch for the

                                                clang? <br>

                                              </p>

                                              <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

                                              <div>On 19.06.2020 6:45 PM, Ye Luo via Openmp-dev wrote:<br>

                                              </div>

                                              <blockquote type="cite">

                                                <div dir="ltr">

                                                  <div>Yes, it makes the code work.</div>

                                                  <div>Do you have any

                                                    idea about the code

                                                    pattern?<br>

                                                  </div>

                                                  <div>Before the

                                                    compiler fix gets

                                                    in, I can put the

                                                    workaround in the

                                                    real code which has

                                                    multiple places

                                                    similar to the

                                                    miniapp.</div>

                                                  <div>Best,<br>

                                                  </div>

                                                  <div>Ye<br>

                                                  </div>


                                                </div>

                                                <br>

                                                <div class="gmail_quote">

                                                  <div dir="ltr" class="gmail_attr">On

                                                    Fri, Jun 19, 2020 at

                                                    4:35 PM

                                                    Alexey.Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>>

                                                    wrote:<br>

                                                  </div>

                                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                                    <div>

                                                      <p>Try to use `omp

                                                        parallel

                                                        firstprivate(first,

                                                        last,

                                                        offload_scratch_iVP_ptr,

                                                        psiinv_ptr, i)`

                                                        as the inner

                                                        parallel

                                                        directive.<br>

                                                      </p>

                                                      <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

                                                      <div>On 16.06.2020 9:26 PM, Ye Luo via Openmp-dev wrote:<br>

                                                      </div>

                                                      <blockquote type="cite">

                                                        <div dir="ltr">

                                                          <div>Hi all,</div>

                                                          <div>I got a libomptarget failure when offloading the same kernel function from multiple host threads. I reported the

                                                          bug as <a href="https://bugs.llvm.org/show_bug.cgi?id=46257" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=46257</a>.</div>

                                                          <div><br>

                                                          </div>

                                                          <div>After a bit of investigation, I was able to narrow down the scope of the issue.</div>

                                                          <div>The error

                                                          from CUDA

                                                          runtime is "an

                                                          illegal memory

                                                          access was

                                                          encountered".</div>

                                                          <div><br>

                                                          </div>

                                                          <div>

                                                          <div>A) On the

                                                          libomptarget

                                                          side<br>

                                                          </div>

                                                          </div>

                                                          <div>In libomptarget/plugins/cuda/src/rtl.cpp, I added synchronization calls before and after cuLaunchKernel and a mutex to protect the kernel execution.<br>

                                                          </div>

                                                          <div><br>

                                                          </div>

                                                          <div><pre>    Err = cuCtxSynchronize();
    if (!checkResult(Err, "Error before cuLaunchKernel\n"))
      return OFFLOAD_FAIL;

    //kernelmtx.lock();
    Err = cuLaunchKernel(KernelInfo-&gt;Func, CudaBlocksPerGrid, /* gridDimY */ 1,
                         /* gridDimZ */ 1, CudaThreadsPerBlock,
                         /* blockDimY */ 1, /* blockDimZ */ 1,
                         /* sharedMemBytes */ 0, Stream, &amp;Args[0], nullptr);
    if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))
      return OFFLOAD_FAIL;

    Err = cuCtxSynchronize();
    //kernelmtx.unlock();
    if (!checkResult(Err, "Error after cuLaunchKernel\n"))
      return OFFLOAD_FAIL;</pre></div>

                                                          <div><br>

                                                          </div>

                                                          <div>Without the mutex, the first error from all the threads is "Error after cuLaunchKernel". After enabling the mutex, the code runs well. If I move the mutex unlock() before the cuCtxSynchronize, the code still runs into the error. So I think the error comes from kernel execution and not from something else.</div>

                                                          <div><br>

                                                          </div>

                                                          <div>2) On my application side, I tried to see what triggers the error. Once I commented out "omp parallel" in the hierarchical parallelism (<a href="https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258" target="_blank">https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258</a>), offloading from multiple host threads started to run without error.</div>
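                                                          <div>For readers who want to try the pattern in isolation, the following is a rough sketch of it, not the miniqmc code. Several host threads each offload a target region, and the inner "omp parallel for" supplies the second level of the hierarchical parallelism. The function name, loop bounds and the reduction are made up for illustration.</div>

```cpp
#include <cassert>
#include <vector>

// Hypothetical reproducer shape: every host thread launches its own
// target region; inside it, teams distribute the outer loop and an
// inner "parallel for" provides the second level of parallelism.
std::vector<int> offload_from_host_threads(int num_host_threads, int n) {
  std::vector<int> sums(num_host_threads, 0);
  int* out = sums.data();

  #pragma omp parallel for num_threads(num_host_threads)
  for (int t = 0; t < num_host_threads; ++t) {
    int sum = 0;
    #pragma omp target teams distribute map(tofrom: sum) reduction(+: sum)
    for (int team = 0; team < n; ++team) {
      int partial = 0;
      // Commenting out this pragma corresponds to the experiment
      // described in the message above.
      #pragma omp parallel for reduction(+: partial)
      for (int i = 0; i < n; ++i) partial += 1;
      sum += partial;
    }
    out[t] = sum;  // each host thread writes only its own slot
  }
  return sums;
}
```

                                                          <div>Each entry of the returned vector should equal n * n. Built without OpenMP support the pragmas are ignored and the code runs serially, which gives a baseline to compare against an offloading build.</div>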

                                                          <div><br>

                                                          </div>

                                                          <div>I'd like to ask the wider community about potential causes of this bug.<br>
                                                          </div>
                                                          <div>Is there a thread-unsafe state machine in a kernel generated from a target region? Or is there any other potential reason for the error?</div>

                                                          <div><br>

                                                          </div>

                                                          <div>Best,</div>

                                                          <div>Ye<br>

                                                          </div>

                                                          <div>

                                                          <div dir="ltr">

                                                          <div dir="ltr">

                                                          <div>

                                                          <div dir="ltr">===================<br>

                                                          Ye Luo, Ph.D.<br>

                                                          Computational

                                                          Science

                                                          Division &

                                                          Leadership

                                                          Computing

                                                          Facility<br>

                                                          Argonne

                                                          National

                                                          Laboratory</div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                        </div>

                                                        <br>

                                                        <fieldset></fieldset>

                                                        <pre>_______________________________________________

Openmp-dev mailing list

<a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>

</pre>

                                                      </blockquote>

                                                    </div>

                                                  </blockquote>

                                                </div>

                                                <br>


                                              </blockquote>

                                            </div>

                                          </blockquote>

                                        </div>

                                        <br>


                                      </blockquote>

                                    </div>

                                  </blockquote>

                                </div>


                              </div>

                            </blockquote>

                          </div>

                        </blockquote>

                      </div>

                    </blockquote>

                  </div>

                </blockquote>

              </div>

            </blockquote>

          </div>

        </blockquote>

      </div>

    </blockquote>

  </div>

</blockquote></div>