<div dir="ltr"><div>Yes, that makes the code work.</div><div>Do you have any idea which code pattern triggers the problem?<br></div><div>Until the compiler fix lands, I can apply the workaround in the real code, which has multiple places similar to the miniapp.</div><div>Best,<br></div><div>Ye<br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>

Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>

Argonne National Laboratory</div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 19, 2020 at 4:35 PM Alexey.Bataev &lt;<a href="mailto:a.bataev@outlook.com">a.bataev@outlook.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>Try using `omp parallel firstprivate(first, last,

      offload_scratch_iVP_ptr, psiinv_ptr, i)` as the inner parallel

      directive.<br>

    </p>
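    <p>For readers following the thread, here is a minimal sketch of where such a clause sits in a hierarchical-parallelism kernel. It is not the miniapp code: the function `scale_blocks`, the array `data_ptr`, and `factor` are hypothetical stand-ins, and only the shape of the inner directive follows the suggestion above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a hierarchical kernel: teams iterate over blocks,
// and an inner parallel region works on the elements of each block.
// Without OpenMP support the pragmas are ignored and the loops run serially.
std::vector<double> scale_blocks(int num_blocks, int block_size, double factor)
{
  const int total = num_blocks * block_size;
  std::vector<double> data(static_cast<std::size_t>(total), 1.0);
  double* data_ptr = data.data();

#pragma omp target teams distribute map(tofrom : data_ptr[0:total])
  for (int i = 0; i < num_blocks; ++i)
  {
    int first = i * block_size;
    int last  = first + block_size;
    // Inner parallel directive: every variable used inside is listed as
    // firstprivate, so each thread starts with its own initialized copy.
#pragma omp parallel firstprivate(first, last, data_ptr, factor)
    {
#pragma omp for
      for (int j = first; j < last; ++j)
        data_ptr[j] *= factor;
    }
  }
  return data;
}
```

    <p>Calling `scale_blocks(4, 8, 2.0)` should return 32 elements, all equal to 2.0, whether the region runs on a device, on the host, or serially.</p>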

    <pre cols="72">-------------

Best regards,

Alexey Bataev</pre>

    <div>On 16.06.2020 9:26 PM, Ye Luo via

      Openmp-dev wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div>Hi all,</div>

        <div>I got a libomptarget failure when offloading the same kernel

          function from multiple host threads. I reported the bug as <a href="https://bugs.llvm.org/show_bug.cgi?id=46257" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=46257</a>.</div>

        <div><br>

        </div>

        <div>After a bit of investigation, I was able to narrow down

          the scope of the issue.</div>

        <div>The error from the CUDA runtime is "an illegal memory access

          was encountered".</div>

        <div><br>

        </div>

        <div>

          <div>A) On the libomptarget side<br>

          </div>

        </div>

        <div>In libomptarget/plugins/cuda/src/rtl.cpp, I added a few

          synchronization calls before and after cuLaunchKernel and a mutex to

          protect the kernel execution.<br>

        </div>

        <div><br>

        </div>

        <div>    Err = cuCtxSynchronize();<br>

              if (!checkResult(Err, "Error before cuLaunchKernel\n"))<br>

                return OFFLOAD_FAIL;<br>

          <br>

              //kernelmtx.lock();<br>

              Err = cuLaunchKernel(KernelInfo->Func,

          CudaBlocksPerGrid, /* gridDimY */ 1,<br>

                                   /* gridDimZ */ 1,

          CudaThreadsPerBlock,<br>

                                   /* blockDimY */ 1, /* blockDimZ */ 1,<br>

                                   /* sharedMemBytes */ 0, Stream,

          &Args[0], nullptr);<br>

              if (!checkResult(Err, "Error returned from

          cuLaunchKernel\n"))<br>

                return OFFLOAD_FAIL;<br>

          <br>

              Err = cuCtxSynchronize();<br>

              //kernelmtx.unlock();<br>

              if (!checkResult(Err, "Error after cuLaunchKernel\n"))<br>

                return OFFLOAD_FAIL;</div>

        <div><br>

        </div>

        <div>Without the mutex, the first error from all the threads is

          "Error after cuLaunchKernel". After enabling the mutex, the

          code runs well. If I move the mutex unlock() before the

          cuCtxSynchronize, the code still runs into the error. So I think

          the error comes from kernel execution, not something else.</div>
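        <div>Since cuLaunchKernel returns before the kernel has finished, the position of the unlock decides whether kernels from different host threads can overlap. The sketch below models only that locking scope, with no CUDA calls: `fake_launch` is a hypothetical stand-in for an asynchronous launch, waiting on its future stands in for cuCtxSynchronize, and the 5 ms sleep is an arbitrary placeholder for kernel run time.</div>

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

static std::atomic<int> running{0};      // fake kernels currently in flight
static std::atomic<int> max_running{0};  // high-water mark of `running`

// Hypothetical stand-in for an asynchronous kernel launch.
std::future<void> fake_launch()
{
  return std::async(std::launch::async, [] {
    int now  = ++running;
    int prev = max_running.load();
    // Record the largest number of kernels seen in flight at once.
    while (prev < now && !max_running.compare_exchange_weak(prev, now)) {}
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    --running;
  });
}

// Launches one fake kernel per host thread and returns the maximum overlap.
// unlock_after_sync == true  : the mutex covers launch and synchronization.
// unlock_after_sync == false : the mutex is released right after the launch.
int max_concurrent_kernels(int num_threads, bool unlock_after_sync)
{
  running = 0;
  max_running = 0;
  std::mutex kernelmtx;
  std::vector<std::thread> hosts;
  for (int t = 0; t < num_threads; ++t)
    hosts.emplace_back([&kernelmtx, unlock_after_sync] {
      std::unique_lock<std::mutex> lock(kernelmtx);
      std::future<void> kernel = fake_launch();
      if (!unlock_after_sync)
        lock.unlock();  // the kernel may still be running at this point
      kernel.wait();    // stand-in for cuCtxSynchronize()
    });
  for (std::thread& h : hosts)
    h.join();
  return max_running.load();
}
```

        <div>With `unlock_after_sync` set to true, at most one fake kernel is ever in flight. With it set to false, overlap becomes possible, which matches the observation that moving unlock() before the synchronization does not remove the error.</div>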

        <div><br>

        </div>

        <div>B) On my application side, I tried to see what triggers the

          error. Once I commented out "omp parallel" in the hierarchical

          parallelism (<a href="https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258" target="_blank">https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258</a>),

          offloading from multiple host threads started to run without error.</div>
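        <div>The application-side pattern described above can be sketched roughly as follows. This is an illustrative reduction, not the miniqmc source: `fill_on_device`, `offload_from_host_threads`, and the tile width `chunk` are hypothetical names, and the sketch is not claimed to reproduce the failure.</div>

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel with hierarchical parallelism: teams loop over tiles,
// and an inner "omp parallel" region fills the elements of each tile.
void fill_on_device(int* out, int n, int value)
{
  const int chunk      = 4;  // illustrative tile width
  const int num_chunks = (n + chunk - 1) / chunk;
#pragma omp target teams distribute map(from : out[0:n])
  for (int c = 0; c < num_chunks; ++c)
  {
#pragma omp parallel
    {
#pragma omp for
      for (int j = c * chunk; j < (c + 1) * chunk; ++j)
        if (j < n)
          out[j] = value;
    }
  }
}

// Several host threads offload the same kernel, each to its own buffer.
std::vector<std::vector<int>> offload_from_host_threads(int num_host_threads, int n)
{
  std::vector<std::vector<int>> results(num_host_threads, std::vector<int>(n, -1));
#pragma omp parallel for num_threads(num_host_threads)
  for (int t = 0; t < num_host_threads; ++t)
    fill_on_device(results[t].data(), n, t);
  return results;
}
```

        <div>After `offload_from_host_threads(3, 10)`, every element of buffer `t` should equal `t`; the pragmas are ignored when OpenMP is disabled, so the sketch also runs serially.</div>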

        <div><br>

        </div>

        <div>I'd like to ask the wider community about potential causes

          of this bug.<br>

        </div>

        <div>Is there a thread-unsafe state machine in a kernel

          generated from a target region? Or is there any other potential reason

          for the error?</div>

        <div><br>

        </div>

        <div>Best,</div>

        <div>Ye<br>

        </div>

        <div>

          <div dir="ltr">

            <div dir="ltr">

              <div>

                <div dir="ltr">===================<br>

                  Ye Luo, Ph.D.<br>

                  Computational Science Division & Leadership

                  Computing Facility<br>

                  Argonne National Laboratory</div>

              </div>

            </div>

          </div>

        </div>

      </div>

      <br>


      <pre>_______________________________________________

Openmp-dev mailing list

<a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>

</pre>

    </blockquote>

  </div>

</blockquote></div>