[Openmp-dev] Potential offload kernel data race

Ye Luo via Openmp-dev openmp-dev at lists.llvm.org
Tue Jun 16 19:00:16 PDT 2020


It was from commit 1c3d7709dec22c61d9c3105e4838edce8e6ac014
(HEAD -> master, origin/master, origin/HEAD)
Date:   Sun Jun 14 16:24:18 2020 +0100

I added those synchronizations for debugging purposes.
Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Tue, Jun 16, 2020 at 8:55 PM Shilei Tian <shiltian at cs.stonybrook.edu>
wrote:

> Hi Ye,
>
> Thanks for your report. From the code snippet, you seem to be using a very
> old version of the RTL. Would you please try trunk, as we don’t have that
> kind of synchronization there.
>
> Regards,
> Shilei Tian
> Ph.D. Student
> Exascala Lab
> Institute of Advanced Computational Science
> Stony Brook University
>
> On Jun 16, 2020, at 21:26, Ye Luo via Openmp-dev <
> openmp-dev at lists.llvm.org> wrote:
>
> Hi all,
> I got a libomptarget failure when offloading the same kernel function from
> multiple host threads. I reported the bug at
> https://bugs.llvm.org/show_bug.cgi?id=46257.
>
> After a bit of investigation, I was able to narrow down the scope of the
> issue.
> The error from the CUDA runtime is "an illegal memory access was
> encountered".
>
> A) On the libomptarget side
> In libomptarget/plugins/cuda/src/rtl.cpp, I added synchronization calls
> before and after cuLaunchKernel, plus a mutex to protect the kernel
> execution:
>
>     Err = cuCtxSynchronize();
>     if (!checkResult(Err, "Error before cuLaunchKernel\n"))
>       return OFFLOAD_FAIL;
>
>     //kernelmtx.lock();
>     Err = cuLaunchKernel(KernelInfo->Func, CudaBlocksPerGrid,
>                          /* gridDimY */ 1, /* gridDimZ */ 1,
>                          CudaThreadsPerBlock,
>                          /* blockDimY */ 1, /* blockDimZ */ 1,
>                          /* sharedMemBytes */ 0, Stream, &Args[0], nullptr);
>     if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))
>       return OFFLOAD_FAIL;
>
>     Err = cuCtxSynchronize();
>     //kernelmtx.unlock();
>     if (!checkResult(Err, "Error after cuLaunchKernel\n"))
>       return OFFLOAD_FAIL;
>
> Without the mutex, the first error from all the threads is "Error after
> cuLaunchKernel". After enabling the mutex, the code runs well. If I move
> the mutex unlock() before the second cuCtxSynchronize, the code still runs
> into the error. So I think the error comes from the kernel execution, not
> from something else. A sketch of the experiment follows.
>
> B) On my application side, I tried to see what triggers the error. Once I
> commented out the "omp parallel" in the hierarchical parallelism at
> https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258
> offloading from multiple host threads started to run without error. A
> minimal sketch of the failing pattern is shown below.
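>
> This is a hand-written reduction of the structure, not the actual miniqmc
> code; evaluate, psi, nb, and nsplines are illustrative names:
>
>     #include <omp.h>
>
>     // Each host thread offloads the same target region concurrently.
>     void evaluate(float *psi, int nb, int nsplines) {
>     #pragma omp target teams distribute map(tofrom: psi[0:nb * nsplines])
>       for (int b = 0; b < nb; ++b) {
>     // Commenting out this inner "omp parallel" makes the error go away.
>     #pragma omp parallel for
>         for (int s = 0; s < nsplines; ++s)
>           psi[b * nsplines + s] += 1.0f;
>       }
>     }
>
>     int main() {
>       enum { NB = 8, NS = 1024, NTHREADS = 4 };
>       static float psi[NTHREADS][NB * NS];
>     #pragma omp parallel num_threads(NTHREADS) // concurrent host threads
>       evaluate(psi[omp_get_thread_num()], NB, NS);
>       return 0;
>     }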
>
> I'd like to ask the wider community about potential causes of this bug.
> Is there a thread-unsafe state machine associated with a kernel generated
> from a target region? Or is there any other potential reason for the
> error?
>
> Best,
> Ye
> ===================
> Ye Luo, Ph.D.
> Computational Science Division & Leadership Computing Facility
> Argonne National Laboratory