[Openmp-dev] Potential offload kernel data race

Ye Luo via Openmp-dev openmp-dev at lists.llvm.org
Mon Jun 22 10:29:37 PDT 2020


It is working well now. I tested both the miniapp and the full application.
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Mon, Jun 22, 2020 at 11:56 AM Alexey.Bataev <a.bataev at outlook.com> wrote:

> Hi Ye, could you try your code with the attached patch for clang?
>
> -------------
> Best regards,
> Alexey Bataev
>
> 19.06.2020 6:45 PM, Ye Luo via Openmp-dev wrote:
>
> Yes, it makes the code work.
> Do you have any idea what is wrong with this code pattern?
> Before the compiler fix gets in, I can put the workaround into the real code,
> which has multiple places similar to the miniapp.
> Best,
> Ye
> ===================
> Ye Luo, Ph.D.
> Computational Science Division & Leadership Computing Facility
> Argonne National Laboratory
>
>
> On Fri, Jun 19, 2020 at 4:35 PM Alexey.Bataev <a.bataev at outlook.com>
> wrote:
>
>> Try to use `omp parallel firstprivate(first, last,
>> offload_scratch_iVP_ptr, psiinv_ptr, i)` as the inner parallel directive.
>>
>> -------------
>> Best regards,
>> Alexey Bataev
>>
>> 16.06.2020 9:26 PM, Ye Luo via Openmp-dev wrote:
>>
>> Hi all,
>> I got a libomptarget failure when offloading the same kernel function from
>> multiple host threads. I reported the bug at
>> https://bugs.llvm.org/show_bug.cgi?id=46257.
>>
>> After some investigation, I was able to narrow down the scope of the
>> issue.
>> The error from CUDA runtime is "an illegal memory access was encountered".
>>
>> 1) On the libomptarget side
>> In libomptarget/plugins/cuda/src/rtl.cpp, I added synchronization calls
>> before and after cuLaunchKernel and a mutex to protect the kernel execution.
>>
>>     Err = cuCtxSynchronize();
>>     if (!checkResult(Err, "Error before cuLaunchKernel\n"))
>>       return OFFLOAD_FAIL;
>>
>>     //kernelmtx.lock();
>>     Err = cuLaunchKernel(KernelInfo->Func, CudaBlocksPerGrid,
>>                          /* gridDimY */ 1, /* gridDimZ */ 1,
>>                          CudaThreadsPerBlock,
>>                          /* blockDimY */ 1, /* blockDimZ */ 1,
>>                          /* sharedMemBytes */ 0, Stream, &Args[0], nullptr);
>>     if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))
>>       return OFFLOAD_FAIL;
>>
>>     Err = cuCtxSynchronize();
>>     //kernelmtx.unlock();
>>     if (!checkResult(Err, "Error after cuLaunchKernel\n"))
>>       return OFFLOAD_FAIL;
>>
>> Without the mutex, the first error from all the threads is "Error after
>> cuLaunchKernel". After enabling the mutex, the code runs well. If I move
>> the mutex unlock() before the cuCtxSynchronize, the code still runs into
>> the error, so I think the error comes from the kernel execution and not
>> from something else.
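
For reference, a standalone sketch of that serialization experiment; the CUDA
driver API calls are the ones used in the snippet above, while the wrapper
function and the file-scope mutex name are only illustrative:

    // Diagnostic-only: hold a mutex across both the launch and the synchronize,
    // so kernels issued from different host threads never execute concurrently.
    #include <mutex>
    #include <cuda.h>

    static std::mutex kernelmtx;

    static bool launchSerialized(CUfunction Func, unsigned BlocksPerGrid,
                                 unsigned ThreadsPerBlock, CUstream Stream,
                                 void **Args) {
      std::lock_guard<std::mutex> guard(kernelmtx); // released only after sync
      if (cuLaunchKernel(Func, BlocksPerGrid, /* gridDimY */ 1, /* gridDimZ */ 1,
                         ThreadsPerBlock, /* blockDimY */ 1, /* blockDimZ */ 1,
                         /* sharedMemBytes */ 0, Stream, Args,
                         /* extra */ nullptr) != CUDA_SUCCESS)
        return false;
      return cuCtxSynchronize() == CUDA_SUCCESS;
    }
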
>>
>> 2) On my application side, I tried to see what triggers the error. Once I
>> commented out the "omp parallel" of the hierarchical parallelism at
>> https://github.com/QMCPACK/miniqmc/blob/5a11c4131dbf91bf9f9977057a03aee485368f0d/src/QMCWaveFunctions/einspline_spo_omp.cpp#L258
>> the offloading from multiple host threads started to run without errors.
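
For readers who do not want to open the miniapp, a stripped-down sketch of the
failing pattern; the buffer sizes and function names are made up, only the
directive structure mirrors what is described above:

    #include <vector>

    // One offload per host thread: a target region with hierarchical parallelism
    // (teams distribute outside, "omp parallel" inside). Removing the inner
    // parallel made the illegal-memory-access error disappear.
    void offload_one(std::vector<float> &buf) {
      float *ptr = buf.data();
      const int n = static_cast<int>(buf.size());
    #pragma omp target teams distribute map(tofrom: ptr[0:n])
      for (int team_start = 0; team_start < n; team_start += 64) {
        const int team_end = (team_start + 64 < n) ? team_start + 64 : n;
    #pragma omp parallel // the inner level whose removal avoided the crash
        {
    #pragma omp for
          for (int j = team_start; j < team_end; ++j)
            ptr[j] += 1.0f;
        }
      }
    }

    void run_from_many_host_threads() {
      std::vector<std::vector<float>> walkers(8, std::vector<float>(1024, 0.0f));
    #pragma omp parallel for // host threads, each launching the same device kernel
      for (int w = 0; w < 8; ++w)
        offload_one(walkers[w]);
    }
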
>>
>> I'd like to ask the wider community about potential causes of this bug.
>> Is there a thread-unsafe state machine associated with a kernel generated
>> from a target region? Or is there any other potential reason for the error?
>>
>> Best,
>> Ye
>> ===================
>> Ye Luo, Ph.D.
>> Computational Science Division & Leadership Computing Facility
>> Argonne National Laboratory
>>
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>
>>
> _______________________________________________
> Openmp-dev mailing list
> Openmp-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>
>

