[llvm-bugs] [Bug 45139] New: intermittent error of "'#pragma omp requires unified_shared_memory' not used consistently!" when mixing Python and OpenMP target offload
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Mar 6 11:16:56 PST 2020
https://bugs.llvm.org/show_bug.cgi?id=45139
Bug ID: 45139
Summary: intermittent error of "'#pragma omp requires
unified_shared_memory' not used consistently!" when
mixing Python and OpenMP target offload
Product: OpenMP
Version: unspecified
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Runtime Library
Assignee: unassignedbugs at nondot.org
Reporter: csdaley at lbl.gov
CC: llvm-bugs at lists.llvm.org
We encounter an intermittent error when running a Python application that
calls a C++ library containing OpenMP target offload. The error message is
"Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not
used consistently!". This OpenMP directive does not appear anywhere in the C++
library! We do not have a simple reproducer. I am using LLVM/Clang from 25 Feb
2020 (which contains the fix for Bug ID #44933). The output below shows 5 runs
of the full production application: 2 of the 5 runs fail with this error.
> for i in `seq 0 4`; do echo -e "\nRun $i"; srun --pty python tests/parallel_trace_timing.py --gpu --nside 128 --perturbBC 1e-7; done
Run 0
Using 80 threads
Using minChunk of 2_000
Using nside of 128
Tracing 13_602 rays.
Trace in place
722_117 rays per second
time for parallel_trace_timing = 0.21
Run 1
Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not
used consistently!
srun: error: cgpu07: task 0: Exited with exit code 1
srun: Terminating job step 484034.80
Run 2
Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not
used consistently!
srun: error: cgpu07: task 0: Exited with exit code 1
srun: Terminating job step 484034.81
Run 3
Using 80 threads
Using minChunk of 2_000
Using nside of 128
Tracing 13_602 rays.
Trace in place
840_888 rays per second
time for parallel_trace_timing = 0.20
Run 4
Using 80 threads
Using minChunk of 2_000
Using nside of 128
Tracing 13_602 rays.
Trace in place
837_063 rays per second
time for parallel_trace_timing = 0.19
The error happens more frequently if we run the Python application under NVIDIA
Nsight Systems, i.e. "srun --pty nsys profile --stats=true python
tests/parallel_trace_timing.py --gpu --nside 128 --perturbBC 1e-7". If we use a
debug version of the LLVM OpenMP runtime library with tracing information, a
failing run aborts before there is any trace output, which indicates to me that
the error happens at startup. It is possible that the error is related to the
change in the order of constructors/destructors described in Bug ID #44933.
Do you have any ideas?
Thanks,
Chris
--
You are receiving this mail because:
You are on the CC list for the bug.