[llvm-bugs] [Bug 44112] New: Segfault in omp_parallel_reduction.c test (AArch64)
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Nov 22 02:20:08 PST 2019
https://bugs.llvm.org/show_bug.cgi?id=44112
Bug ID: 44112
Summary: Segfault in omp_parallel_reduction.c test (AArch64)
Product: OpenMP
Version: unspecified
Hardware: Other
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Runtime Library
Assignee: unassignedbugs at nondot.org
Reporter: graham.hunter at arm.com
CC: llvm-bugs at lists.llvm.org
I observed a segmentation fault in the parallel reduction test when running
'make check-openmp'. I took a look through the resulting core file and found
the crash was in the generated reduction function:
> #0 0x000000000040179c in .omp.reduction.reduction_func ()
> #1 0x0000ffff8fe95320 in __kmp_barrier () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> #2 0x0000ffff8fe65c10 in __kmpc_reduce_nowait () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> #3 0x00000000004016ac in .omp_outlined. ()
> #4 0x0000ffff8fed09ac in __kmp_invoke_microtask () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
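For context, a minimal sketch of the kind of source that makes the compiler emit an outlined parallel region plus a reduction callback like the frames above. This is not the actual omp_parallel_reduction.c test; the function name sum_with_reduction is made up for illustration, and it assumes the file is built with OpenMP enabled (e.g. -fopenmp):

```c
/* Illustrative only: a double-precision sum reduction over a parallel loop.
 * Each thread accumulates into a private copy of `sum`, and the runtime
 * later combines the private copies into the shared variable. */
double sum_with_reduction(int n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += (double)i;

    return sum;
}
```

Without OpenMP enabled the pragma is ignored and the loop simply runs serially.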
The test is compiled with -O0, so there are a few additional loads/stores in
the disassembly:
> Dump of assembler code for function .omp.reduction.reduction_func.6:
> 0x000000000040177c <+0>: sub sp, sp, #0x10
> 0x0000000000401780 <+4>: str x0, [sp, #8]
> 0x0000000000401784 <+8>: str x1, [sp]
> 0x0000000000401788 <+12>: ldr x8, [sp, #8]
> 0x000000000040178c <+16>: ldr x9, [sp]
> 0x0000000000401790 <+20>: ldr x9, [x9]
> 0x0000000000401794 <+24>: ldr x8, [x8]
> 0x0000000000401798 <+28>: ldr d0, [x8]
> => 0x000000000040179c <+32>: ldr d1, [x9]
> 0x00000000004017a0 <+36>: fadd d0, d0, d1
> 0x00000000004017a4 <+40>: str d0, [x8]
> 0x00000000004017a8 <+44>: add sp, sp, #0x10
> 0x00000000004017ac <+48>: ret
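Read back into C, the disassembly above corresponds roughly to the sketch below. This is a reconstruction from the instructions only, not the compiler's actual output; the name reduction_func and the local variable names are assumptions, and it treats each argument as an array holding a single pointer to a double, which is what the two levels of loads suggest:

```c
#include <stddef.h>

/* Hypothetical C equivalent of the generated reduction function.
 * Each argument points to an array of pointers to one thread's
 * reduction variables (just one double in this case). */
void reduction_func(void *lhs_data, void *rhs_data)
{
    void **lhs = (void **)lhs_data;       /* ldr x8, [sp, #8] */
    void **rhs = (void **)rhs_data;       /* ldr x9, [sp]     */

    double *rhs_val = (double *)rhs[0];   /* ldr x9, [x9]     */
    double *lhs_val = (double *)lhs[0];   /* ldr x8, [x8]     */

    /* ldr d0, [x8]; ldr d1, [x9]; fadd; str d0, [x8].
     * The load through rhs_val is the faulting instruction:
     * it dereferences a null pointer if rhs[0] is 0. */
    *lhs_val = *lhs_val + *rhs_val;
}
```

So a zero in x9 at <+32> means the first slot of the second argument's pointer array held a null pointer when it was read.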
The parameters passed into this function come from one of the kmp_barrier
variants:
> (*reduce)(this_thr->th.th_local.reduce_data,
> child_thr->th.th_local.reduce_data);
The pointer in x9 at the time of the fault was 0.
My guess is that there's a data race between a child thread writing a pointer
into its reduce_data variable and the thread performing the reduction trying to
read it. A run with TSan reports a race here, but since the test almost always
passes (I tried ~100K times to reproduce the crash on different machines
without success), I suspect TSan simply isn't recognizing the mechanism that's
supposed to synchronize access.
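To make the suspected ordering problem concrete, here is a minimal sketch of the publish/consume pattern that would have to hold between the two threads. It is not libomp code: struct worker, child_main, reduce_with_child and the arrived flag are all hypothetical names, and it uses C11 atomics with pthreads purely for illustration. The child writes its reduce_data pointer and then sets a flag with release ordering; the reducing thread reads the flag with acquire ordering before touching the pointer. If the flag were written or read without that ordering, the reader could observe a stale (null) pointer on a weakly ordered machine such as AArch64.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one thread's reduction state. */
struct worker {
    void *reduce_data;   /* plain pointer, published by the child */
    atomic_int arrived;  /* 0 until the child has published reduce_data */
    double partial;      /* the child's private reduction value */
    void *list[1];       /* array of pointers to reduction variables */
};

static void *child_main(void *arg)
{
    struct worker *w = arg;

    w->partial = 2.25;
    w->list[0] = &w->partial;
    w->reduce_data = w->list;

    /* Release store: the writes above become visible to any thread
     * that observes arrived == 1 through an acquire load. */
    atomic_store_explicit(&w->arrived, 1, memory_order_release);
    return NULL;
}

/* Adds the child's partial value to parent_value and returns the sum.
 * If the thread cannot be created, parent_value is returned unchanged. */
double reduce_with_child(double parent_value)
{
    struct worker child = { .reduce_data = NULL, .partial = 0.0 };
    pthread_t tid;

    atomic_init(&child.arrived, 0);
    if (pthread_create(&tid, NULL, child_main, &child) != 0)
        return parent_value;

    /* Acquire load: spin until the child has published its data. */
    while (atomic_load_explicit(&child.arrived, memory_order_acquire) == 0)
        ;

    void **rhs = child.reduce_data;      /* safe to read after the acquire */
    parent_value += *(double *)rhs[0];

    pthread_join(tid, NULL);
    return parent_value;
}
```

Calling reduce_with_child(1.5) adds the child's 2.25 to the parent's value. Whether the runtime's barrier code is missing an equivalent ordering, or has one that TSan does not model, is exactly the open question.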