[llvm-bugs] [Bug 44112] New: Segfault in omp_parallel_reduction.c test (AArch64)

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Nov 22 02:20:08 PST 2019


https://bugs.llvm.org/show_bug.cgi?id=44112

            Bug ID: 44112
           Summary: Segfault in omp_parallel_reduction.c test (AArch64)
           Product: OpenMP
           Version: unspecified
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Runtime Library
          Assignee: unassignedbugs at nondot.org
          Reporter: graham.hunter at arm.com
                CC: llvm-bugs at lists.llvm.org

I observed a segmentation fault in the parallel reduction test when running
'make check-openmp'. I took a look through the resulting core file and found
the crash was in the generated reduction function:

> #0  0x000000000040179c in .omp.reduction.reduction_func ()
> #1  0x0000ffff8fe95320 in __kmp_barrier () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> #2  0x0000ffff8fe65c10 in __kmpc_reduce_nowait () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> #3  0x00000000004016ac in .omp_outlined. ()
> #4  0x0000ffff8fed09ac in __kmp_invoke_microtask () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The test is compiled with -O0, so there are a few additional loads/stores in
the disassembly:

> Dump of assembler code for function .omp.reduction.reduction_func.6:
>    0x000000000040177c <+0>:     sub     sp, sp, #0x10
>    0x0000000000401780 <+4>:     str     x0, [sp, #8]
>    0x0000000000401784 <+8>:     str     x1, [sp]
>    0x0000000000401788 <+12>:    ldr     x8, [sp, #8]
>    0x000000000040178c <+16>:    ldr     x9, [sp]
>    0x0000000000401790 <+20>:    ldr     x9, [x9]
>    0x0000000000401794 <+24>:    ldr     x8, [x8]
>    0x0000000000401798 <+28>:    ldr     d0, [x8]
> => 0x000000000040179c <+32>:    ldr     d1, [x9]
>    0x00000000004017a0 <+36>:    fadd    d0, d0, d1
>    0x00000000004017a4 <+40>:    str     d0, [x8]
>    0x00000000004017a8 <+44>:    add     sp, sp, #0x10
>    0x00000000004017ac <+48>:    ret
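
Read back into C, the disassembly above corresponds roughly to the sketch
below. This is a reconstruction from the instructions, not the compiler's
actual output: the names reduction_func and combine are hypothetical, and the
single-entry pointer list is an assumption based on the two dependent loads
through x8 and x9.

```c
#include <stddef.h>

/* Hypothetical C equivalent of the -O0 disassembly. Each argument is assumed
 * to point to a list of pointers to the reduction variables; here the list
 * has one entry, a double. */
static void reduction_func(void *lhs, void *rhs)
{
    double *l = ((double **)lhs)[0]; /* ldr x8, [x8] */
    double *r = ((double **)rhs)[0]; /* ldr x9, [x9] */

    /* ldr d0, [x8]; ldr d1, [x9]; fadd; str d0, [x8].
     * The load through r is the one that faults when r is NULL. */
    *l = *l + *r;
}

/* Illustration only: combine two partial sums the way the barrier's
 * (*reduce)(this_thr..., child_thr...) call would. */
double combine(double a, double b)
{
    void *lhs_list[1] = { &a };
    void *rhs_list[1] = { &b };

    reduction_func(lhs_list, rhs_list);
    return a;
}
```

In this reading, the faulting instruction at <+32> is the dereference of the
first entry of the right-hand list.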

The parameters passed into this function come from one of the kmp_barrier
variants:

>            (*reduce)(this_thr->th.th_local.reduce_data,
>                      child_thr->th.th_local.reduce_data);

The pointer in x9 at the time of the fault was 0.

My guess is that there's a data race between a child thread writing a pointer
into its reduce_data variable and the thread performing the reduction trying to
read it. A run with TSan does report such a race. However, the test almost
always passes (I tried ~100K times to reproduce the crash on different machines
without success), so I suspect there is a mechanism that's supposed to
synchronize this access and TSan isn't detecting it.
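
To make the suspected race concrete, here is a minimal sketch of the ordering
the reducing thread depends on, written with C11 atomics and pthreads. It is
not the libomp barrier code: struct worker, the arrived flag, child_main and
reduce_with_child are all hypothetical names, and the release/acquire pair
stands in for whatever mechanism the runtime actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for per-thread state; not the libomp layout. */
struct worker {
    void *reduce_data;   /* plain field, playing the role of reduce_data */
    atomic_int arrived;  /* hypothetical "child reached the barrier" flag */
    double partial;      /* the child's partial result */
    void *red_list[1];   /* list of pointers to reduction variables */
};

static void *child_main(void *arg)
{
    struct worker *w = arg;

    w->partial = 2.25;
    w->red_list[0] = &w->partial;
    w->reduce_data = w->red_list;

    /* Release store: the writes above become visible to any thread that
     * observes arrived == 1 through an acquire load. */
    atomic_store_explicit(&w->arrived, 1, memory_order_release);
    return NULL;
}

/* The "parent" waits for the child, then reads the child's reduce_data. */
double reduce_with_child(double parent_partial)
{
    struct worker child = { .reduce_data = NULL };
    pthread_t tid;
    int rc;

    atomic_init(&child.arrived, 0);
    rc = pthread_create(&tid, NULL, child_main, &child);
    assert(rc == 0);

    /* Acquire load pairs with the child's release store. */
    while (atomic_load_explicit(&child.arrived, memory_order_acquire) == 0)
        ; /* spin until the child has published its data */

    double *r = ((double **)child.reduce_data)[0];
    double sum = parent_partial + *r;

    pthread_join(tid, NULL);
    return sum;
}
```

If the parent skipped the wait, or if the flag accesses did not order the
earlier plain stores, it could read a stale or NULL pointer, which would match
the zero seen in x9.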
