<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Segfault in omp_parallel_reduction.c test (AArch64)"
href="https://bugs.llvm.org/show_bug.cgi?id=44112">44112</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Segfault in omp_parallel_reduction.c test (AArch64)
</td>
</tr>
<tr>
<th>Product</th>
<td>OpenMP
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Hardware</th>
<td>Other
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Runtime Library
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>graham.hunter@arm.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>I observed a segmentation fault in the parallel reduction test when running
'make check-openmp'. I took a look through the resulting core file and found
the crash was in the generated reduction function:
<span class="quote">> #0 0x000000000040179c in .omp.reduction.reduction_func ()
> #1 0x0000ffff8fe95320 in __kmp_barrier () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> #2 0x0000ffff8fe65c10 in __kmpc_reduce_nowait () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> #3 0x00000000004016ac in .omp_outlined. ()
> #4 0x0000ffff8fed09ac in __kmp_invoke_microtask () from /home/grahun01/Build/hpc-dev/lib/libomp.so
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)</span>
The test compiles with -O0, so there are a few additional loads/stores in the
disassembly:
<span class="quote">> Dump of assembler code for function .omp.reduction.reduction_func.6:
> 0x000000000040177c <+0>: sub sp, sp, #0x10
> 0x0000000000401780 <+4>: str x0, [sp, #8]
> 0x0000000000401784 <+8>: str x1, [sp]
> 0x0000000000401788 <+12>: ldr x8, [sp, #8]
> 0x000000000040178c <+16>: ldr x9, [sp]
> 0x0000000000401790 <+20>: ldr x9, [x9]
> 0x0000000000401794 <+24>: ldr x8, [x8]
> 0x0000000000401798 <+28>: ldr d0, [x8]
> => 0x000000000040179c <+32>: ldr d1, [x9]
> 0x00000000004017a0 <+36>: fadd d0, d0, d1
> 0x00000000004017a4 <+40>: str d0, [x8]
> 0x00000000004017a8 <+44>: add sp, sp, #0x10
> 0x00000000004017ac <+48>: ret</span >
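For readers less familiar with AArch64, the disassembly above corresponds
roughly to the C below. This is a hand-written sketch reconstructed from the
instructions, not the compiler's actual output; the function and variable names
are hypothetical. Each argument is treated as a pointer to a slot holding the
address of a double.

```c
/* Hypothetical C equivalent of the generated reduction callback. */
static void reduction_func_sketch(void *lhs, void *rhs) {
    double *r = *(double **)rhs;  /* +20: ldr x9, [x9] */
    double *l = *(double **)lhs;  /* +24: ldr x8, [x8] */
    /* +28: load *l, +32: load *r (the faulting instruction),
       +36: fadd, +40: store the sum back through l. */
    *l = *l + *r;
}
```

A fault at +32 rather than at +20 means the rhs argument itself was readable,
but the slot it pointed to held a null pointer.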
The parameters passed into this function come from one of the kmp_barrier
variants:
<span class="quote">> (*reduce)(this_thr->th.th_local.reduce_data,
> child_thr->th.th_local.reduce_data);</span >
The pointer in x9 at the time of the fault was 0.
My guess is that there's a data race between a child thread writing a pointer
into its reduce_data variable and the thread performing the reduction trying to
read it. A run with TSan reports a race here, but since the test almost always
passes (I tried ~100K times to reproduce on different machines without success),
I suspect TSan isn't detecting the mechanism that's supposed to synchronize
access.</pre>
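<pre>For illustration only, the kind of publish/consume ordering that would rule out
such a race can be sketched in C as follows. The struct, its fields and both
function names are hypothetical and do not reflect libomp's actual data
structures or barrier code; the __atomic builtins are the GCC/Clang ones.

```c
/* Hypothetical stand-in for per-thread state; not the libomp layout. */
struct thread_slot {
    void *reduce_data;  /* pointer the child thread publishes */
    int arrived;        /* hypothetical "child has arrived" flag */
};

/* Child side: write the payload first, then set the flag with release
   ordering so the payload write cannot become visible after the flag. */
static void child_publish(struct thread_slot *slot, void *data) {
    slot->reduce_data = data;
    __atomic_store_n(&slot->arrived, 1, __ATOMIC_RELEASE);
}

/* Parent side: read the flag with acquire ordering; only once it is set
   is it safe to read the payload. Returns a null pointer otherwise. */
static void *parent_try_consume(struct thread_slot *slot) {
    if (!__atomic_load_n(&slot->arrived, __ATOMIC_ACQUIRE))
        return 0;
    return slot->reduce_data;
}
```

If the reducing thread could read reduce_data without first observing such a
flag (or an equivalent barrier), it could see a stale or null pointer.</pre>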
</div>
</p>
</body>
</html>