<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - std::this_thread::sleep_until may block forever"
href="https://bugs.llvm.org/show_bug.cgi?id=43838">43838</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>std::this_thread::sleep_until may block forever
</td>
</tr>
<tr>
<th>Product</th>
<td>libc++
</td>
</tr>
<tr>
<th>Version</th>
<td>7.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>All Bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>philip@kodiak.ai
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, mclow.lists@gmail.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=22742" name="attach_22742" title="Example code to hit the race condition. May need adjustment of loop_delay. Compile with e.g. clang++ -o test -O2 -stdlib=libc++ main.cpp on a machine with an affected libcxx version.">attachment 22742</a> <a href="attachment.cgi?id=22742&action=edit" title="Example code to hit the race condition. May need adjustment of loop_delay. Compile with e.g. clang++ -o test -O2 -stdlib=libc++ main.cpp on a machine with an affected libcxx version.">[details]</a></span>
Example code to hit the race condition. May need adjustment of loop_delay.
Compile with e.g. clang++ -o test -O2 -stdlib=libc++ main.cpp on a machine with
an affected libcxx version.
There is a race condition in condition_variable::wait_until in __mutex_base
that that can result in an infinite wait.
Potentially all released libcxx versions are affected.
The issue surfaces quite easily (and often) when using high resolution
(nanosecond resolution) clocks and made us ban all uses of standard library
functions that can hit the code in question.
Code in question is (__mutex_base, version 7.0):
378 template <class _Clock, class _Duration>
379 cv_status
380 condition_variable::wait_until(unique_lock<mutex>& __lk,
381 const chrono::time_point<_Clock, _Duration>&
__t)
382 {
383 using namespace chrono;
384 wait_for(__lk, __t - _Clock::now());
385 return _Clock::now() < __t ? cv_status::no_timeout :
cv_status::timeout;
386 }
where, depending on the clock representation, an underflow can occur in line
384 (time is advancing, and if __t was too close to the current time point, the
underflow will hit). Seems like all released version (up to 9.0) have this
piece of code. There is a commit on master that will prevent the underflow
happening that's not yet in the released versions.
Example stack traces based on the example in the attachment:
(lldb) bt
* thread #1, name = 'test', stop reason = signal SIGSTOP
frame #0: 0x00007fb5875bff85 libpthread.so.0`__pthread_cond_timedwait at
futex-internal.h:205
frame #1: 0x00007fb5875bff60 libpthread.so.0`__pthread_cond_timedwait at
pthread_cond_wait.c:539
frame #2: 0x00007fb5875bfe00
libpthread.so.0`__pthread_cond_timedwait(cond=0x00007ffdcca62780,
mutex=0x00007ffdcca627b0, abstime=0x00007ffdcca61a50) at
pthread_cond_wait.c:667
frame #3: 0x00007fb5883e7f15
libc++.so.1`std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&,
std::__1::chrono::time_point<std::__1::chrono::system_clock,
std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) +
101
* frame #4: 0x00000000004036da test`std::__1::cv_status
std::__1::condition_variable::wait_for<unsigned long long, std::__1::ratio<1l,
1000000000l> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770,
__d=0x00007ffdcca624b8) at __mutex_base:418
frame #5: 0x000000000040286c test`std::__1::cv_status
std::__1::condition_variable::wait_until<tai_clock,
std::__1::chrono::duration<unsigned long long, std::__1::ratio<1l, 1000000000l>
<span class="quote">> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __t=0x00007ffdcca629a0)</span >
at __mutex_base:384
frame #6: 0x0000000000401a20 test`void
std::__1::this_thread::sleep_until<tai_clock,
std::__1::chrono::duration<unsigned long long, std::__1::ratio<1l, 1000000000l>
<span class="quote">> >(__t=0x00007ffdcca629a0) at thread:461</span >
frame #7: 0x00000000004015a5 test`main(argc=1, argv=0x00007ffdcca62c88) at
main.cpp:41
frame #8: 0x00007fb5877f2b97 libc.so.6`__libc_start_main(main=(test`main at
main.cpp:36), argc=1, argv=0x00007ffdcca62c88, init=<unavailable>,
fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007ffdcca62c78) at
libc-start.c:310
frame #9: 0x000000000040113a test`_start + 42
With the following stack frame variables (in wait_for, called by wait_until):
(lldb) frame info
frame #4: 0x00000000004036da test`std::__1::cv_status
std::__1::condition_variable::wait_for<unsigned long long, std::__1::ratio<1l,
1000000000l> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770,
__d=0x00007ffdcca624b8) at __mutex_base:418
(lldb) frame variable
(std::__1::condition_variable *) this = 0x00007ffdcca62780
(std::__1::unique_lock<std::__1::mutex> &) __lk = 0x00007ffdcca62770: {
__m_ = 0x00007ffdcca627b0
__owns_ = true
}
(const std::__1::chrono::duration<unsigned long long, std::__1::ratio<1,
1000000000> > &) __d = 0x00007ffdcca624b8 (__rep_ = 18446744073709549768)
(__sys_tpf) _Max = {
__d_ = (__rep_ = 9223372036854775807)
}
(std::__1::chrono::steady_clock::time_point) __c_now = {
__d_ = (__rep_ = 141408845845668)
}
(std::__1::chrono::system_clock::time_point) __s_now = {
__d_ = (__rep_ = 1572277999316767)
}
This occurs quite frequently in our setup that uses a nanosecond resolution
clock with a uint64_t representation type (see example code) and we had to ban
all uses of functions that can hit the wait_until eventually.
The standard system_clock/steady_clock looks to be a microsecond resolution
clock. For those clocks the probability of the underflow happening is pretty
low (but is non-zero).</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>