[llvm-bugs] [Bug 43838] New: std::this_thread::sleep_until may block forever

via llvm-bugs llvm-bugs at lists.llvm.org
Tue Oct 29 10:28:12 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=43838

            Bug ID: 43838
           Summary: std::this_thread::sleep_until may block forever
           Product: libc++
           Version: 7.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: All Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: philip at kodiak.ai
                CC: llvm-bugs at lists.llvm.org, mclow.lists at gmail.com

Created attachment 22742
  --> https://bugs.llvm.org/attachment.cgi?id=22742&action=edit
Example code to hit the race condition; loop_delay may need adjustment. Compile
with e.g. clang++ -o test -O2 -stdlib=libc++ main.cpp on a machine with an
affected libc++ version.

There is a race condition in condition_variable::wait_until in __mutex_base
that can result in an infinite wait.
Potentially all released libc++ versions are affected.

The issue surfaces quite easily (and often) when using high-resolution
(nanosecond) clocks, and it forced us to ban all uses of standard library
functions that can reach the code in question.
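
For readers without access to the attachment, here is a minimal sketch of a
reproducer along these lines. The tai_clock name comes from the stack trace
below; the body and the loop_delay value are illustrative assumptions, and the
attached main.cpp is authoritative:

#include <chrono>
#include <cstdio>
#include <thread>

// Nanosecond-resolution clock with an unsigned (uint64_t-like) representation,
// modeled on the tai_clock visible in the stack trace below. Backed by
// steady_clock here purely for illustration.
struct tai_clock
{
    using rep        = unsigned long long;
    using period     = std::nano;
    using duration   = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<tai_clock, duration>;
    static const bool is_steady = true;

    static time_point now()
    {
        return time_point(std::chrono::duration_cast<duration>(
            std::chrono::steady_clock::now().time_since_epoch()));
    }
};

int main()
{
    // Repeatedly sleep until "now + loop_delay". If the deadline has already
    // passed by the time condition_variable::wait_until evaluates
    // __t - _Clock::now(), the unsigned duration underflows and the wait
    // blocks practically forever; the printed counter then stops advancing.
    const auto loop_delay = std::chrono::microseconds(50); // may need tuning
    for (unsigned long long i = 0; ; ++i)
    {
        std::this_thread::sleep_until(tai_clock::now() + loop_delay);
        if (i % 100000 == 0)
            std::printf("iteration %llu\n", i);
    }
}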


The code in question (__mutex_base, version 7.0):

378 template <class _Clock, class _Duration>
379 cv_status
380 condition_variable::wait_until(unique_lock<mutex>& __lk,
381                                const chrono::time_point<_Clock, _Duration>& __t)
382 {
383     using namespace chrono;
384     wait_for(__lk, __t - _Clock::now());
385     return _Clock::now() < __t ? cv_status::no_timeout : cv_status::timeout;
386 }


where, depending on the clock's representation type, an underflow can occur on
line 384: time keeps advancing, so if __t was very close to the current time
when the caller checked it, _Clock::now() may already have passed __t by the
time the subtraction is evaluated. With an unsigned representation the
mathematically negative difference wraps around to an enormous positive
duration, and wait_for then blocks practically forever. All released versions
(up to 9.0) seem to contain this piece of code. There is a commit on master
that prevents the underflow, but it is not yet part of any released version.
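
For illustration, the wrap-around in isolation (a standalone sketch, not
libc++ code; the numbers are made up):

#include <chrono>
#include <cstdio>

int main()
{
    using ns = std::chrono::duration<unsigned long long, std::nano>;

    ns now(1000000);   // pretend _Clock::now() is 1 ms past the epoch
    ns target(999000); // __t already lies 1000 ns in the past

    // The subtraction from line 384: with an unsigned representation the
    // mathematically negative result wraps to almost 2^64 nanoseconds,
    // i.e. wait_for is asked to wait roughly 584 years.
    ns wait = target - now;
    std::printf("wait = %llu ns\n", wait.count()); // 18446744073709550616
}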


An example stack trace based on the code in the attachment:

(lldb) bt
* thread #1, name = 'test', stop reason = signal SIGSTOP
    frame #0: 0x00007fb5875bff85 libpthread.so.0`__pthread_cond_timedwait at futex-internal.h:205
    frame #1: 0x00007fb5875bff60 libpthread.so.0`__pthread_cond_timedwait at pthread_cond_wait.c:539
    frame #2: 0x00007fb5875bfe00 libpthread.so.0`__pthread_cond_timedwait(cond=0x00007ffdcca62780, mutex=0x00007ffdcca627b0, abstime=0x00007ffdcca61a50) at pthread_cond_wait.c:667
    frame #3: 0x00007fb5883e7f15 libc++.so.1`std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) + 101
  * frame #4: 0x00000000004036da test`std::__1::cv_status std::__1::condition_variable::wait_for<unsigned long long, std::__1::ratio<1l, 1000000000l> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __d=0x00007ffdcca624b8) at __mutex_base:418
    frame #5: 0x000000000040286c test`std::__1::cv_status std::__1::condition_variable::wait_until<tai_clock, std::__1::chrono::duration<unsigned long long, std::__1::ratio<1l, 1000000000l> > >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __t=0x00007ffdcca629a0) at __mutex_base:384
    frame #6: 0x0000000000401a20 test`void std::__1::this_thread::sleep_until<tai_clock, std::__1::chrono::duration<unsigned long long, std::__1::ratio<1l, 1000000000l> > >(__t=0x00007ffdcca629a0) at thread:461
    frame #7: 0x00000000004015a5 test`main(argc=1, argv=0x00007ffdcca62c88) at main.cpp:41
    frame #8: 0x00007fb5877f2b97 libc.so.6`__libc_start_main(main=(test`main at main.cpp:36), argc=1, argv=0x00007ffdcca62c88, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007ffdcca62c78) at libc-start.c:310
    frame #9: 0x000000000040113a test`_start + 42

With the following stack frame variables (in wait_for, called by wait_until):

(lldb) frame info
frame #4: 0x00000000004036da test`std::__1::cv_status std::__1::condition_variable::wait_for<unsigned long long, std::__1::ratio<1l, 1000000000l> >(this=0x00007ffdcca62780, __lk=0x00007ffdcca62770, __d=0x00007ffdcca624b8) at __mutex_base:418

(lldb) frame variable
(std::__1::condition_variable *) this = 0x00007ffdcca62780
(std::__1::unique_lock<std::__1::mutex> &) __lk = 0x00007ffdcca62770: {
  __m_ = 0x00007ffdcca627b0
  __owns_ = true
}
(const std::__1::chrono::duration<unsigned long long, std::__1::ratio<1, 1000000000> > &) __d = 0x00007ffdcca624b8 (__rep_ = 18446744073709549768)
(__sys_tpf) _Max = {
  __d_ = (__rep_ = 9223372036854775807)
}
(std::__1::chrono::steady_clock::time_point) __c_now = {
  __d_ = (__rep_ = 141408845845668)
}
(std::__1::chrono::system_clock::time_point) __s_now = {
  __d_ = (__rep_ = 1572277999316767)
}


Note the value of __d above: 18446744073709549768 equals 2^64 - 1848, i.e. the
wrapped (unsigned) representation of a -1848 ns wait. The deadline had already
passed by roughly 1.8 microseconds when line 384 evaluated the subtraction.

This occurs quite frequently in our setup, which uses a nanosecond-resolution
clock with a uint64_t representation type (see the example code); that is what
forced the ban on all functions that can eventually reach wait_until.

The standard system_clock/steady_clock looks to be a microsecond-resolution
clock. For those clocks the probability of the underflow occurring is pretty
low, but it is non-zero.
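
Until a fixed release is available, a workaround sketch that avoids the
affected overload entirely (safe_sleep_until is our name, not a libc++
facility; this assumes this_thread::sleep_for does not go through the affected
wait_until, which appears to be the case in the libc++ versions we looked at):

#include <chrono>
#include <thread>

// Workaround sketch: compute the remaining time once, against a saved "now",
// so the subtraction is guaranteed non-negative, then sleep on a signed
// nanosecond duration. Loops in case sleep_for wakes early.
template <class Clock, class Duration>
void safe_sleep_until(const std::chrono::time_point<Clock, Duration>& t)
{
    for (;;)
    {
        const auto now = Clock::now();
        if (now >= t)
            return; // deadline already passed: nothing left to underflow
        std::this_thread::sleep_for(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t - now));
    }
}

Call sites then use safe_sleep_until(deadline) instead of
std::this_thread::sleep_until(deadline); the condition_variable wait_until and
wait_for overloads would need analogous wrappers.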
