[cfe-dev] [libc++] r160604 appears to have broken libc++ on linux

Andrew C. Morrow andrew.c.morrow at gmail.com
Tue Jul 24 12:15:43 PDT 2012


On Sun, Jul 22, 2012 at 8:42 PM, Howard Hinnant <hhinnant at apple.com> wrote:
> On Jul 22, 2012, at 7:12 PM, "Andrew C. Morrow" <andrew.c.morrow at gmail.com> wrote:
>
>> I have a mostly working clang toolchain on Linux using libc++abi and
>> libc++. But today I pulled the latest updates to libc++ and rebuilt
>> it, and now any attempt to construct a std::stringstream fails.
>>
>> #include <cstdlib>
>> #include <sstream>
>>
>> int main(int argc, char* argv[]) {
>>    std::stringstream ss;
>>    return EXIT_SUCCESS;
>> }
>>
>> This dies with an unhandled std::bad_cast exception:
>>
>> terminating with uncaught exception of type std::bad_cast: std::bad_cast
>>
>> Program received signal SIGABRT, Aborted.
>> 0x00007ffff75f0475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> 64    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) where
>> #0  0x00007ffff75f0475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x00007ffff75f36f0 in *__GI_abort () at abort.c:92
>> #2  0x00007ffff7fcd457 in abort_message (format=<optimized out>) at ../../../src/libcxxabi/src/abort_message.cpp:47
>> #3  0x00007ffff7fcd6c2 in default_handler (cause=0x7ffff7fe3ba3 "uncaught") at ../../../src/libcxxabi/src/cxa_default_handlers.cpp:61
>> #4  0x00007ffff7fcd54d in default_terminate_handler () at ../../../src/libcxxabi/src/cxa_default_handlers.cpp:81
>> #5  0x00007ffff7fe0676 in std::__terminate (func=0xeed9) at ../../../src/libcxxabi/src/cxa_handlers.cpp:67
>> #6  0x00007ffff7fdfce6 in failed_throw (exception_header=<optimized out>) at ../../../src/libcxxabi/src/cxa_exception.cpp:147
>> #7  __cxa_throw (thrown_object=0x406090, tinfo=<optimized out>, dest=<optimized out>) at ../../../src/libcxxabi/src/cxa_exception.cpp:242
>> #8  0x00007ffff7e8f857 in std::__1::locale::__imp::use_facet (this=0x7ffff7faba70, id=28) at /home/acm/Documents/Develop/externals/clang-toolchain/src/libcxx/src/locale.cpp:388
>> #9  0x00007ffff7e90633 in std::__1::locale::use_facet (this=0x7fffffffe540, x=...) at /home/acm/Documents/Develop/externals/clang-toolchain/src/libcxx/src/locale.cpp:530
>> #10 0x0000000000401e8a in init (this=0x7fffffffe310, __sb=0x7fffffffe2a8) at /home/acm/opt/include/c++/v1/ios:660
>> #11 basic_istream (this=0x0, vtt=0x7fffffffe660, __sb=0x7fffffffe2a8, this=0x0, vtt=0x7fffffffe660) at /home/acm/opt/include/c++/v1/istream:294
>> #12 basic_ios (this=0x7ffff7e109c0, this=0x7ffff7e109c0, vtt=0x404788, __sb=0x7fffffffe2a8, this=0x7ffff7e109c0, __wch=32767) at /home/acm/opt/include/c++/v1/istream:1488
>> #13 basic_stringstream (this=0x7fffffffe290, __wch=24) at /home/acm/opt/include/c++/v1/sstream:809
>> #14 main (argc=1, argv=0x7fffffffe668) at ./test.cpp:26
>> (gdb)
>>
>> In addition to my trivial test case, many of the libc++abi and libc++
>> unit tests fail with similar exceptions when r160604 is applied.
>>
>> If I revert libc++ back to r160594 things start working again.
>>
>> The rest of toolchain was built with the following component revisions:
>> llvm: r160611
>> clang: r160613
>> libc++abi: r160553
>>
>> It is not immediately obvious to me how r160604's noexcept and
>> constexpr changes to std::mutex could cause this. Valgrind didn't have
>> anything interesting to say. Any suggestions about where to start
>> looking?
>
> It isn't obvious to me either.
>
> In locale.cpp, this throw is happening:
>
> const locale::facet*
> locale::__imp::use_facet(long id) const
> {
> #ifndef _LIBCPP_NO_EXCEPTIONS
>     if (!has_facet(id))
>         throw bad_cast();
> #endif  // _LIBCPP_NO_EXCEPTIONS
>     return facets_[static_cast<size_t>(id)];
> }
>
> But I have no idea why you would be throwing now, and not before, without the noexcept declarations on mutex et al. in r160594.  I'm not able to reproduce this on Mac OS X.
>
> Here's has_facet:
>
>     bool has_facet(long id) const
>         {return static_cast<size_t>(id) < facets_.size() && facets_[static_cast<size_t>(id)];}
>
> which just range-checks the vector of facet pointers, which should have been constructed by now.
>
> This looks to be happening while default constructing a locale.  But default constructing a locale should not be calling use_facet or has_facet.  This stack trace should be diving into make_global and make_classic in locale.cpp, which call this constructor:
>
> locale::__imp::__imp(size_t refs)
>
> which does not lead to use_facet or has_facet.
>
> So it looks like some kind of corruption going on somewhere.
>
> Howard
>

Hi Howard -

Thank you for your help investigating this. I reduced the test case a bit:

#include <cstdlib>  // EXIT_SUCCESS, NULL
#include <istream>
int main(int argc, char* argv[]) {
    std::istream is(NULL);
    return EXIT_SUCCESS;
}

and rebuilt everything with all optimizations turned off. I now have
a more understandable stack trace, and something that suggests how
r160604 is causing trouble for me.

The following stack trace is from an interactive step-through of the
above program, compiled with -stdlib=libc++, and paused right before
has_facet returns false, which in turn makes use_facet throw
bad_cast. Here, 'has_facet' will return false because 'id' is not less
than facets_.size():

(gdb) up
#1  0x00007ffff7f4fa7f in std::__1::locale::__imp::use_facet (this=0x7ffff7ff6810, id=28) at libcxx/src/locale.cpp:387
387	    if (!has_facet(id))
(gdb) down
#0  std::__1::locale::__imp::has_facet (this=0x7ffff7ff6810, id=28) at libcxx/src/locale.cpp:106
106	        {return static_cast<size_t>(id) < facets_.size() && facets_[static_cast<size_t>(id)];}
(gdb) where
#0  std::__1::locale::__imp::has_facet (this=0x7ffff7ff6810, id=28) at libcxx/src/locale.cpp:106
#1  0x00007ffff7f4fa7f in std::__1::locale::__imp::use_facet (this=0x7ffff7ff6810, id=28) at libcxx/src/locale.cpp:387
#2  0x00007ffff7f50533 in std::__1::locale::use_facet (this=0x7fffffffe428, x=...) at libcxx/src/locale.cpp:530
#3  0x00007ffff7f44a9c in std::__1::use_facet<std::__1::ctype<char> > (__l=...) at libcxx/include/__locale:164
#4  0x00007ffff7f8c63c in std::__1::basic_ios<char, std::__1::char_traits<char> >::widen (this=0x7fffffffe4d8, __c=32 ' ')
    at libcxx/include/ios:725
#5  0x00007ffff7f8c183 in std::__1::basic_ios<char, std::__1::char_traits<char> >::init (this=0x7fffffffe4d8, __sb=0x0)
    at libcxx/include/ios:661
#6  0x00007ffff7f8eddf in std::__1::basic_istream<char, std::__1::char_traits<char> >::basic_istream (this=0x7fffffffe4c8, __sb=0x0)
    at libcxx/include/istream:294
#7  0x00000000004008d5 in main (argc=1, argv=0x7fffffffe668) at ./libc++.stringstream.crash.cpp:4
(gdb) print id
$25 = 28
(gdb) print facets_.size()
$26 = 28
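
To make the arithmetic concrete: has_facet accepts only ids strictly
below facets_.size(), so a table of 28 facets accepts ids 0 through 27,
and the id of 28 printed above is exactly one past the end. A minimal
stand-alone sketch of that check follows; `FacetTable` and `make_table`
are hypothetical names that only mirror the shape of the libc++
has_facet/use_facet code quoted earlier, not its actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>
#include <vector>

// Hypothetical stand-in for locale::__imp: facet pointers indexed by facet id.
struct FacetTable {
    std::vector<const void*> facets_;

    // Same shape as the quoted has_facet: the id must be in range
    // and the slot must hold a non-null facet pointer.
    bool has_facet(long id) const {
        return static_cast<std::size_t>(id) < facets_.size() &&
               facets_[static_cast<std::size_t>(id)] != nullptr;
    }

    // Same shape as the quoted __imp::use_facet: unknown id -> bad_cast.
    const void* use_facet(long id) const {
        if (!has_facet(id))
            throw std::bad_cast();
        return facets_[static_cast<std::size_t>(id)];
    }
};

// Builds a table with n installed (dummy) facets, i.e. valid ids 0 .. n-1.
inline FacetTable make_table(std::size_t n) {
    static const int dummy = 0;
    FacetTable t;
    t.facets_.assign(n, &dummy);
    assert(t.facets_.size() == n);
    return t;
}
```

With this model, make_table(28).use_facet(28) throws std::bad_cast,
which is the same failure the gdb session shows.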

So we reach the use_facet/has_facet call chain by way of
basic_ios<char>::widen, which basic_ios::init calls explicitly to
initialize the __fil_ member (frame #4 shows widen being called with
__c=32 ' '). So I think the stack trace is plausible without any sort
of corruption.
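
The standard specifies that basic_ios::init sets the fill character to
widen(' '), and widen goes through use_facet<ctype<char>>(getloc()), so
even std::istream is(NULL) needs a working ctype<char> facet id. A
sketch of the same path through the public API, assuming a conforming,
working standard library; `default_fill` and
`null_istream_is_bad_with_space_fill` are hypothetical helper names:

```cpp
#include <cassert>
#include <istream>
#include <locale>

// What basic_ios<char>::init effectively needs for the fill character:
// ' ' widened through the locale's ctype<char> facet (frames #3 and #4).
inline char default_fill(const std::locale& loc) {
    // In a healthy library every locale carries ctype<char>.
    assert(std::has_facet<std::ctype<char> >(loc));
    return std::use_facet<std::ctype<char> >(loc).widen(' ');
}

// A null streambuf still runs init(); init() then just sets badbit.
inline bool null_istream_is_bad_with_space_fill() {
    std::istream is(nullptr);
    return is.bad() && is.fill() == ' ';
}
```

On a healthy build the null-streambuf istream simply reports badbit; on
the broken build described here, the use_facet call throws bad_cast
before init finishes.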

One thing I noticed while doing the above step-through was that
locale::id::__get and locale::id::__init were involved in producing
the value passed to use_facet, and these functions use std::once_flag,
which changed in r160604.
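
The lazy numbering that __get and __init perform can be modeled with
std::call_once and an atomic counter. This is a simplified sketch,
assuming only what the instrumented code below shows (a 1-based __id_
taken from an incremented counter and returned as __id_ - 1);
`FacetId`, `get`, and `next_id_` are hypothetical names, not libc++'s
locale::id:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical counterpart of locale::id: one object per facet type,
// numbered from a shared counter on its first use.
class FacetId {
public:
    long get() {
        // Like __get calling __init via call_once: the counter is bumped
        // at most once per FacetId object, even with concurrent callers.
        std::call_once(flag_, [this] { id_ = next_id_.fetch_add(1) + 1; });
        assert(id_ > 0);  // stored 1-based...
        return id_ - 1;   // ...returned 0-based
    }

private:
    std::once_flag flag_;
    long id_ = 0;
    static std::atomic<int> next_id_;
};

std::atomic<int> FacetId::next_id_{0};
```

Each FacetId keeps the number it got on its first get() call; the
scheme depends entirely on the once_flag remembering that
initialization already happened.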

I rolled back to r160594, hacked some printf calls into
locale::id::__get and locale::id::__init, and made
once_flag::__state_ public:

#include <cstdio>  // printf

long
locale::id::__get()
{
    printf("XXX locale::id::__get: before call_once: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);

    call_once(__flag_, __fake_bind(&locale::id::__init, this));

    printf("XXX locale::id::__get: after call_once: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);

    printf("XXX locale::id::__get: this(%p), will return: %ld\n",
           (void *)this, __id_ - 1);

    return __id_ - 1;
}

void
locale::id::__init()
{
    printf("XXX locale::id::__init before increment: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);

    __id_ = __sync_add_and_fetch(&__next_id, 1);

    printf("XXX locale::id::__init: after increment: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);
}

With a libc++ tree based on r160594, the calls to the above functions
after 'main' starts emit the following logs:

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe668) at
./libc++.stringstream.crash.cpp:4
4	    std::istream is(NULL);
(gdb) c
Continuing.
XXX locale::id::__get: before call_once: this(401e20), __flag_.__state_(18446744073709551615), __id_(3), __next_id(28)
XXX locale::id::__get: after call_once: this(401e20), __flag_.__state_(18446744073709551615), __id_(3), __next_id(28)
XXX locale::id::__get: this(401e20), will return: 2
[Inferior 1 (process 16448) exited normally]


If I pull the r160604 updates however, the log looks like this:

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe668) at
./libc++.stringstream.crash.cpp:4
4	    std::istream is(NULL);
(gdb) c
Continuing.
XXX locale::id::__get: before call_once: this(401e30), __flag_.__state_(0), __id_(3), __next_id(28)
XXX locale::id::__init before increment: this(401e30), __flag_.__state_(1), __id_(3), __next_id(28)
XXX locale::id::__init: after increment: this(401e30), __flag_.__state_(1), __id_(29), __next_id(29)
XXX locale::id::__get: after call_once: this(401e30), __flag_.__state_(18446744073709551615), __id_(29), __next_id(29)
XXX locale::id::__get: this(401e30), will return: 28
terminating with uncaught exception of type std::bad_cast: std::bad_cast

Program received signal SIGABRT, Aborted.
0x00007ffff75f0475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

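Both builds log many lines before main starts, and I can provide those
if they would help. But the main difference between r160594 and
r160604 seems to be this: with r160594, __flag_.__state_ already reads
18446744073709551615 (~0ul, the value it holds after call_once
completes) when __get is entered after main starts, so call_once skips
__init. With r160604 it reads 0, so call_once runs __init a second
time, __next_id is incremented from 28 to 29 when it shouldn't be, and
__get returns 28, one past the end of the 28-entry facet table.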
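
The two logs can be replayed with a small, single-threaded model of
the once-flag states they show (0 before the first call, 1 while
__init runs, 18446744073709551615, i.e. ~0ul, once it has finished).
`OnceState`, `call_once_model`, `IdModel`, and `get_id` are
hypothetical names; this is a sketch of the observed behavior, not
libc++'s call_once:

```cpp
#include <cassert>

// Once-flag states as printed in the logs: 0 = not run, 1 = running,
// ~0ul = done.
struct OnceState {
    unsigned long state_ = 0;
};

template <class F>
void call_once_model(OnceState& flag, F f) {
    // Single-threaded model: seeing 1 here could only mean recursion.
    // A real implementation would block while another thread runs f.
    assert(flag.state_ != 1);
    if (flag.state_ == 0) {  // only a flag that reads 0 runs f
        flag.state_ = 1;
        f();
        flag.state_ = ~0ul;
    }
}

// Hypothetical model of a locale::id: its flag plus the 1-based __id_.
struct IdModel {
    OnceState flag_;
    long id_ = 0;
};

// Mirrors the instrumented __get/__init pair above.
inline long get_id(IdModel& id, int& next_id) {
    call_once_model(id.flag_, [&] { id.id_ = ++next_id; });
    return id.id_ - 1;
}
```

Starting from __id_ = 3 and __next_id = 28, a flag that reads ~0ul
yields 2, as in the r160594 log, while a flag that reads 0 reruns the
initializer and yields 28, as in the r160604 log.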

Of course, all of this could be caused by some deeper-level corruption
in my admittedly hacked-up Linux libc++ stack, but if you have any
suggestions for next steps I would appreciate them.

Thanks,
Andrew



