[cfe-commits] patch: libcxxabi cxa_*_virtual and cxa_guard_* methods
Howard Hinnant
hhinnant at apple.com
Sun May 22 19:10:56 PDT 2011
On May 22, 2011, at 7:42 PM, John McCall wrote:
> On May 22, 2011, at 4:28 PM, Howard Hinnant wrote:
>
>> Question for clang developers:
>>
>> Will these namespace-scope definitions ever generate a call to __cxa_guard_acquire? I know this is forward looking concerning thread_local:
>
> The only variable definitions that require __cxa_guard_acquire are static locals and static data members of templated classes.
Ok, thanks John.
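To make John's answer concrete, here is a minimal sketch of a static local with a dynamic initializer, the case where an Itanium-ABI compiler is expected to emit the guard calls. The names Counter, instance, construct_count and run_threads are hypothetical, and the sketch assumes a C++11 compiler with thread-safe statics enabled.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter of how many times the constructor has run.
std::atomic<int> construct_count{0};

struct Counter
{
    Counter() {++construct_count;}
};

// The static local below has a dynamic initializer, so the compiler wraps
// its construction in guard calls (on Itanium-ABI platforms these are
// __cxa_guard_acquire and __cxa_guard_release).
Counter& instance()
{
    static Counter c;
    return c;
}

// Calls instance() from n threads and reports how often Counter was built.
int run_threads(int n)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] {instance();});
    for (auto& t : threads)
        t.join();
    return construct_count.load();
}
```

However many threads race into instance(), the constructor should run exactly once.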
I've got a prototype here. It is kind of messy, but it is basically derived from Nick's code, which I think is a good fundamental design, though I've rejected spin locks. It would work best with thread_local data, but for obvious reasons I haven't tested that branch and am emulating thread_local with pthread_key_t. This implementation is also sensitive to endianness because the Itanium spec is written in terms of the "first byte" instead of the "high byte".
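The "first byte" versus "high byte" distinction can be illustrated with a small sketch. It detects endianness at run time rather than with __LITTLE_ENDIAN__, and the helper names (is_little_endian, lock_mask, guard_with_first_byte_set) are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if the first (lowest-addressed) byte of a multi-byte
// integer is its least significant byte.
bool is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Mask selecting the 56 bits of a 64-bit guard that are NOT its first byte.
std::uint64_t lock_mask()
{
    return is_little_endian() ? 0xFFFFFFFFFFFFFF00ull
                              : 0x00FFFFFFFFFFFFFFull;
}

// Writes 1 into the first byte of a zeroed guard and returns the word.
std::uint64_t guard_with_first_byte_set()
{
    std::uint64_t guard = 0;
    const unsigned char one = 1;
    std::memcpy(&guard, &one, 1);
    return guard;
}
```

On a little-endian machine the resulting word is 1; on a big-endian machine it is 1 << 56. In both cases the lock bits selected by lock_mask() stay zero.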
In a nutshell, I generate a thread id that can be stored in a size_t. An assumption is that the number of threads created in the lifetime of an application will be less than 2^32 on a 32-bit architecture, and less than 2^56 on a 64-bit architecture. This thread id is always non-zero, and is nothing more than a sequential count of threads. Note that it is not a count of currently active threads, but a count of the number of threads created so far. pthread_self() is specifically not used for this purpose because pthread_t often takes up more storage space than the 56 bits we have available.
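A minimal sketch of the sequential thread id idea, assuming C++11 thread_local and std::atomic are available (the prototype in this message uses __sync_fetch_and_add or pthread_key_t instead). The names next_thread_id, this_thread_id and collect_ids are hypothetical, and the sketch leaves out the ID_INCR scaling that keeps the first byte of the guard free.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Counts how many threads have asked for an id so far.
std::atomic<std::size_t> next_thread_id{0};

// The first call on each thread takes the next counter value; later calls
// on the same thread return the same value. Ids start at 1, never 0.
std::size_t this_thread_id()
{
    thread_local const std::size_t id = next_thread_id.fetch_add(1) + 1;
    return id;
}

// Starts n threads and gathers the id each one observes.
std::set<std::size_t> collect_ids(int n)
{
    std::set<std::size_t> ids;
    std::mutex m;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] {
            const std::size_t id = this_thread_id();
            std::lock_guard<std::mutex> lk(m);
            ids.insert(id);
        });
    for (auto& t : threads)
        t.join();
    return ids;
}
```

Each thread should see a distinct non-zero id, and the ids are never reused when a thread exits.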
The thread id is stored in the guard variable while that thread is initializing the guarded variable; otherwise 0 is stored in those 56 bits. This serves two purposes: it marks the variable as being in the process of initialization (Nick's lock to block other threads), and it detects recursive initialization, in which case we abort instead of hanging.
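The two roles of the stored thread id can be modelled with a simplified, single-threaded sketch. It keeps the flag and the owner in separate fields instead of one packed 64-bit word, and it reports the outcome instead of waiting or aborting. ModelGuard, Acquire, model_acquire and model_release are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the guard variable.
struct ModelGuard
{
    bool          initialized = false;
    std::uint64_t owner = 0;   // 0 means nobody is initializing
};

enum class Acquire {must_init, already_done, busy, recursive};

// self is the caller's thread id and must be non-zero.
Acquire model_acquire(ModelGuard& g, std::uint64_t self)
{
    assert(self != 0);
    if (g.owner == self)
        return Acquire::recursive;     // the prototype aborts in this case
    if (g.owner != 0)
        return Acquire::busy;          // the prototype waits in this case
    if (g.initialized)
        return Acquire::already_done;
    g.owner = self;                    // mark initialization in progress
    return Acquire::must_init;
}

void model_release(ModelGuard& g)
{
    g.initialized = true;
    g.owner = 0;
}
```

The same field distinguishes "another thread is busy" from "this thread re-entered its own initializer", which is why a plain boolean lock would not be enough.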
An assumption (and clang developers will have to tell me whether it is correct) is that the compiler will do a proper double-checked locking dance prior to calling __cxa_guard_acquire. That is why __cxa_guard_acquire locks a mutex immediately. If this assumption is wrong, then __cxa_guard_acquire could do the double-checked locking itself. But the correct way to do that is platform dependent and can't be done portably without C++'s <atomic> header, which is in turn platform dependent in its implementation. So I don't see a big win in lowering the double-checked locking into __cxa_guard_acquire.
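For readers unfamiliar with the pattern, a minimal sketch of double-checked locking follows. It is written with C++11 <atomic> and <mutex> purely for illustration and is not what any particular compiler emits; ready, init_mutex, init_runs, get_value and run_dcl are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<bool> ready{false};
std::mutex        init_mutex;
int               init_runs = 0;   // only touched while init_mutex is held
int               value = 0;

int get_value()
{
    // First check: no lock is taken once initialization has completed.
    if (!ready.load(std::memory_order_acquire))
    {
        std::lock_guard<std::mutex> lk(init_mutex);
        // Second check: another thread may have finished while we waited.
        if (!ready.load(std::memory_order_relaxed))
        {
            value = 42;
            ++init_runs;
            ready.store(true, std::memory_order_release);
        }
    }
    return value;
}

// Calls get_value() from n threads and reports how often the
// initialization branch ran.
int run_dcl(int n)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] {get_value();});
    for (auto& t : threads)
        t.join();
    return init_runs;
}
```

The unlocked first check is the part that needs platform-specific memory ordering, which is the portability concern raised above.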
Oh, another major assumption is that pthreads is available. If that assumption is wrong (Windows?) then it might be best to just put a giant #if around the whole thing. And if we're targeting a non-multithreaded platform, that could be a very easily implemented third branch in the outer #if.
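One possible shape for that outer #if, as a sketch only. DEMO_HAS_PTHREADS and DEMO_HAS_WIN32_THREADS are hypothetical configuration macros, not real libcxxabi or compiler macros, and the demo_guard_* names are used so the sketch does not collide with the real ABI entry points. Only the single-threaded branch is filled in.

```cpp
#include <cassert>
#include <cstdint>

#if defined(DEMO_HAS_PTHREADS)
#   error "pthreads branch: the mutex and condition variable version goes here"
#elif defined(DEMO_HAS_WIN32_THREADS)
#   error "Windows branch: not covered by this sketch"
#else

// Single-threaded branch: no locking, only the first byte of the guard
// is consulted.
int demo_guard_acquire(std::int64_t* guard_object)
{
    // Non-zero result means the caller must run the initializer.
    return *reinterpret_cast<char*>(guard_object) == 0;
}

void demo_guard_release(std::int64_t* guard_object)
{
    *reinterpret_cast<char*>(guard_object) = 1;
}

void demo_guard_abort(std::int64_t*)
{
    // Nothing to undo: no lock state was recorded.
}

#endif
```

With no threads, acquire reduces to a test of the first byte and release to a store into it.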
Lightly tested.
Comments, suggestions welcome.
Howard
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
namespace __cxxabiv1
{
namespace __libcxxabi
{
void abort(const char* s) {printf("%s\n", s); ::abort();}
} // __libcxxabi
namespace
{
pthread_mutex_t guard_mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t guard_cv = PTHREAD_COND_INITIALIZER;
size_t next_id;
#if __LITTLE_ENDIAN__
# define ID_INCR 256
#else
# define ID_INCR 1
#endif
#if __has_feature(cxx_thread_local)
thread_local const size_t id = __sync_fetch_and_add(&next_id, ID_INCR) + ID_INCR;
#else // __has_feature(cxx_thread_local)
pthread_key_t id_key;
int create_id = pthread_key_create(&id_key, 0);
#endif // __has_feature(cxx_thread_local)
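#if __LITTLE_ENDIAN__
// Little endian: the first byte is the low byte, so the lock (thread id)
// lives in the upper 56 bits.
inline
int64_t
get_lock(int64_t x)
{
    return x & 0xFFFFFFFFFFFFFF00;
}
inline
void
set_lock(int64_t& x, int64_t y)
{
    x &= 0x00000000000000FF;
    x |= (y & 0xFFFFFFFFFFFFFF00);
}
#else // __LITTLE_ENDIAN__
// Big endian: the first byte is the high byte, so the lock (thread id)
// lives in the lower 56 bits.
inline
int64_t
get_lock(int64_t x)
{
    return x & 0x00FFFFFFFFFFFFFF;
}
inline
void
set_lock(int64_t& x, int64_t y)
{
    // Keep the first (high) byte and clear the old lock bits.
    x &= 0xFF00000000000000;
    x |= (y & 0x00FFFFFFFFFFFFFF);
}
#endif // __LITTLE_ENDIAN__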
} // unnamed namespace
extern "C"
{
int __cxa_guard_acquire(int64_t* guard_object)
{
    int8_t* initialized = (int8_t*)guard_object;
    if (pthread_mutex_lock(&guard_mut))
        __libcxxabi::abort("__cxa_guard_acquire failed to acquire mutex");
#if !__has_feature(cxx_thread_local)
    size_t id = (size_t)pthread_getspecific(id_key);
    if (id == 0)
    {
        next_id += ID_INCR;
        id = next_id;
        pthread_setspecific(id_key, (const void*)id);
    }
#endif
    int64_t lock = get_lock(*guard_object);
    if (lock)
    {
        // if this thread set the lock for this same guard_object, abort
        if (lock == id)
            __libcxxabi::abort("__cxa_guard_acquire detected deadlock");
        do
        {
            if (pthread_cond_wait(&guard_cv, &guard_mut))
                __libcxxabi::abort("__cxa_guard_acquire condition variable wait failed");
            lock = get_lock(*guard_object);
        } while (lock);
    }
    int result = *initialized == 0;
    if (result)
        set_lock(*guard_object, id);
    if (pthread_mutex_unlock(&guard_mut))
        __libcxxabi::abort("__cxa_guard_acquire failed to release mutex");
    return result;
}
void __cxa_guard_release(int64_t* guard_object)
{
    uint8_t* initialized = (uint8_t*)guard_object;
    if (pthread_mutex_lock(&guard_mut))
        __libcxxabi::abort("__cxa_guard_release failed to acquire mutex");
    *initialized = 1;
    set_lock(*guard_object, 0);
    if (pthread_mutex_unlock(&guard_mut))
        __libcxxabi::abort("__cxa_guard_release failed to release mutex");
    if (pthread_cond_broadcast(&guard_cv))
        __libcxxabi::abort("__cxa_guard_release failed to broadcast condition variable");
}
void __cxa_guard_abort(int64_t* guard_object)
{
    if (pthread_mutex_lock(&guard_mut))
        __libcxxabi::abort("__cxa_guard_abort failed to acquire mutex");
    set_lock(*guard_object, 0);
    if (pthread_mutex_unlock(&guard_mut))
        __libcxxabi::abort("__cxa_guard_abort failed to release mutex");
    if (pthread_cond_broadcast(&guard_cv))
        __libcxxabi::abort("__cxa_guard_abort failed to broadcast condition variable");
}
} // extern "C"
} // __cxxabiv1