[cfe-commits] patch: libcxxabi cxa_*_virtual and cxa_guard_* methods
Nick Lewycky
nlewycky at google.com
Mon May 23 14:59:52 PDT 2011
On 23 May 2011 00:22, John McCall <rjmccall at apple.com> wrote:
>
> On May 22, 2011, at 7:10 PM, Howard Hinnant wrote:
>
> > On May 22, 2011, at 7:42 PM, John McCall wrote:
> >
> >> On May 22, 2011, at 4:28 PM, Howard Hinnant wrote:
> >>
> >>> Question for clang developers:
> >>>
> >>> Will these namespace-scope definitions ever generate a call to
> __cxa_guard_acquire? I know this is forward looking concerning
> thread_local:
> >>
> >> The only variable definitions that require _cxa_guard_acquire are static
> locals and static data members of templated classes.
> >
> > Ok, thanks John.
> >
> > I've got a prototype here. It is kind of messy. But it is basically
> derived from Nick's code which I think is a good fundamental design. Though
> I've rejected spin locks. It would work best with thread_local data. But
> for obvious reasons I haven't tested that branch and am emulating
> thread_local with pthread_key_t. This implementation is also sensitive to
> endian because the Itanium spec is in terms of "first byte" instead of "high
> byte".
> >
> > In a nutshell, I develop a thread id that can be stored in size_t. An
> assumption is that the number of threads in the lifetime of an application
> will be less than 2^32 on a 32 bit architecture, and less than 2^56 on a 64
> bit architecture. This thread id is always non-zero, and is nothing more
> than a sequential count of threads. Note that it is not a count of
> currently active threads, but a count of the number of times a thread is
> created. pthread_self() is specifically not used for this purpose because
> it often takes up more storage space than the 56 bits we have available.
>
> A thread-local with a mandatory initializer seems like a lot of overhead
> for a QoI thing like aborting on a recursive initialization — I dunno, maybe
> I'm over-thinking things. I do think that it would be good to allow
> platforms to opt out, or at least to provide a different implementation if
> the platform happens to support a range-restricted TID.
>
> In particular, while Darwin's pthread_t is a pointer, Darwin does provide
> mach_thread_self(), which returns a mach_port_t, which is (ultimately) an
> unsigned int. This is the sort of thing that most OSes probably have an
> equivalent of. For example, on Linux (maybe only certain distributions?),
> you can use gettid(), which is a signed int.
>
Better than gettid(), Linux's pthreads implementation is based on futex()
which provides a way to implement a condition variable on an int*. I'd be
willing to try my hand at implementing guards with futex(), but won't get
around to it soon.
Just to make your life more complicated, guard variables on ARM are only 32
> bits, but they give the implementation 31 bits to play with: the compiler's
> fast-path check only considers whether the low *bit* is non-zero.
>
Yup. This. I wanted to bring this up, but thought I should have some code
contributed before stirring it up. :)
Is libcxxabi interested in providing a home for ARM ABI as well as Itanium
ABI? And if so, what should we do to share code? On ARM we need to provide
many/all of the Itanium ABI functions but compilers are expected to call the
ARM ABI ones instead. Their ABI manual suggests implementing the Itanium
functions in terms of the ARM ones instead of the other way around. How
would we go about doing that?
Nick
I don't think you need to worry about platforms that use the Itanium ABI but
> that don't have pthreads. :) Undoubtedly someone will correct me, though.
>
> > An assumption (and clang developers will have to tell me if it is correct
> or not) is that the compiler will do a proper double-checked locking dance
> prior to calling __cxa_guard_acquire.
>
> I'm not sure what code you're imagining that the compiler might emit here;
> if the compiler emitted a full and proper double-checked locking dance, it
> would not need to call __cxa_guard_acquire. The compiler has the option of
> emitting an unprotected check that the first byte is non-zero, subject only
> to the restriction that accesses to the protected global cannot be
> re-ordered around that check. It's not required to do any such thing,
> though.
>
> In short, for a multitude of reasons, __cxa_guard_acquire must be prepared
> for the possibility that the variable has already been initialized by the
> time it grabs the lock.
>
> That said, it's probably best to assume that the compiler did a preliminary
> check and that it's not worth performing a second such check before trying
> to grab the lock.
>
> > Comments, suggestions welcome.
>
> Pedantic point of the week: 'initialized' should be specifically a char*,
> not an int8_t*. int8_t may be 'signed char', which in C++ does not have the
> omnipotent aliasing power of 'char' or 'unsigned char'.
>
> Also, I don't see a reason to clear the "lock" on cxa_guard_release, and on
> every other path, set_lock can safely assume that the first byte of the
> guard is zero.
>
> But this looks like a great start!
>
> John.
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110523/1fde56bf/attachment.html>
More information about the cfe-commits
mailing list