[cfe-dev] atomic intrinsics

Howard Hinnant hhinnant at apple.com
Tue Oct 5 16:23:08 PDT 2010


I actually did think about doing this by size, but decided the __atomic_* API was easier for me if I did it by type.  I agree that size is probably what the compiler writer is more concerned about.

Caveat:  There is a generalized atomic<T> template, which I haven't coded yet, but I was thinking of testing the size and pod-ness of T, and reinterpreting T as a scalar when appropriate, in order to make pair<void*,void*> lock-free when possible.  Though as you point out, on x86-64 that will never happen with my design.  Hmm... I'll think on this more.
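
Roughly the sketch in my head (nothing coded yet; every name below is a placeholder, not committed code) is to map a small POD T onto an integral type of the same size and funnel the operation through the scalar intrinsic:

    template <unsigned> struct __scalar_for;   // no scalar for this size
    template <> struct __scalar_for<1> {typedef unsigned char      type;};
    template <> struct __scalar_for<2> {typedef unsigned short     type;};
    template <> struct __scalar_for<4> {typedef unsigned int       type;};
    template <> struct __scalar_for<8> {typedef unsigned long long type;};

    template <class T>   // T is POD and sizeof(T) is 1, 2, 4 or 8
    T __atomic_load_seq_cst(const volatile T* obj)
    {
        typedef typename __scalar_for<sizeof(T)>::type rep;
        // overload resolution prefers the compiler's non-template
        // intrinsic for the rep* argument over this template
        rep r = __atomic_load_seq_cst(
                    reinterpret_cast<const volatile rep*>(obj));
        return *reinterpret_cast<T*>(&r);
    }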

-Howard

On Oct 5, 2010, at 7:15 PM, Jeffrey Yasskin wrote:

> It mostly looks good to me, but I wonder if the intrinsics should be
> organized by size rather than argument type. In particular, x86-64 can
> handle pair<void*,void*> atomically using cmpxchg16b, but there's no
> primitive type that large (unless you want to use an mmx type?).
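> 
> Untested illustration: with gcc's __sync builtins and -mcx16, a 16-byte
> compare-and-swap compiles down to cmpxchg16b:
> 
>     bool cas16(volatile __int128_t* obj, __int128_t* expected,
>                __int128_t desired)
>     {
>         __int128_t old = __sync_val_compare_and_swap(obj, *expected, desired);
>         if (old == *expected)
>             return true;
>         *expected = old;   // refresh on failure, like compare_exchange
>         return false;
>     }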
> 
> On Tue, Oct 5, 2010 at 9:50 AM, Howard Hinnant <hhinnant at apple.com> wrote:
>> I'm still working on <atomic>, as described below.  But I paused my development to write a "Compiler writer's guide to <atomic>" which I've placed here:
>> 
>> http://libcxx.llvm.org/atomic_design.html
>> 
>> This details exactly which intrinsics must appear, in what form, and which are optional.  The document also describes how the library deals with optional intrinsics that are not supplied.  In a nutshell, the library calls the best intrinsic the compiler supplies, and if none is supplied, locks a mutex to do the job.
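>> 
>> To make the fallback concrete, here is a sketch of the selection for a
>> release store of int (not the actual library code; __atomic_mut is a
>> hypothetical library-internal mutex, and <mutex> is assumed):
>> 
>>     void __choose_store_release(volatile int* obj, int desired)
>>     {
>>     #if __has_feature(__atomic_store_release_int)
>>         __atomic_store_release(obj, desired);         // exact intrinsic
>>     #elif __has_feature(__atomic_store_seq_cst_int)
>>         __atomic_store_seq_cst(obj, desired);         // stronger, still correct
>>     #else
>>         std::lock_guard<std::mutex> lk(__atomic_mut); // last resort: lock
>>         *obj = desired;
>>     #endif
>>     }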
>> 
>> Comments welcome.  Is this a design that the clang community can rally around?
>> 
>> -Howard
>> 
>> On Oct 1, 2010, at 7:58 PM, Howard Hinnant wrote:
>> 
>>> I'm working on libc++'s <atomic>.  This header requires close cooperation with the compiler.  I set out to implement most of this header using only gcc's atomic intrinsics which clang already implements.  The experience is not satisfying. ;-)
>>> 
>>> The needs of <atomic> are great.  I've identified many intrinsics that need to optionally exist.  And their existence needs to be individually detectable.  If any individual intrinsic doesn't exist, <atomic> can lock a mutex and do the job.  But <atomic> needs to know how to ask the question:  Do you have this atomic intrinsic for this type? (where type is an integral type or void*).
>>> 
>>> The atomic intrinsics are basically:
>>> 
>>> load
>>> store
>>> exchange
>>> compare_exchange
>>> fetch_add
>>> fetch_sub
>>> fetch_or
>>> fetch_and
>>> fetch_xor
>>> 
>>> The first four must work on all integral types plus void*.  The arithmetic operations work on all integral types except bool; void* supports only fetch_add and fetch_sub.
>>> 
>>> The really complicating point is that these mostly support six different "memory orderings"
>>> 
>>> relaxed
>>> consume
>>> acquire
>>> release
>>> acq_rel
>>> seq_cst
>>> 
>>> (cleverly spelled to always take 6 chars ;-))  Some of the operations above need to work with only a subset of these orderings.  The compare_exchange comes in two flavors, strong and weak, and takes two orderings, not one: one for success and one for failure, and only certain combinations are allowed.
>>> 
>>> The definitions of the orderings are here:
>>> 
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf
>>> 
>>> I thought about trying to summarize them here, but knew I would get it wrong.  I've put together a comprehensive list of intrinsics below, each specialized to an operation and to an ordering or combination of orderings.  I've included below only intrinsics with "legal" orderings.  The library can take care of detecting illegal memory orderings if that is desired.
>>> 
>>> I suggest that we take advantage of clang's __has_feature macro to detect whether an intrinsic exists for a given type.  For example, if:
>>> 
>>>   bool __atomic_load_relaxed(const volatile bool* atomic_obj);
>>> 
>>> exists, then I suggest that:
>>> 
>>> __has_feature(__atomic_load_relaxed_bool) return true, and false otherwise.  Note that on some platforms __has_feature(__atomic_load_relaxed_bool) might return true while __has_feature(__atomic_load_relaxed_long_long) returns false.
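>>> 
>>> The tests are independent per type, so on the same platform a port might legitimately see:
>>> 
>>> #if __has_feature(__atomic_load_relaxed_bool)
>>>     // true here: atomic<bool> loads are lock-free
>>> #endif
>>> #if !__has_feature(__atomic_load_relaxed_long_long)
>>>     // and simultaneously true here: long long takes the mutex path
>>> #endif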
>>> 
>>> Below is the list of intrinsics (holding breath).  Is this a direction that the clang community can rally around?
>>> 
>>> -Howard
>>> 
>>> ---
>>> 
>>> __has_feature(__atomic_<operation>_<memory ordering(s)>_<type>)
>>> 
>>> ---
>>> 
>>> type: bool, char, signed char, unsigned char, short, unsigned short, int,
>>>      unsigned int, long, unsigned long, long long, unsigned long long,
>>>      char16_t, char32_t, wchar_t, void*
>>> 
>>> // load returns value pointed to by atomic_obj
>>> type __atomic_load_relaxed(const volatile type* atomic_obj);
>>> type __atomic_load_consume(const volatile type* atomic_obj);
>>> type __atomic_load_acquire(const volatile type* atomic_obj);
>>> type __atomic_load_seq_cst(const volatile type* atomic_obj);
>>> 
>>> void __atomic_store_relaxed(volatile type* atomic_obj, type desired);
>>> void __atomic_store_release(volatile type* atomic_obj, type desired);
>>> void __atomic_store_seq_cst(volatile type* atomic_obj, type desired);
>>> 
>>> // exchange returns previous value of *atomic_obj
>>> type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
>>> type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
>>> type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
>>> type __atomic_exchange_release(volatile type* atomic_obj, type desired);
>>> type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
>>> type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);
>>> 
>>> // pseudocode for compare_exchange (weak and strong):
>>> //
>>> // bool
>>> // __atomic_compare_exchange_*(volatile type* atomic_obj, type* expected,
>>> //                             type desired)
>>> // {
>>> //     if (*atomic_obj == *expected)
>>> //     {
>>> //         *atomic_obj = desired;
>>> //         return true;
>>> //     }
>>> //     *expected = *atomic_obj;
>>> //     return false;
>>> // }
>>> //
>>> // __atomic_compare_exchange_S_F applies S memory ordering when returning true
>>> // and applies F memory ordering when returning false.
>>> //
>>> // "weak" is allowed to return false even if *atomic_obj == *expected
>>> // (spuriously, not always).  "strong" is not allowed to return false
>>> // spuriously.
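>>> //
>>> // Example use (sketch): the weak flavor is intended for retry loops,
>>> // where a spurious failure simply goes around again:
>>> //
>>> //     int expected = __atomic_load_relaxed(obj);
>>> //     while (!__atomic_compare_exchange_weak_seq_cst_relaxed(
>>> //                obj, &expected, expected + 1))
>>> //         ;   // on failure *expected was refreshed; just retry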
>>> 
>>> bool __atomic_compare_exchange_weak_relaxed_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_weak_consume_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_consume_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_weak_acquire_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_acquire_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_acquire_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_weak_release_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_release_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_release_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_weak_acq_rel_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_acq_rel_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_acq_rel_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_weak_seq_cst_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_seq_cst_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_seq_cst_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_weak_seq_cst_seq_cst(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_strong_relaxed_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_strong_consume_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_consume_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_strong_acquire_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_acquire_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_acquire_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_strong_release_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_release_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_release_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_strong_acq_rel_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_acq_rel_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_acq_rel_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> bool __atomic_compare_exchange_strong_seq_cst_relaxed(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_seq_cst_consume(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_seq_cst_acquire(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> bool __atomic_compare_exchange_strong_seq_cst_seq_cst(volatile type* atomic_obj,
>>>                                                  type* expected, type desired);
>>> 
>>> ----
>>> 
>>> type: char, signed char, unsigned char, short, unsigned short, int,
>>>      unsigned int, long, unsigned long, long long, unsigned long long,
>>>      char16_t, char32_t, wchar_t
>>> 
>>> // All arithmetic operations return previous value of *atomic_obj
>>> 
>>> type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);
>>> 
>>> type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);
>>> 
>>> type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);
>>> 
>>> type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);
>>> 
>>> type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
>>> type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);
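>>> 
>>> // Example use (sketch): each fetch_* returns the old value, so a
>>> // relaxed counter can hand out unique tickets:
>>> //
>>> //     long ticket = __atomic_fetch_add_relaxed(&counter, 1L);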
>>> 
>>> ----
>>> 
>>> // All arithmetic operations return previous value of *atomic_obj
>>> 
>>> void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);
>>> 
>>> void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
>>> void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);
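>>> 
>>> // Example use (sketch): atomic<T*> could funnel through the void*
>>> // intrinsic by scaling the operand to bytes first (this assumes the
>>> // void* flavor does raw byte arithmetic):
>>> //
>>> //     template <class T>
>>> //     T* fetch_add_seq_cst(T* volatile* obj, ptrdiff_t n)
>>> //     {
>>> //         return static_cast<T*>(__atomic_fetch_add_seq_cst(
>>> //             reinterpret_cast<void* volatile*>(obj),
>>> //             n * static_cast<ptrdiff_t>(sizeof(T))));
>>> //     }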
>>> 
>>> 




