[cfe-dev] atomic intrinsics

Howard Hinnant hhinnant at apple.com
Tue Oct 5 17:26:27 PDT 2010


On Oct 5, 2010, at 8:16 PM, Eric Christopher wrote:

> 
> On Oct 5, 2010, at 3:56 PM, Eric Christopher wrote:
> 
>> 
>> On Oct 5, 2010, at 9:50 AM, Howard Hinnant wrote:
>> 
>>> I'm still working on <atomic>, as described below.  But I paused my development to write a "Compiler writer's guide to <atomic>" which I've placed here:
>>> 
>>> http://libcxx.llvm.org/atomic_design.html
>>> 
>>> This details exactly which intrinsics must appear, in what form, and which are optional.  This document also describes how the library deals with optional intrinsics that are not supplied.  In a nutshell, the library calls the best intrinsic the compiler supplies, and if none is supplied, locks a mutex to do the job.
>>> 
>>> Comments welcome.  Is this a design that the clang community can rally around?
> 
> How about something like this:
> 
> __atomic_load_seq_cst(__obj)
> 
> that is processed by the front end into the LLVM IR intrinsic (unless it has special knowledge) and then either emitted by the backend as a call to that function or inlined as atomic code if the backend knows how to do that.  That way the compiler can make the choice, and we also don't get people using the "intrinsics" in a non-portable way thinking they're the actual API - they just use the API, and the front end knows what to do.
> 
> This would then give us, instead of:
> 
> // load
> 
> template <class _Tp>
> _Tp
> __load_seq_cst(_Tp const volatile* __obj)
> {
>   unique_lock<mutex> _(__not_atomic_mut());
>   return *__obj;
> }
> 
> // load bool
> 
> inline _LIBCPP_INLINE_VISIBILITY
> bool
> __choose_load_seq_cst(bool const volatile* __obj)
> {
> #if __has_feature(__atomic_load_seq_cst_b)
>   return __atomic_load_seq_cst(__obj);
> #else
>   return __load_seq_cst(__obj);
> #endif
> }
> 
> just a library call to __atomic_load_seq_cst that the backend emits a call to when it doesn't know how to inline the operation - so you only have to write:
> 
> // load
> 
> template <class _Tp>
> _Tp
> __atomic_load_seq_cst(_Tp const volatile* __obj)
> {
>   unique_lock<mutex> _(__not_atomic_mut());
>   return *__obj;
> }
> 
> and the front end would process it based on name and type, depending on what it can do.  The backend can then implement 0, 1, N, or all of the intrinsics that can be lowered to target code.
> 
> You were mentioning a bit more to this in private mail, I'll let you summarize that here :)

<nod> Thanks Eric.  I should state right up front that I'm fine with this direction.  But it appears to me to need much more support from the front end (which is why I didn't propose it).  If we go this direction, there are no optional intrinsics for the front end.  The front end has to implement essentially everything specified in <atomic>.  Here is a list:

type: bool, char, signed char, unsigned char, short, unsigned short, int,
     unsigned int, long, unsigned long, long long, unsigned long long,
     char16_t, char32_t, wchar_t, void*

type __atomic_load_relaxed(const volatile type* atomic_obj);
type __atomic_load_consume(const volatile type* atomic_obj);
type __atomic_load_acquire(const volatile type* atomic_obj);
type __atomic_load_seq_cst(const volatile type* atomic_obj);

void __atomic_store_relaxed(volatile type* atomic_obj, type desired);
void __atomic_store_release(volatile type* atomic_obj, type desired);
void __atomic_store_seq_cst(volatile type* atomic_obj, type desired);

type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
type __atomic_exchange_release(volatile type* atomic_obj, type desired);
type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);

bool __atomic_compare_exchange_weak_relaxed_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_weak_consume_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_consume_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_weak_acquire_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_acquire_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_acquire_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_weak_release_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_release_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_release_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_weak_acq_rel_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_acq_rel_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_acq_rel_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_weak_seq_cst_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_seq_cst_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_seq_cst_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_weak_seq_cst_seq_cst(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_strong_relaxed_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_strong_consume_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_consume_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_strong_acquire_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_acquire_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_acquire_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_strong_release_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_release_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_release_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_strong_acq_rel_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_acq_rel_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_acq_rel_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);

bool __atomic_compare_exchange_strong_seq_cst_relaxed(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_seq_cst_consume(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_seq_cst_acquire(volatile type* atomic_obj,
                                                 type* expected, type desired);
bool __atomic_compare_exchange_strong_seq_cst_seq_cst(volatile type* atomic_obj,
                                                 type* expected, type desired);

----

type: char, signed char, unsigned char, short, unsigned short, int,
     unsigned int, long, unsigned long, long long, unsigned long long,
     char16_t, char32_t, wchar_t

type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);

type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);

type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);

type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);

type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);

----

void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

----

void __atomic_thread_fence_acquire();
void __atomic_thread_fence_release();
void __atomic_thread_fence_acq_rel();
void __atomic_thread_fence_seq_cst();

void __atomic_signal_fence_acquire();
void __atomic_signal_fence_release();
void __atomic_signal_fence_acq_rel();
void __atomic_signal_fence_seq_cst();
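
With everything above guaranteed to exist, the __choose_/__has_feature layer disappears from <atomic>, and the header can dispatch straight to the named intrinsics.  A sketch for one operation (assuming atomic_int stores its value in a __v_ member, as atomic_bool does in the code further below):

inline _LIBCPP_INLINE_VISIBILITY
int atomic_load_explicit(const volatile atomic_int* __obj, memory_order __o)
{
    switch (__o)
    {
    case memory_order_relaxed:
        return __atomic_load_relaxed(&__obj->__v_);
    case memory_order_consume:
        return __atomic_load_consume(&__obj->__v_);
    case memory_order_acquire:
        return __atomic_load_acquire(&__obj->__v_);
    default:  // memory_order_seq_cst; release/acq_rel are not valid for loads
        return __atomic_load_seq_cst(&__obj->__v_);
    }
}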

If desired, the front end could accept the memory orderings as ordinary arguments instead of encoding that information into the intrinsic name.  That would reduce the sheer number of intrinsics and further reduce the complexity of the <atomic> header.  But it would mean that the fewer intrinsics that remain each carry more complex logic.  As an example:

bool atomic_compare_exchange_weak_explicit(volatile atomic_bool* __obj,
                                          bool* __exp, bool __desr,
                                          memory_order __s, memory_order __f);

This function currently uses a switch to translate the memory orderings __s and __f into compile-time information for the front end.  But that logic could also be moved into the intrinsic if that is desirable:

inline _LIBCPP_INLINE_VISIBILITY
bool atomic_compare_exchange_weak_explicit(volatile atomic_bool* __obj,
                                          bool* __exp, bool __desr,
                                          memory_order __s, memory_order __f)
{
    __f = __translate_memory_order(__f);  // maps __f to an ordering valid for failure
    // Precondition: __f is no stronger than __s, so the combinations
    // omitted from the nested switches below cannot occur.
   switch (__s)
   {
   case memory_order_relaxed:
       return __choose_compare_exchange_weak_relaxed_relaxed(&__obj->__v_,
                                                             __exp, __desr);
   case memory_order_consume:
       switch (__f)
       {
       case memory_order_relaxed:
           return __choose_compare_exchange_weak_consume_relaxed(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_consume:
           return __choose_compare_exchange_weak_consume_consume(
                                                  &__obj->__v_, __exp, __desr);
       }
   case memory_order_acquire:
       switch (__f)
       {
       case memory_order_relaxed:
           return __choose_compare_exchange_weak_acquire_relaxed(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_consume:
           return __choose_compare_exchange_weak_acquire_consume(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_acquire:
           return __choose_compare_exchange_weak_acquire_acquire(
                                                  &__obj->__v_, __exp, __desr);
       }
   case memory_order_release:
       switch (__f)
       {
       case memory_order_relaxed:
           return __choose_compare_exchange_weak_release_relaxed(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_consume:
           return __choose_compare_exchange_weak_release_consume(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_acquire:
           return __choose_compare_exchange_weak_release_acquire(
                                                  &__obj->__v_, __exp, __desr);
       }
   case memory_order_acq_rel:
       switch (__f)
       {
       case memory_order_relaxed:
           return __choose_compare_exchange_weak_acq_rel_relaxed(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_consume:
           return __choose_compare_exchange_weak_acq_rel_consume(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_acquire:
           return __choose_compare_exchange_weak_acq_rel_acquire(
                                                  &__obj->__v_, __exp, __desr);
       }
   case memory_order_seq_cst:
       switch (__f)
       {
       case memory_order_relaxed:
           return __choose_compare_exchange_weak_seq_cst_relaxed(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_consume:
           return __choose_compare_exchange_weak_seq_cst_consume(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_acquire:
           return __choose_compare_exchange_weak_seq_cst_acquire(
                                                  &__obj->__v_, __exp, __desr);
       case memory_order_seq_cst:
           return __choose_compare_exchange_weak_seq_cst_seq_cst(
                                                  &__obj->__v_, __exp, __desr);
       }
   }
}
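
If the orderings became ordinary arguments, all of the above could collapse into a single forwarding call.  A sketch - the intrinsic name and signature here are hypothetical, not part of the list above:

inline _LIBCPP_INLINE_VISIBILITY
bool atomic_compare_exchange_weak_explicit(volatile atomic_bool* __obj,
                                          bool* __exp, bool __desr,
                                          memory_order __s, memory_order __f)
{
    // Hypothetical intrinsic taking both orderings as arguments; the front
    // end would recognize the name and fold constant orderings itself.
    return __atomic_compare_exchange_weak(&__obj->__v_, __exp, __desr,
                                          __s, __f);
}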

Or, I'm fine with just continuing with the minimal-front-end-investment design that I'm currently working on.  There is a lot of logic and functionality here, and the question is simply where we put that logic: in the library or in the front end.  The current design burdens the library.  Eric is suggesting that we burden the front end with much of this logic, and it's always ok with me to do less work. :-)
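
(For completeness: the __not_atomic_mut referenced by the fallback code above would be something along these lines - a sketch assuming a single global mutex shared by every operation that has no intrinsic.)

#include <mutex>

// One global lock serializing every atomic operation for which the
// compiler supplied no intrinsic.
std::mutex& __not_atomic_mut()
{
    static std::mutex __m;
    return __m;
}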

-Howard
