[cfe-dev] atomic intrinsics

Thu May 27 06:36:15 PDT 2010

Thanks much for the very detailed answer Jeffrey!

Are there any changes to the C++0X working draft:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf

that you believe need to be made in this area in order to significantly increase the quality of the llvm/clang implementation?  Personally I'm wondering about ways to ensure that the memory order is a compile-time constant.  This question has a tight deadline.  I'm turning in national body comments on the C++0X FCD in about 8 hours.

-Howard

On May 27, 2010, at 12:59 AM, Jeffrey Yasskin wrote:

> A couple of us have been working, sporadically, on the matching LLVM
> intrinsics: http://docs.google.com/Doc?docid=0AYWBeVVqyP7dZGRiNG1oeHpfMjJkejVnOThkZA&hl=en.
> We suspect, although we're not certain, that we can get away with just
> atomic load, store, add, exchange, compare_exchange, and fence, and
> have the backend match certain cmpxchg-loops and lower them to the
> appropriate atomic sequence when that's available. We really only need
> add and exchange because we expect them to be more common than the
> other operations, so they may benefit from a smaller encoding.
> 
> We aren't modeling the difference between cmpxchg_strong and
> cmpxchg_weak on the assumption that the backend can figure out whether
> the code can tell the difference. (I haven't thought hard about
> whether that's true.) We only have a single order argument to cmpxchg
> and we've omitted memory_order_consume to keep things simple in the
> first version.
> 
> We haven't converted the proposal to use instructions yet (we started
> out with intrinsics), so the format below is just rough, but the basic
> schema is:
> 
> %old = i32 atomic_exchange i32* %ptr, i32 %new, order,
> single_thread|cross_thread [, volatile]
> 
> The instructions can take any type pointer, up to a target-dependent
> maximum size. Clang would be responsible for turning its intrinsics
> into library calls when their arguments get too big. Note that the
> maximum size becomes part of the platform ABI because you can't mix
> locked calls and atomic instructions on the same address.
> 
> The single_thread|cross_thread argument lets us merge
> atomic_signal_fence() with atomic_thread_fence(). It may also be
> useful for things like atomic-increments that only need to be atomic
> to communicate with a signal handler—such an increment could compile
> to inc instead of lock;xadd.
> 
> So ... here's what I'd propose for clang's builtins:
> 
> __atomic_foo(T*, args, memory_order, bool cross_thread)
> 
> foo is: load, store, add, exchange, compare_exchange_weak, fence,
> test_and_set, clear
> args and the return type depend on foo.
> 
> I'm including test_and_set and clear because the right lowering to
> LLVM IR might depend on the target. (If I remember correctly, on
> PA-RISC test_and_set is an exchange(0), while everywhere else it's an
> exchange(1).)
> 
> The volatile bit on the instruction would be set by whether T is volatile.
> 
> When the order and cross_thread arguments aren't obviously constant,
> Clang would emit a switch to pick between the appropriate
> instructions. Alternately, Clang could specify that they have to be
> constant, and the library would emit the switch. Alternately, if you
> really don't want the switch, we could change the proposed LLVM
> intrinsics to take variable arguments for those and have the backend
> emit the conditionals if necessary.
> 
> Jeffrey
> 
> On Wed, May 26, 2010 at 5:32 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>> I'm beginning to survey Chapter 29, <atomic> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf) for what is actually required/desired in the way of compiler intrinsics regarding atomic synchronization.  It appears to be a superset of the gcc __sync_* intrinsics (http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Atomic-Builtins.html#Atomic-Builtins).  I would like to start a conversation on exactly what intrinsics we would like to support in clang.  Maybe we want the full set specified by <atomic>.  Or maybe we want a subset and have the rest build on this subset.  At this point I don't know, and I'm looking for people with more expertise in this area than I have.
>> 
>> There are approximately 14 operations, crossed with various memory ordering constraints specified by <atomic> and I've summarized them below:
>> 
>> void store(volatile type* x, type source) : memory_order_relaxed,
>>                                           memory_order_release,
>>                                           memory_order_seq_cst
>> 
>> type load(volatile type* x) : memory_order_relaxed,
>>                             memory_order_consume,
>>                             memory_order_acquire,
>>                             memory_order_seq_cst
>> 
>> type exchange(volatile type* x, type source) : memory_order_relaxed,
>>                                              memory_order_consume,
>>                                              memory_order_acquire,
>>                                              memory_order_release,
>>                                              memory_order_acq_rel,
>>                                              memory_order_seq_cst
>> 
>> bool compare_exchange_strong(volatile type* x, type* expected, type desired,
>>                            success_order, failure_order)
>>                            : memory_order_relaxed, ...
>>                              memory_order_consume, ...
>>                              memory_order_acquire, ...
>>                              memory_order_release, ...
>>                              memory_order_acq_rel, ...
>>                              memory_order_seq_cst, ...
>> 
>> bool compare_exchange_weak(volatile type* x, type* expected, type desired,
>>                          success_order, failure_order)
>>                            : memory_order_relaxed, ...
>>                              memory_order_consume, ...
>>                              memory_order_acquire, ...
>>                              memory_order_release, ...
>>                              memory_order_acq_rel, ...
>>                              memory_order_seq_cst, ...
>> 
>> type fetch_add(volatile* x, type y): memory_order_relaxed,
>>                                    memory_order_consume,
>>                                    memory_order_acquire,
>>                                    memory_order_release,
>>                                    memory_order_acq_rel,
>>                                    memory_order_seq_cst
>> 
>> type fetch_sub(volatile* x, type y): memory_order_relaxed,
>>                                    memory_order_consume,
>>                                    memory_order_acquire,
>>                                    memory_order_release,
>>                                    memory_order_acq_rel,
>>                                    memory_order_seq_cst
>> 
>> type fetch_or(volatile* x, type y): memory_order_relaxed,
>>                                    memory_order_consume,
>>                                    memory_order_acquire,
>>                                    memory_order_release,
>>                                    memory_order_acq_rel,
>>                                    memory_order_seq_cst
>> 
>> type fetch_xor(volatile* x, type y): memory_order_relaxed,
>>                                    memory_order_consume,
>>                                    memory_order_acquire,
>>                                    memory_order_release,
>>                                    memory_order_acq_rel,
>>                                    memory_order_seq_cst
>> 
>> type fetch_and(volatile* x, type y): memory_order_relaxed,
>>                                    memory_order_consume,
>>                                    memory_order_acquire,
>>                                    memory_order_release,
>>                                    memory_order_acq_rel,
>>                                    memory_order_seq_cst
>> 
>> bool test_and_set(volatile flag* x): memory_order_relaxed,
>>                                    memory_order_consume,
>>                                    memory_order_acquire,
>>                                    memory_order_release,
>>                                    memory_order_acq_rel,
>>                                    memory_order_seq_cst
>> 
>> void clear(volatile flag* x): memory_order_relaxed,
>>                             memory_order_consume,
>>                             memory_order_acquire,
>>                             memory_order_release,
>>                             memory_order_acq_rel,
>>                             memory_order_seq_cst
>> 
>> void fence() : memory_order_relaxed,
>>              memory_order_consume,
>>              memory_order_acquire,
>>              memory_order_release,
>>              memory_order_acq_rel,
>>              memory_order_seq_cst
>> 
>> void signal_fence() : memory_order_relaxed,
>>                     memory_order_consume,
>>                     memory_order_acquire,
>>                     memory_order_release,
>>                     memory_order_acq_rel,
>>                     memory_order_seq_cst
>> 
>> -Howard
>> 
>> 
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>