[cfe-dev] atomic intrinsics

Jeffrey Yasskin jyasskin at google.com
Wed May 26 21:59:32 PDT 2010


A couple of us have been working, sporadically, on the matching LLVM
intrinsics: http://docs.google.com/Doc?docid=0AYWBeVVqyP7dZGRiNG1oeHpfMjJkejVnOThkZA&hl=en.
We suspect, although we're not certain, that we can get away with just
atomic load, store, add, exchange, compare_exchange, and fence, and
have the backend match certain cmpxchg loops and lower them to the
appropriate atomic sequence when one is available. We include add and
exchange only because we expect them to be more common than the other
read-modify-write operations, so they may benefit from a smaller
encoding.
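
For illustration, here's the kind of compare-exchange loop the
frontend could emit for an operation with no dedicated instruction (a
hedged sketch in C++0x <atomic> terms; the function name is made up):

  #include <atomic>

  // Sketch: fetch_or expressed as a cmpxchg loop. A backend that
  // recognizes this shape could lower it to a native atomic-or
  // sequence where one exists.
  unsigned fetch_or_via_cas(std::atomic<unsigned>* p, unsigned mask) {
    unsigned old = p->load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `old` with the
    // current value, so we just retry.
    while (!p->compare_exchange_weak(old, old | mask)) {
    }
    return old;
  }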

We aren't modeling the difference between cmpxchg_strong and
cmpxchg_weak, on the assumption that the backend can figure out
whether the code can tell the difference. (I haven't thought hard
about whether that's true.) We give cmpxchg only a single order
argument, and we've omitted memory_order_consume, to keep things
simple in the first version.
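
To make that concrete (a hedged sketch, not part of the proposal):
inside a retry loop like the fetch_or one above, a spurious failure
just costs one extra iteration, so the caller can't tell weak from
strong. The difference is only observable in a single, unretried call:

  #include <atomic>

  // Single attempt, no retry loop: a spurious failure here WOULD be
  // observable, so this call needs compare_exchange_strong.
  bool try_claim(std::atomic<int>* flag) {
    int expected = 0;
    return flag->compare_exchange_strong(expected, 1);
  }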

We haven't converted the proposal to use instructions yet (we started
out with intrinsics), so the format below is just rough, but the basic
schema is:

  %old = i32 atomic_exchange i32* %ptr, i32 %new, order,
         single_thread|cross_thread [, volatile]

The instructions can take a pointer to any type, up to a
target-dependent maximum size. Clang would be responsible for turning
its intrinsics into library calls when their arguments get too big.
Note that the maximum size becomes part of the platform ABI, because
you can't mix lock-based library calls and atomic instructions on the
same address.
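
A hedged illustration of that ABI point in C++0x terms (the 64-byte
struct and the "usually" claims are assumptions, not guarantees):

  #include <atomic>

  struct Wide { char bytes[64]; };  // wider than any inline-atomic size

  void demo() {
    std::atomic<int>  small;  // typically inline atomic instructions
    std::atomic<Wide> big;    // typically a library call taking a lock
    // Every piece of code touching `big` must agree on the library's
    // lock, which is why the size cutoff is part of the platform ABI.
    (void)small.is_lock_free();  // usually true
    (void)big.is_lock_free();    // usually false
  }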

The single_thread|cross_thread argument lets us merge
atomic_signal_fence() with atomic_thread_fence(). It may also be
useful for things like atomic increments that only need to be atomic
with respect to a signal handler; on x86, such an increment could
compile to a plain inc instead of lock xadd.
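
For example, here's the standard single-thread pattern that motivates
it (a hedged sketch using the C++0x signal-fence API, not proposal
syntax; SIGINT is just an arbitrary choice):

  #include <atomic>
  #include <csignal>

  static volatile std::sig_atomic_t ready = 0;
  static int payload = 0;

  extern "C" void handler(int) {
    payload = 42;
    // Compiler-only barrier: orders these stores with respect to code
    // on the same (interrupted) thread; no hardware fence is emitted.
    std::atomic_signal_fence(std::memory_order_release);
    ready = 1;
  }

  void install() { std::signal(SIGINT, handler); }

  bool poll() {
    if (!ready) return false;
    std::atomic_signal_fence(std::memory_order_acquire);
    return payload == 42;
  }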

So ... here's what I'd propose for clang's builtins:

__atomic_foo(T*, args, memory_order, bool cross_thread)

where foo is one of: load, store, add, exchange,
compare_exchange_weak, fence, test_and_set, or clear. The args and
the return type depend on foo.
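
Expanded into concrete signatures, that schema might look like this
(illustrative only: T stands for any supported type, the exact
argument lists are still up for discussion, and fence presumably
drops the pointer argument):

  T    __atomic_load(T* ptr, memory_order order, bool cross_thread);
  void __atomic_store(T* ptr, T value, memory_order order, bool cross_thread);
  T    __atomic_add(T* ptr, T value, memory_order order, bool cross_thread);
  T    __atomic_exchange(T* ptr, T value, memory_order order, bool cross_thread);
  bool __atomic_compare_exchange_weak(T* ptr, T* expected, T desired,
                                      memory_order order, bool cross_thread);
  void __atomic_fence(memory_order order, bool cross_thread);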

I'm including test_and_set and clear because the right lowering to
LLVM IR might depend on the target. (If I remember correctly, on
PA-RISC test_and_set is an exchange(0), while everywhere else it's an
exchange(1).)
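
That target difference might look like this in a lowering sketch
(hedged: I'm taking the PA-RISC behavior above at face value, and the
__hppa__ macro is just illustrative):

  #include <atomic>

  bool test_and_set(std::atomic<int>* f) {
  #if defined(__hppa__)
    // PA-RISC's primitive can only load-and-clear, so the "set"
    // state is 0; test_and_set is an exchange with 0.
    return f->exchange(0) == 0;  // true if it was already set
  #else
    // Everywhere else the "set" state is 1.
    return f->exchange(1) == 1;
  #endif
  }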

The volatile bit on the instruction would be set according to whether
T is volatile.

When the order and cross_thread arguments aren't obviously constant,
Clang would emit a switch to pick between the appropriate
instructions. Alternatively, Clang could require them to be constant
and leave the switch to the library. Or, if you really don't want the
switch at all, we could change the proposed LLVM intrinsics to take
variable arguments for those and have the backend emit the
conditionals when necessary.
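
That switch would look roughly like this (a hedged sketch; the
function is made up and elides most of the orders):

  #include <atomic>

  int load_with_dynamic_order(std::atomic<int>* p,
                              std::memory_order order) {
    switch (order) {
    case std::memory_order_relaxed:
      return p->load(std::memory_order_relaxed);
    case std::memory_order_acquire:
      return p->load(std::memory_order_acquire);
    default:
      return p->load(std::memory_order_seq_cst);
    }
  }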

Jeffrey

On Wed, May 26, 2010 at 5:32 PM, Howard Hinnant <hhinnant at apple.com> wrote:
> I'm beginning to survey Chapter 29, <atomic> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf) for what is actually required/desired in the way of compiler intrinsics regarding atomic synchronization.  It appears to be a superset of the gcc __sync_* intrinsics (http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Atomic-Builtins.html#Atomic-Builtins).  I would like to start a conversation on exactly what intrinsics we would like to support in clang.  Maybe we want the full set specified by <atomic>.  Or maybe we want a subset and have the rest build on this subset.  At this point I don't know, and I'm looking for people with more expertise in this area than I have.
>
> There are approximately 14 operations, crossed with various memory ordering constraints specified by <atomic>, and I've summarized them below:
>
> void store(volatile type* x, type source) : memory_order_relaxed,
>                                           memory_order_release,
>                                           memory_order_seq_cst
>
> type load(volatile type* x) : memory_order_relaxed,
>                             memory_order_consume,
>                             memory_order_acquire,
>                             memory_order_seq_cst
>
> type exchange(volatile type* x, type source) : memory_order_relaxed,
>                                              memory_order_consume,
>                                              memory_order_acquire,
>                                              memory_order_release,
>                                              memory_order_acq_rel,
>                                              memory_order_seq_cst
>
> bool compare_exchange_strong(volatile type* x, type* expected, type desired,
>                            success_order, failure_order)
>                            : memory_order_relaxed, ...
>                              memory_order_consume, ...
>                              memory_order_acquire, ...
>                              memory_order_release, ...
>                              memory_order_acq_rel, ...
>                              memory_order_seq_cst, ...
>
> bool compare_exchange_weak(volatile type* x, type* expected, type desired,
>                          success_order, failure_order)
>                            : memory_order_relaxed, ...
>                              memory_order_consume, ...
>                              memory_order_acquire, ...
>                              memory_order_release, ...
>                              memory_order_acq_rel, ...
>                              memory_order_seq_cst, ...
>
> type fetch_add(volatile type* x, type y): memory_order_relaxed,
>                                    memory_order_consume,
>                                    memory_order_acquire,
>                                    memory_order_release,
>                                    memory_order_acq_rel,
>                                    memory_order_seq_cst
>
> type fetch_sub(volatile type* x, type y): memory_order_relaxed,
>                                    memory_order_consume,
>                                    memory_order_acquire,
>                                    memory_order_release,
>                                    memory_order_acq_rel,
>                                    memory_order_seq_cst
>
> type fetch_or(volatile type* x, type y): memory_order_relaxed,
>                                    memory_order_consume,
>                                    memory_order_acquire,
>                                    memory_order_release,
>                                    memory_order_acq_rel,
>                                    memory_order_seq_cst
>
> type fetch_xor(volatile type* x, type y): memory_order_relaxed,
>                                    memory_order_consume,
>                                    memory_order_acquire,
>                                    memory_order_release,
>                                    memory_order_acq_rel,
>                                    memory_order_seq_cst
>
> type fetch_and(volatile type* x, type y): memory_order_relaxed,
>                                    memory_order_consume,
>                                    memory_order_acquire,
>                                    memory_order_release,
>                                    memory_order_acq_rel,
>                                    memory_order_seq_cst
>
> bool test_and_set(volatile flag* x): memory_order_relaxed,
>                                    memory_order_consume,
>                                    memory_order_acquire,
>                                    memory_order_release,
>                                    memory_order_acq_rel,
>                                    memory_order_seq_cst
>
> void clear(volatile flag* x): memory_order_relaxed,
>                             memory_order_consume,
>                             memory_order_acquire,
>                             memory_order_release,
>                             memory_order_acq_rel,
>                             memory_order_seq_cst
>
> void fence() : memory_order_relaxed,
>              memory_order_consume,
>              memory_order_acquire,
>              memory_order_release,
>              memory_order_acq_rel,
>              memory_order_seq_cst
>
> void signal_fence() : memory_order_relaxed,
>                     memory_order_consume,
>                     memory_order_acquire,
>                     memory_order_release,
>                     memory_order_acq_rel,
>                     memory_order_seq_cst
>
> -Howard
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>



