[cfe-dev] atomic intrinsics

Howard Hinnant hhinnant at apple.com
Thu May 27 08:17:32 PDT 2010


Ok, thanks.

-Howard

On May 27, 2010, at 11:14 AM, Jeffrey Yasskin wrote:

> I've been talking to Lawrence Crowl about the atomics along the way,
> so I'm pretty happy with the state of the standard. Even if LLVM
> requires cmpxchg-loops for most of these operations, I think the extra
> operations on the atomics provide a better user-facing programming
> model. The idea to allow signal-only operations on things other than
> fences is something of an experiment, so I'm not sure it belongs in
> the standard yet.
> 
> On constants, I believe their thinking was that users will generally
> pass a constant in voluntarily, and that compilers are pretty good at
> propagating constants these days, so the fact that it's strictly a
> variable shouldn't hurt things. The tricky bit for clang seems to be
> that LLVM is good at propagating constants, but clang isn't, so we may
> have to generate more IR than if the memory_order were passed in, say,
> a template argument. To me, that's not a good enough argument to
> change the user-facing syntax.
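> 
> For a concrete picture (a sketch, not actual codegen; pick_order is a
> stand-in for any non-inlined source of the order):
> 
>   #include <atomic>
> 
>   std::atomic<int> x;
> 
>   std::memory_order pick_order() { return std::memory_order_release; }
> 
>   void demo() {
>     x.store(1, std::memory_order_release); // order is obviously constant
> 
>     std::memory_order mo = pick_order();   // constant only after inlining
>     x.store(1, mo);                        // and constant propagation
>   }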
> 
> On Thu, May 27, 2010 at 6:36 AM, Howard Hinnant <hhinnant at apple.com> wrote:
>> Thanks much for the very detailed answer, Jeffrey!
>> 
>> Are there any changes to the C++0x working draft:
>> 
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf
>> 
>> that you believe need to be made in this area in order to significantly increase the quality of the llvm/clang implementation?  Personally, I'm wondering about ways to ensure that the memory order is a compile-time constant.  This question has a tight deadline: I'm turning in national body comments on the C++0x FCD in about 8 hours.
>> 
>> -Howard
>> 
>> On May 27, 2010, at 12:59 AM, Jeffrey Yasskin wrote:
>> 
>>> A couple of us have been working, sporadically, on the matching LLVM
>>> intrinsics: http://docs.google.com/Doc?docid=0AYWBeVVqyP7dZGRiNG1oeHpfMjJkejVnOThkZA&hl=en.
>>> We suspect, although we're not certain, that we can get away with just
>>> atomic load, store, add, exchange, compare_exchange, and fence, and
>>> have the backend match certain cmpxchg-loops and lower them to the
>>> appropriate atomic sequence when that's available. Strictly speaking,
>>> add and exchange are redundant with compare_exchange; we include them
>>> only because we expect them to be more common than the other
>>> operations, so they may benefit from a smaller encoding.
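>>> 
>>> For illustration, here is the kind of cmpxchg-loop a frontend might
>>> emit for an operation without a dedicated instruction, say fetch_or
>>> (a rough C++ sketch using the <atomic> names; the real lowering would
>>> of course happen in LLVM IR):
>>> 
>>>   #include <atomic>
>>> 
>>>   int fetch_or_via_cmpxchg(std::atomic<int>& x, int mask) {
>>>     int old = x.load(std::memory_order_relaxed);
>>>     // Retry until no other thread modified x between the load and
>>>     // the compare_exchange; on failure, old is refreshed with the
>>>     // current value.
>>>     while (!x.compare_exchange_weak(old, old | mask))
>>>       ;
>>>     return old;
>>>   }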
>>> 
>>> We aren't modeling the difference between cmpxchg_strong and
>>> cmpxchg_weak on the assumption that the backend can figure out whether
>>> the code can tell the difference. (I haven't thought hard about
>>> whether that's true.) We only have a single order argument to cmpxchg
>>> and we've omitted memory_order_consume to keep things simple in the
>>> first version.
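>>> 
>>> For reference, the user-visible difference between the two forms is
>>> roughly this (a sketch using the <atomic> spellings):
>>> 
>>>   #include <atomic>
>>> 
>>>   std::atomic<int> a(0);
>>> 
>>>   void spin_set() {
>>>     int expected = 0;
>>>     // weak may fail spuriously even when a == 0, so it belongs in a
>>>     // loop; failure overwrites expected with the current value
>>>     while (!a.compare_exchange_weak(expected, 1))
>>>       expected = 0;
>>>   }
>>> 
>>>   bool try_set_once() {
>>>     int expected = 0;
>>>     // strong fails only if the value really differed
>>>     return a.compare_exchange_strong(expected, 1);
>>>   }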
>>> 
>>> We haven't converted the proposal to use instructions yet (we started
>>> out with intrinsics), so the format below is just rough, but the basic
>>> schema is:
>>> 
>>> %old = i32 atomic_exchange i32* %ptr, i32 %new, order,
>>>        single_thread|cross_thread [, volatile]
>>> 
>>> The instructions can take a pointer to any type, up to a target-dependent
>>> maximum size. Clang would be responsible for turning its intrinsics
>>> into library calls when their arguments get too big. Note that the
>>> maximum size becomes part of the platform ABI because you can't mix
>>> locked calls and atomic instructions on the same address.
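>>> 
>>> A sketch of the size-based split (every name here is hypothetical;
>>> the real library interface is still undecided):
>>> 
>>>   #include <cstddef>
>>> 
>>>   void __atomic_store_library(void* p, const void* v, std::size_t n);
>>>   enum { MAX_LOCK_FREE_SIZE = 8 };  // target-dependent; part of the ABI
>>> 
>>>   template <class T>
>>>   void atomic_store_any(T* p, T v) {
>>>     if (sizeof(T) <= MAX_LOCK_FREE_SIZE) {
>>>       // small enough: clang would emit the atomic store instruction
>>>     } else {
>>>       // too big: fall back to the lock-based library call; every
>>>       // access to *p must take this path, which is why the size
>>>       // threshold becomes part of the platform ABI
>>>       __atomic_store_library(p, &v, sizeof(T));
>>>     }
>>>   }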
>>> 
>>> The single_thread|cross_thread argument lets us merge
>>> atomic_signal_fence() with atomic_thread_fence(). It may also be
>>> useful for things like atomic-increments that only need to be atomic
>>> to communicate with a signal handler—such an increment could compile
>>> to inc instead of lock;xadd.
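>>> 
>>> In <atomic> terms, the two fences that merge look like this (the
>>> comments show the proposed lowering):
>>> 
>>>   #include <atomic>
>>> 
>>>   void publish_to_other_threads() {
>>>     std::atomic_thread_fence(std::memory_order_release);
>>>     // -> fence release, cross_thread
>>>   }
>>> 
>>>   void publish_to_signal_handler() {
>>>     std::atomic_signal_fence(std::memory_order_release);
>>>     // -> fence release, single_thread: constrains only the compiler,
>>>     //    no hardware barrier needed
>>>   }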
>>> 
>>> So ... here's what I'd propose for clang's builtins:
>>> 
>>> __atomic_foo(T*, args, memory_order, bool cross_thread)
>>> 
>>> foo is: load, store, add, exchange, compare_exchange_weak, fence,
>>> test_and_set, clear
>>> args and the return type depend on foo.
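>>> 
>>> For example (a sketch of the proposed builtins, which don't exist
>>> yet; exact spellings to be decided):
>>> 
>>>   int example(int* x, int* counter) {
>>>     // signal-only increment: on x86 this could lower to inc rather
>>>     // than lock;xadd
>>>     __atomic_add(counter, 1, memory_order_relaxed, false);
>>>     // cross-thread exchange, acquire-release ordering:
>>>     return __atomic_exchange(x, 1, memory_order_acq_rel, true);
>>>   }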
>>> 
>>> I'm including test_and_set and clear because the right lowering to
>>> LLVM IR might depend on the target. (If I remember correctly, on
>>> PA-RISC test_and_set is an exchange(0), while everywhere else it's an
>>> exchange(1).)
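>>> 
>>> Concretely, the target dependence looks something like this (a
>>> sketch; exchange stands for the atomic exchange discussed above):
>>> 
>>>   int exchange(volatile int* p, int v);  // assume this is atomic
>>> 
>>>   bool test_and_set(volatile int* f) {
>>>   #if defined(__hppa__)
>>>     // PA-RISC's LDCW can only store zero, so "set" is 0 and "clear"
>>>     // is a nonzero value; true means the flag was already set
>>>     return exchange(f, 0) == 0;
>>>   #else
>>>     return exchange(f, 1) != 0;  // usual convention: "set" is 1
>>>   #endif
>>>   }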
>>> 
>>> The volatile bit on the instruction would be set according to whether T is volatile.
>>> 
>>> When the order and cross_thread arguments aren't obviously constant,
>>> Clang would emit a switch to pick between the appropriate
>>> instructions. Alternatively, Clang could specify that they have to be
>>> constant, and the library would emit the switch. Alternatively, if you
>>> really don't want the switch, we could change the proposed LLVM
>>> intrinsics to take variable arguments for those and have the backend
>>> emit the conditionals if necessary.
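>>> 
>>> The switch in question would look roughly like this (written against
>>> the <atomic> names for readability; the real expansion would pick
>>> between the LLVM instructions, and would also cover cross_thread):
>>> 
>>>   #include <atomic>
>>> 
>>>   int add_dynamic(std::atomic<int>& x, int v, std::memory_order mo) {
>>>     switch (mo) {
>>>     case std::memory_order_relaxed:
>>>       return x.fetch_add(v, std::memory_order_relaxed);
>>>     case std::memory_order_acquire:
>>>       return x.fetch_add(v, std::memory_order_acquire);
>>>     case std::memory_order_release:
>>>       return x.fetch_add(v, std::memory_order_release);
>>>     case std::memory_order_acq_rel:
>>>       return x.fetch_add(v, std::memory_order_acq_rel);
>>>     default:
>>>       return x.fetch_add(v, std::memory_order_seq_cst);
>>>     }
>>>   }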
>>> 
>>> Jeffrey
>>> 
>>> On Wed, May 26, 2010 at 5:32 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>>>> I'm beginning to survey Chapter 29, <atomic> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf) for what is actually required/desired in the way of compiler intrinsics regarding atomic synchronization.  It appears to be a superset of the gcc __sync_* intrinsics (http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Atomic-Builtins.html#Atomic-Builtins).  I would like to start a conversation on exactly what intrinsics we would like to support in clang.  Maybe we want the full set specified by <atomic>.  Or maybe we want a subset and have the rest build on this subset.  At this point I don't know, and I'm looking for people with more expertise in this area than I have.
>>>> 
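>>>> For comparison, the existing __sync builtins are essentially
>>>> full-barrier operations, e.g. (these are the current gcc builtins,
>>>> not proposals):
>>>> 
>>>>   static int n;
>>>> 
>>>>   void demo() {
>>>>     int old = __sync_fetch_and_add(&n, 1);  // roughly fetch_add with
>>>>     (void)old;                              // memory_order_seq_cst
>>>>     __sync_synchronize();                   // full memory barrier
>>>>   }
>>>> 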
>>>> There are approximately 14 operations, crossed with various memory ordering constraints specified by <atomic>, and I've summarized them below:
>>>> 
>>>> void store(volatile type* x, type source) : memory_order_relaxed,
>>>>                                           memory_order_release,
>>>>                                           memory_order_seq_cst
>>>> 
>>>> type load(volatile type* x) : memory_order_relaxed,
>>>>                             memory_order_consume,
>>>>                             memory_order_acquire,
>>>>                             memory_order_seq_cst
>>>> 
>>>> type exchange(volatile type* x, type source) : memory_order_relaxed,
>>>>                                              memory_order_consume,
>>>>                                              memory_order_acquire,
>>>>                                              memory_order_release,
>>>>                                              memory_order_acq_rel,
>>>>                                              memory_order_seq_cst
>>>> 
>>>> bool compare_exchange_strong(volatile type* x, type* expected, type desired,
>>>>                            success_order, failure_order)
>>>>                            : memory_order_relaxed, ...
>>>>                              memory_order_consume, ...
>>>>                              memory_order_acquire, ...
>>>>                              memory_order_release, ...
>>>>                              memory_order_acq_rel, ...
>>>>                              memory_order_seq_cst, ...
>>>> 
>>>> bool compare_exchange_weak(volatile type* x, type* expected, type desired,
>>>>                          success_order, failure_order)
>>>>                            : memory_order_relaxed, ...
>>>>                              memory_order_consume, ...
>>>>                              memory_order_acquire, ...
>>>>                              memory_order_release, ...
>>>>                              memory_order_acq_rel, ...
>>>>                              memory_order_seq_cst, ...
>>>> 
>>>> type fetch_add(volatile type* x, type y): memory_order_relaxed,
>>>>                                           memory_order_consume,
>>>>                                           memory_order_acquire,
>>>>                                           memory_order_release,
>>>>                                           memory_order_acq_rel,
>>>>                                           memory_order_seq_cst
>>>> 
>>>> type fetch_sub(volatile type* x, type y): memory_order_relaxed,
>>>>                                           memory_order_consume,
>>>>                                           memory_order_acquire,
>>>>                                           memory_order_release,
>>>>                                           memory_order_acq_rel,
>>>>                                           memory_order_seq_cst
>>>> 
>>>> type fetch_or(volatile type* x, type y): memory_order_relaxed,
>>>>                                          memory_order_consume,
>>>>                                          memory_order_acquire,
>>>>                                          memory_order_release,
>>>>                                          memory_order_acq_rel,
>>>>                                          memory_order_seq_cst
>>>> 
>>>> type fetch_xor(volatile type* x, type y): memory_order_relaxed,
>>>>                                           memory_order_consume,
>>>>                                           memory_order_acquire,
>>>>                                           memory_order_release,
>>>>                                           memory_order_acq_rel,
>>>>                                           memory_order_seq_cst
>>>> 
>>>> type fetch_and(volatile type* x, type y): memory_order_relaxed,
>>>>                                           memory_order_consume,
>>>>                                           memory_order_acquire,
>>>>                                           memory_order_release,
>>>>                                           memory_order_acq_rel,
>>>>                                           memory_order_seq_cst
>>>> 
>>>> bool test_and_set(volatile flag* x): memory_order_relaxed,
>>>>                                    memory_order_consume,
>>>>                                    memory_order_acquire,
>>>>                                    memory_order_release,
>>>>                                    memory_order_acq_rel,
>>>>                                    memory_order_seq_cst
>>>> 
>>>> void clear(volatile flag* x): memory_order_relaxed,
>>>>                             memory_order_consume,
>>>>                             memory_order_acquire,
>>>>                             memory_order_release,
>>>>                             memory_order_acq_rel,
>>>>                             memory_order_seq_cst
>>>> 
>>>> void fence() : memory_order_relaxed,
>>>>              memory_order_consume,
>>>>              memory_order_acquire,
>>>>              memory_order_release,
>>>>              memory_order_acq_rel,
>>>>              memory_order_seq_cst
>>>> 
>>>> void signal_fence() : memory_order_relaxed,
>>>>                     memory_order_consume,
>>>>                     memory_order_acquire,
>>>>                     memory_order_release,
>>>>                     memory_order_acq_rel,
>>>>                     memory_order_seq_cst
>>>> 
>>>> -Howard
>>>> 
>>>> 
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>> 
>> 
>> 
