[cfe-dev] atomic intrinsics

Jeffrey Yasskin jyasskin at google.com
Thu May 27 08:14:39 PDT 2010


I've been talking to Lawrence Crowl about the atomics along the way,
so I'm pretty happy with the state of the standard. Even if LLVM
requires cmpxchg-loops for most of these operations, I think the extra
operations on the atomics provide a better user-facing programming
model. The idea to allow signal-only operations on things other than
fences is something of an experiment, so I'm not sure it belongs in
the standard yet.

On constants, I believe their thinking was that users will generally
pass a constant in voluntarily, and that compilers are pretty good at
propagating constants these days, so the fact that the parameter is
formally a run-time variable shouldn't hurt. The tricky bit for clang
seems to be that LLVM is good at propagating constants but clang
isn't, so we may have to generate more IR than if the memory_order
were passed as, say, a template argument. To me, that's not a good
enough argument to change the user-facing syntax.
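
For illustration, the contrast looks roughly like this (the template
form is hypothetical, not something in the draft):

  #include <atomic>

  // C++0X draft syntax: the order is an ordinary argument; when the
  // caller writes a literal, constant propagation recovers it.
  int load_acquire(std::atomic<int>& a) {
    return a.load(std::memory_order_acquire);
  }

  // Hypothetical alternative: the order as a template argument, so
  // it is a compile-time constant by construction.
  template <std::memory_order Order>
  int load_with(std::atomic<int>& a) {
    return a.load(Order);
  }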

On Thu, May 27, 2010 at 6:36 AM, Howard Hinnant <hhinnant at apple.com> wrote:
> Thanks much for the very detailed answer Jeffrey!
>
> Are there any changes to the C++0X working draft:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf
>
> that you believe need to be made in this area in order to significantly increase the quality of the llvm/clang implementation?  Personally I'm wondering about ways to ensure that the memory order is a compile-time constant.  This question has a tight deadline.  I'm turning in national body comments on the C++0X FCD in about 8 hours.
>
> -Howard
>
> On May 27, 2010, at 12:59 AM, Jeffrey Yasskin wrote:
>
>> A couple of us have been working, sporadically, on the matching LLVM
>> intrinsics: http://docs.google.com/Doc?docid=0AYWBeVVqyP7dZGRiNG1oeHpfMjJkejVnOThkZA&hl=en.
>> We suspect, although we're not certain, that we can get away with just
>> atomic load, store, add, exchange, compare_exchange, and fence, and
>> have the backend match certain cmpxchg-loops and lower them to the
>> appropriate native atomic sequence where one is available. Add and
>> exchange aren't strictly necessary; we include them only because we
>> expect them to be more common than the other operations, so they may
>> benefit from a smaller encoding.
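>>
>> For example, here is the kind of cmpxchg-loop the backend would need
>> to recognize, written in C++ terms rather than IR (a sketch, assuming
>> an atomic OR on a target with no native fetch-or):
>>
>>   #include <atomic>
>>
>>   int fetch_or_via_cas(std::atomic<int>& a, int mask) {
>>     int old = a.load(std::memory_order_relaxed);
>>     // Retry until no other thread updates 'a' between the load and
>>     // the compare-exchange; on failure, 'old' is refreshed.
>>     while (!a.compare_exchange_weak(old, old | mask))
>>       ;
>>     return old;
>>   }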
>>
>> We aren't modeling the difference between cmpxchg_strong and
>> cmpxchg_weak on the assumption that the backend can figure out whether
>> the code can tell the difference. (I haven't thought hard about
>> whether that's true.) We only have a single order argument to cmpxchg
>> and we've omitted memory_order_consume to keep things simple in the
>> first version.
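>>
>> For intuition on weak vs. strong: a strong compare-exchange can be
>> built from a weak one, which is why code already sitting in a retry
>> loop usually can't tell them apart. A sketch, not what a backend
>> would literally emit:
>>
>>   #include <atomic>
>>
>>   template <class T>
>>   bool cas_strong(std::atomic<T>& a, T& expected, T desired) {
>>     T original = expected;
>>     while (!a.compare_exchange_weak(expected, desired)) {
>>       if (expected != original)
>>         return false;   // genuine failure: the value changed
>>       // spurious failure: value still matches, so just retry
>>     }
>>     return true;
>>   }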
>>
>> We haven't converted the proposal to use instructions yet (we started
>> out with intrinsics), so the format below is just rough, but the basic
>> schema is:
>>
>> %old = i32 atomic_exchange i32* %ptr, i32 %new, order,
>> single_thread|cross_thread [, volatile]
>>
>> The instructions can take a pointer to any type, up to a
>> target-dependent maximum size. Clang would be responsible for
>> turning its intrinsics into library calls when their arguments get
>> too big. Note that the maximum size becomes part of the platform
>> ABI, because you can't mix lock-based library calls and atomic
>> instructions on the same address.
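>>
>> The library fallback would presumably be lock-based; one common
>> scheme is a fixed table of spinlocks indexed by address hash. A
>> sketch (the function name and table size are invented, not a
>> specified interface):
>>
>>   #include <atomic>
>>   #include <cstring>
>>   #include <cstdint>
>>
>>   static std::atomic<bool> locks[64];  // static: starts all-unlocked
>>
>>   void atomic_load_large(const void* ptr, void* ret, std::size_t size) {
>>     std::atomic<bool>& l =
>>         locks[(reinterpret_cast<std::uintptr_t>(ptr) >> 4) % 64];
>>     while (l.exchange(true, std::memory_order_acquire))
>>       ;  // spin until we own the lock
>>     std::memcpy(ret, ptr, size);
>>     l.store(false, std::memory_order_release);
>>   }
>>
>> A lock-free instruction applied to the same address would bypass the
>> lock, which is exactly why the size cutoff has to be fixed by the ABI.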
>>
>> The single_thread|cross_thread argument lets us merge
>> atomic_signal_fence() with atomic_thread_fence(). It may also be
>> useful for things like atomic increments that only need to be atomic
>> with respect to a signal handler; such an increment could compile to
>> a plain inc instead of lock;xadd.
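>>
>> For instance, in the builtin form proposed below (the builtin is
>> this proposal's, not anything that exists today), a counter shared
>> only with a signal handler might look like:
>>
>>   // Atomic with respect to a signal handler in the same thread,
>>   // but with no cross-thread ordering: on x86 this can be a plain
>>   // inc rather than lock;xadd.
>>   static int sig_count;
>>
>>   void on_signal(int) {
>>     __atomic_add(&sig_count, 1, memory_order_relaxed,
>>                  /*cross_thread=*/false);
>>   }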
>>
>> So ... here's what I'd propose for clang's builtins:
>>
>> __atomic_foo(T*, args, memory_order, bool cross_thread)
>>
>> foo is: load, store, add, exchange, compare_exchange_weak, fence,
>> test_and_set, clear
>> args and the return type depend on foo.
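>>
>> So, for example, the library's atomic<T>::fetch_add might reduce to
>> little more than a forward to the builtin (a sketch against the
>> proposed signature, not existing code):
>>
>>   template <class T>
>>   T atomic_fetch_add(volatile T* obj, T delta, memory_order ord) {
>>     // cross_thread is true for ordinary inter-thread atomics.
>>     return __atomic_add(obj, delta, ord, /*cross_thread=*/true);
>>   }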
>>
>> I'm including test_and_set and clear because the right lowering to
>> LLVM IR might depend on the target. (If I remember correctly, on
>> PA-RISC test_and_set is an exchange(0), while everywhere else it's an
>> exchange(1).)
>>
>> The volatile bit on the instruction would be set by whether T is volatile.
>>
>> When the order and cross_thread arguments aren't obviously constant,
>> Clang would emit a switch to pick among the appropriate
>> instructions. Alternatively, Clang could require them to be
>> constant, and the library would emit the switch. Or, if you really
>> don't want the switch, we could change the proposed LLVM intrinsics
>> to accept variable arguments for those and have the backend emit the
>> conditionals when necessary.
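>>
>> Concretely, for a non-constant order, the first option's emitted
>> switch would amount to something like this (using the proposed
>> builtin names; consume is folded into acquire since we've omitted it):
>>
>>   int add_dynamic(int* p, int v, memory_order ord) {
>>     switch (ord) {
>>     case memory_order_relaxed:
>>       return __atomic_add(p, v, memory_order_relaxed, true);
>>     case memory_order_consume:
>>     case memory_order_acquire:
>>       return __atomic_add(p, v, memory_order_acquire, true);
>>     case memory_order_release:
>>       return __atomic_add(p, v, memory_order_release, true);
>>     case memory_order_acq_rel:
>>       return __atomic_add(p, v, memory_order_acq_rel, true);
>>     default:  // memory_order_seq_cst
>>       return __atomic_add(p, v, memory_order_seq_cst, true);
>>     }
>>   }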
>>
>> Jeffrey
>>
>> On Wed, May 26, 2010 at 5:32 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>>> I'm beginning to survey Chapter 29, <atomic> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf) for what is actually required/desired in the way of compiler intrinsics regarding atomic synchronization.  It appears to be a superset of the gcc __sync_* intrinsics (http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Atomic-Builtins.html#Atomic-Builtins).  I would like to start a conversation on exactly what intrinsics we would like to support in clang.  Maybe we want the full set specified by <atomic>.  Or maybe we want a subset and have the rest build on this subset.  At this point I don't know, and I'm looking for people with more expertise in this area than I have.
>>>
>>> There are 14 operations, crossed with various memory ordering constraints specified by <atomic>, and I've summarized them below (a brief usage sketch follows the list):
>>>
>>> void store(volatile type* x, type source) : memory_order_relaxed,
>>>                                           memory_order_release,
>>>                                           memory_order_seq_cst
>>>
>>> type load(volatile type* x) : memory_order_relaxed,
>>>                             memory_order_consume,
>>>                             memory_order_acquire,
>>>                             memory_order_seq_cst
>>>
>>> type exchange(volatile type* x, type source) : memory_order_relaxed,
>>>                                              memory_order_consume,
>>>                                              memory_order_acquire,
>>>                                              memory_order_release,
>>>                                              memory_order_acq_rel,
>>>                                              memory_order_seq_cst
>>>
>>> bool compare_exchange_strong(volatile type* x, type* expected, type desired,
>>>                            success_order, failure_order)
>>>                            : memory_order_relaxed, ...
>>>                              memory_order_consume, ...
>>>                              memory_order_acquire, ...
>>>                              memory_order_release, ...
>>>                              memory_order_acq_rel, ...
>>>                              memory_order_seq_cst, ...
>>>
>>> bool compare_exchange_weak(volatile type* x, type* expected, type desired,
>>>                          success_order, failure_order)
>>>                            : memory_order_relaxed, ...
>>>                              memory_order_consume, ...
>>>                              memory_order_acquire, ...
>>>                              memory_order_release, ...
>>>                              memory_order_acq_rel, ...
>>>                              memory_order_seq_cst, ...
>>>
>>> type fetch_add(volatile type* x, type y) : memory_order_relaxed,
>>>                                            memory_order_consume,
>>>                                            memory_order_acquire,
>>>                                            memory_order_release,
>>>                                            memory_order_acq_rel,
>>>                                            memory_order_seq_cst
>>>
>>> type fetch_sub(volatile type* x, type y) : memory_order_relaxed,
>>>                                            memory_order_consume,
>>>                                            memory_order_acquire,
>>>                                            memory_order_release,
>>>                                            memory_order_acq_rel,
>>>                                            memory_order_seq_cst
>>>
>>> type fetch_or(volatile type* x, type y) : memory_order_relaxed,
>>>                                           memory_order_consume,
>>>                                           memory_order_acquire,
>>>                                           memory_order_release,
>>>                                           memory_order_acq_rel,
>>>                                           memory_order_seq_cst
>>>
>>> type fetch_xor(volatile type* x, type y) : memory_order_relaxed,
>>>                                            memory_order_consume,
>>>                                            memory_order_acquire,
>>>                                            memory_order_release,
>>>                                            memory_order_acq_rel,
>>>                                            memory_order_seq_cst
>>>
>>> type fetch_and(volatile type* x, type y) : memory_order_relaxed,
>>>                                            memory_order_consume,
>>>                                            memory_order_acquire,
>>>                                            memory_order_release,
>>>                                            memory_order_acq_rel,
>>>                                            memory_order_seq_cst
>>>
>>> bool test_and_set(volatile flag* x): memory_order_relaxed,
>>>                                    memory_order_consume,
>>>                                    memory_order_acquire,
>>>                                    memory_order_release,
>>>                                    memory_order_acq_rel,
>>>                                    memory_order_seq_cst
>>>
>>> void clear(volatile flag* x): memory_order_relaxed,
>>>                             memory_order_consume,
>>>                             memory_order_acquire,
>>>                             memory_order_release,
>>>                             memory_order_acq_rel,
>>>                             memory_order_seq_cst
>>>
>>> void fence() : memory_order_relaxed,
>>>              memory_order_consume,
>>>              memory_order_acquire,
>>>              memory_order_release,
>>>              memory_order_acq_rel,
>>>              memory_order_seq_cst
>>>
>>> void signal_fence() : memory_order_relaxed,
>>>                     memory_order_consume,
>>>                     memory_order_acquire,
>>>                     memory_order_release,
>>>                     memory_order_acq_rel,
>>>                     memory_order_seq_cst
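>>>
>>> As a usage sketch of the two-order compare_exchange forms (draft
>>> C++0X syntax; the node type here is invented for illustration):
>>>
>>>   #include <atomic>
>>>
>>>   struct Node { Node* next; };
>>>
>>>   bool try_push(std::atomic<Node*>& head, Node* n) {
>>>     Node* old = head.load(std::memory_order_relaxed);
>>>     n->next = old;
>>>     // Release on success publishes *n; the failure order can be
>>>     // relaxed because a failed exchange publishes nothing.
>>>     return head.compare_exchange_strong(
>>>         old, n, std::memory_order_release, std::memory_order_relaxed);
>>>   }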
>>>
>>> -Howard
>>>
>>>
>
>