[cfe-dev] atomic intrinsics
Howard Hinnant
hhinnant at apple.com
Thu May 27 08:17:32 PDT 2010
Ok, thanks.
-Howard
On May 27, 2010, at 11:14 AM, Jeffrey Yasskin wrote:
> I've been talking to Lawrence Crowl about the atomics along the way,
> so I'm pretty happy with the state of the standard. Even if LLVM
> requires cmpxchg-loops for most of these operations, I think the extra
> operations on the atomics provide a better user-facing programming
> model. The idea to allow signal-only operations on things other than
> fences is something of an experiment, so I'm not sure it belongs in
> the standard yet.
>
> On constants, I believe their thinking was that users will generally
> pass a constant in voluntarily, and that compilers are pretty good at
> propagating constants these days, so the fact that it's strictly a
> variable shouldn't hurt things. The tricky bit for clang seems to be
> that LLVM is good at propagating constants, but clang isn't, so we may
> have to generate more IR than if the memory_order were passed in, say,
> a template argument. To me, that's not a good enough argument to
> change the user-facing syntax.
>
> On Thu, May 27, 2010 at 6:36 AM, Howard Hinnant <hhinnant at apple.com> wrote:
>> Thanks much for the very detailed answer Jeffrey!
>>
>> Are there any changes to the C++0X working draft:
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf
>>
>> that you believe need to be made in this area in order to significantly increase the quality of the llvm/clang implementation? Personally I'm wondering about ways to ensure that the memory order is a compile-time constant. This question has a tight deadline. I'm turning in national body comments on the C++0X FCD in about 8 hours.
>>
>> -Howard
>>
>> On May 27, 2010, at 12:59 AM, Jeffrey Yasskin wrote:
>>
>>> A couple of us have been working, sporadically, on the matching LLVM
>>> intrinsics: http://docs.google.com/Doc?docid=0AYWBeVVqyP7dZGRiNG1oeHpfMjJkejVnOThkZA&hl=en.
>>> We suspect, although we're not certain, that we can get away with just
>>> atomic load, store, add, exchange, compare_exchange, and fence, and
>>> have the backend match certain cmpxchg-loops and lower them to the
>>> appropriate atomic sequence when that's available. We really only need
>>> add and exchange because we expect them to be more common than the
>>> other operations, so they may benefit from a smaller encoding.
>>>
>>> We aren't modeling the difference between cmpxchg_strong and
>>> cmpxchg_weak on the assumption that the backend can figure out whether
>>> the code can tell the difference. (I haven't thought hard about
>>> whether that's true.) We only have a single order argument to cmpxchg
>>> and we've omitted memory_order_consume to keep things simple in the
>>> first version.
>>>
>>> We haven't converted the proposal to use instructions yet (we started
>>> out with intrinsics), so the format below is just rough, but the basic
>>> schema is:
>>>
>>> %old = i32 atomic_exchange i32* %ptr, i32 %new, order,
>>>        single_thread|cross_thread [, volatile]
>>>
>>> The instructions can take any type pointer, up to a target-dependent
>>> maximum size. Clang would be responsible for turning its intrinsics
>>> into library calls when their arguments get too big. Note that the
>>> maximum size becomes part of the platform ABI because you can't mix
>>> locked calls and atomic instructions on the same address.
>>>
>>> The single_thread|cross_thread argument lets us merge
>>> atomic_signal_fence() with atomic_thread_fence(). It may also be
>>> useful for things like atomic-increments that only need to be atomic
>>> to communicate with a signal handler—such an increment could compile
>>> to inc instead of lock;xadd.
>>>
>>> So ... here's what I'd propose for clang's builtins:
>>>
>>> __atomic_foo(T*, args, memory_order, bool cross_thread)
>>>
>>> foo is: load, store, add, exchange, compare_exchange_weak, fence,
>>> test_and_set, clear
>>> args and the return type depend on foo.
>>>
>>> I'm including test_and_set and clear because the right lowering to
>>> LLVM IR might depend on the target. (If I remember correctly, on
>>> PA-RISC test_and_set is an exchange(0), while everywhere else it's an
>>> exchange(1).)
>>>
>>> The volatile bit on the instruction would be set by whether T is volatile.
>>>
>>> When the order and cross_thread arguments aren't obviously constant,
>>> Clang would emit a switch to pick between the appropriate
>>> instructions. Alternately, Clang could specify that they have to be
>>> constant, and the library would emit the switch. Alternately, if you
>>> really don't want the switch, we could change the proposed LLVM
>>> intrinsics to take variable arguments for those and have the backend
>>> emit the conditionals if necessary.
>>>
>>> Jeffrey
>>>
>>> On Wed, May 26, 2010 at 5:32 PM, Howard Hinnant <hhinnant at apple.com> wrote:
>>>> I'm beginning to survey Chapter 29, <atomic> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf) for what is actually required/desired in the way of compiler intrinsics regarding atomic synchronization. It appears to be a superset of the gcc __sync_* intrinsics (http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Atomic-Builtins.html#Atomic-Builtins). I would like to start a conversation on exactly what intrinsics we would like to support in clang. Maybe we want the full set specified by <atomic>. Or maybe we want a subset and have the rest build on this subset. At this point I don't know, and I'm looking for people with more expertise in this area than I have.
>>>>
>>>> There are approximately 14 operations, crossed with the various memory ordering constraints specified by <atomic>; I've summarized them below:
>>>>
>>>> void store(volatile type* x, type source)
>>>>     : memory_order_relaxed, memory_order_release, memory_order_seq_cst
>>>>
>>>> type load(volatile type* x)
>>>>     : memory_order_relaxed, memory_order_consume, memory_order_acquire,
>>>>       memory_order_seq_cst
>>>>
>>>> type exchange(volatile type* x, type source)
>>>>     : memory_order_relaxed, memory_order_consume, memory_order_acquire,
>>>>       memory_order_release, memory_order_acq_rel, memory_order_seq_cst
>>>>
>>>> bool compare_exchange_strong(volatile type* x, type* expected, type desired,
>>>>                              success_order, failure_order)
>>>>     : success and failure orders each drawn from memory_order_relaxed,
>>>>       memory_order_consume, memory_order_acquire, memory_order_release,
>>>>       memory_order_acq_rel, memory_order_seq_cst
>>>>
>>>> bool compare_exchange_weak(volatile type* x, type* expected, type desired,
>>>>                            success_order, failure_order)
>>>>     : same success/failure order combinations as compare_exchange_strong
>>>>
>>>> type fetch_add(volatile type* x, type y)
>>>> type fetch_sub(volatile type* x, type y)
>>>> type fetch_or(volatile type* x, type y)
>>>> type fetch_xor(volatile type* x, type y)
>>>> type fetch_and(volatile type* x, type y)
>>>>     : memory_order_relaxed, memory_order_consume, memory_order_acquire,
>>>>       memory_order_release, memory_order_acq_rel, memory_order_seq_cst
>>>>
>>>> bool test_and_set(volatile flag* x)
>>>> void clear(volatile flag* x)
>>>> void fence()
>>>> void signal_fence()
>>>>     : memory_order_relaxed, memory_order_consume, memory_order_acquire,
>>>>       memory_order_release, memory_order_acq_rel, memory_order_seq_cst
>>>>
>>>> -Howard
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>
>>
>>