[cfe-commits] [PATCH] atomic operation builtins, part 1

Wed Oct 12 18:27:18 PDT 2011

On Wed, Oct 12, 2011 at 6:06 PM, John McCall <rjmccall at apple.com> wrote:
> On Oct 12, 2011, at 2:54 PM, Andrew MacLeod wrote:
>> On 10/12/2011 05:04 PM, John McCall wrote:
>>>> The compiler built-ins can always be called directly with arbitrary sized objects, so it would be good for the generic routine to handle it rather than artificially restrict it.
>>> Are you saying that the generic routine promises co-operation with lock-free atomics when the size parameter is sufficiently small?  That seems unfortunate.  The generic routine is certainly going to be slower than lock-free atomics, but that doesn't mean its performance is unimportant;  a good implementation using striped spin-locks would probably end up on the order of only 2-4 times slower than the lock-free code, so adding a bunch of pre-checks may be quite significant.
>>>
>> The rule is that on a given target, a given object size is either A) lock-free or B) unknown.
>>
>> So if the compiler always generates lock-free instructions for, say, 8-byte values, the library will have to as well; i.e., __atomic_load_8() will also be implemented as lock-free. It has to be, or things break. The interface from the compiler's point of view is that you simply call the routine __atomic_load(obj); the presence of __atomic_load_8 is transparent to the caller. Normally the compiler will turn __atomic_load() into the direct call, but I don't see any reason to force the compiler to do that. It should be free to leave the original call to __atomic_load() in place, and that code should co-exist with code that did call __atomic_load_8. It's free to do whatever it wants with sizes that don't exactly match the 5 'optimal' sizes.
>>
>> It does mean that the library entry point for atomic_load() may start with a switch on the size of the object to jump to the "efficient" routines if the size matches, but that's about it. Is that what you are concerned about?
>
> Yes.  The library entry point for atomic_load is an implementation detail of the ABI that we have total control of, so why make stronger promises than necessary?  Why not force the compiler to directly call (or inline, if it knows it's lock-free) __atomic_load_8?
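To make the size-dispatch question above concrete, here is a minimal sketch of a generic entry point that switches on the object size before falling back to a lock. It is an illustration only, not the ABI under discussion: `generic_atomic_load`, `stripe_for`, and the 16-entry lock table are hypothetical names, the fallback uses `std::mutex` rather than spin-locks, and the sketch assumes a GCC- or Clang-compatible compiler (for `__atomic_load_n`), naturally aligned objects, and a target where 8-byte loads are lock-free. The 16-byte case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical striped lock table used for sizes with no lock-free path.
static std::mutex g_stripes[16];

static std::mutex& stripe_for(const void* addr) {
    auto bits = reinterpret_cast<std::uintptr_t>(addr);
    return g_stripes[(bits >> 4) % 16];  // drop low bits, then pick a stripe
}

// Lock-free load of a naturally aligned T via the compiler builtin.
template <typename T>
static void load_lock_free(const void* src, void* dst) {
    T value = __atomic_load_n(static_cast<const T*>(src), __ATOMIC_SEQ_CST);
    std::memcpy(dst, &value, sizeof value);
}

// Hypothetical generic entry point. Returns true when a lock-free path was
// taken and false when the striped-lock fallback was used.
static bool generic_atomic_load(std::size_t size, const void* src, void* dst) {
    assert(src != nullptr && dst != nullptr);
    switch (size) {  // this is the pre-check whose cost is being debated
    case 1: load_lock_free<std::uint8_t>(src, dst);  return true;
    case 2: load_lock_free<std::uint16_t>(src, dst); return true;
    case 4: load_lock_free<std::uint32_t>(src, dst); return true;
    case 8: load_lock_free<std::uint64_t>(src, dst); return true;
    default: {
        std::lock_guard<std::mutex> guard(stripe_for(src));
        std::memcpy(dst, src, size);
        return false;
    }
    }
}
```

In this sketch the switch runs on every call, including calls for odd sizes that can only ever take the locked path. That is the overhead the objection is about; having the compiler call the sized routine directly would skip it.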
>
>>>>> Honestly, I don't think future-proofing against arbitrary new atomic instructions really makes any sense.  Even going up to 16 bytes (on architectures where that can't be done lock-free now) worries me a bit.
>>>>>
>>>>> Rounding up also worries me, since the user has no control over the padding bytes, but they can still cause spurious failures on, say, compare-and-swap.
>>>> If the padding is under control of the 'atomic' keyword for the type, then we have complete control over those padding bytes. Regardless of what junk might be in them, they are part of the atomic data structure and the only way to access them is through a full atomic access. It's like adding another user field to the structure and not setting it. It shouldn't cause spurious failures.
>>> Well, we at least have to make sure that our atomic operations always zero-pad their operands.  For example, if we do an atomic store into a 5-byte struct that we've padded to 8 bytes, we have to make sure we store a zero pattern into the pad.  That's feasible, but it's complexity that we should at least acknowledge before committing to it.
>>>
>>> I can also see this exposing lots of what are, admittedly, source bugs, like only zeroing the first sizeof(T) bytes of an _Atomic(T).
>>
>> I'm still not sure I see how it matters. Nothing should ever access or change that padding field in an unfriendly way, since it is always part of the atomic word that is loaded/stored, exchanged, or compare_exchanged. Those are the only 4 ways to access the memory. All the fetch_ops only operate on full integral values, so that's not a concern. Compare_exchange is the only one that could be affected, and it requires that the 'expected' value be from an atomic load or exchange…
>
> I wasn't aware of that.  That's really a very strange constraint — why is it illegal to independently assemble a value that I expect might be in an atomic variable?

The standard doesn't make any guarantees about the layout of
std::atomic<T>; how exactly could you manipulate the bits of an
std::atomic<T> in a cross-platform manner?

-Eli
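
The padding concern raised earlier in the thread can be illustrated with a small sketch: a 5-byte value stored in an 8-byte atomic word, where compare-and-swap compares all 8 bytes. Everything here is hypothetical and for illustration only (`Five`, `pack`, `cas_five`, and the explicit pad-fill parameter are not from any proposal in the thread); it models the padded atomic as a `std::atomic<std::uint64_t>` rather than relying on any particular layout of `std::atomic<T>`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// A 5-byte payload that would be rounded up to 8 bytes when made atomic.
struct Five { unsigned char bytes[5]; };

// Build the 8-byte word for a payload, filling the 3 pad bytes with pad_fill.
inline std::uint64_t pack(const Five& value, unsigned char pad_fill) {
    unsigned char raw[8];
    std::memset(raw, pad_fill, sizeof raw);
    std::memcpy(raw, value.bytes, sizeof value.bytes);
    std::uint64_t word;
    std::memcpy(&word, raw, sizeof word);
    return word;
}

// Compare-and-swap on the full 8-byte word. The desired value is always
// stored zero-padded; expected_pad models whatever is in the pad bytes of
// the caller's 'expected' object.
inline bool cas_five(std::atomic<std::uint64_t>& cell, const Five& expected,
                     const Five& desired, unsigned char expected_pad) {
    std::uint64_t expected_word = pack(expected, expected_pad);
    return cell.compare_exchange_strong(expected_word, pack(desired, 0x00));
}

inline void demo_padding_mismatch() {
    Five old_value = {{1, 2, 3, 4, 5}};
    Five new_value = {{9, 8, 7, 6, 5}};
    std::atomic<std::uint64_t> cell(pack(old_value, 0x00));

    // Same 5 payload bytes, but junk in the pad: the exchange fails.
    assert(!cas_five(cell, old_value, new_value, 0xAA));
    // Zero-padded 'expected': the exchange succeeds.
    assert(cas_five(cell, old_value, new_value, 0x00));
}
```

Under these assumptions, an independently assembled `expected` value only matches if its pad bytes agree with what the atomic operations stored, which is why zero-padding every operand (or taking `expected` from a prior atomic load) matters.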