[cfe-commits] [PATCH] atomic operation builtins, part 1

Wed Oct 12 16:07:30 PDT 2011

On 10/12/11, Jeffrey Yasskin <jyasskin at google.com> wrote:
> [+ Lawrence who's been driving the ABI-compatibility design. Context
> at
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20111010/047614.html]
>
> On Oct 12, 2011 John McCall <rjmccall at apple.com> wrote:
> > On Oct 12, 2011, at 9:03 AM, Jeffrey Yasskin wrote:
> > > On Oct 12, 2011 Andrew MacLeod <amacleod at redhat.com> wrote:
> > > > - language atomic types up to 16 bytes should be padded to
> > > > an appropriate size, and aligned properly.
> > > > - if memory matching one of the 5 'optimized' sizes isn't
> > > > aligned properly, results are undefined.
> > > > - if the size does not match one of the 5 specific routines,
> > > > then the library generic ABI can handle it.  There's no
> > > > alignment guarantees, so I presume it would end up being
> > > > a locked implementation using hash tables and addresses
> > > > or something.
> > >
> > > The ABI library needs to demand alignment guarantees, or
> > > have them passed in, or it won't be able to support larger
> > > lock-free sizes on new architectures.
> >
> > How aggressive are you suggesting we be about this?  If I make
> > this type atomic:
> >
> > struct { float values[5]; };
> >
> > do we really increase its size and alignment up to 32 bytes in
> > the wild hope that the architecture will add 32-byte atomics
> > someday?  If so, what's the limit?  If not, why is 16 the limit?

The 16-byte limit is there now because solving the ABA problem often
requires a compare-and-swap on a pair of pointer and something else
of that size.  Switching to a 16-byte pointer would likely imply
that hardware implements 32-byte compare-and-swap.  That pointer
change is already an ABI change, so we don't have to worry about
breaking the ABI for that case.
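
As a rough illustration of the pointer-plus-something-else pattern, here
is a minimal sketch.  The names TaggedIndex and replace_head are made up
for this example.  A real free list would pair a pointer with the counter,
giving 2*sizeof(void*) bytes; the sketch assumes a 32-bit pool index
instead, so that the pair fits in 8 bytes and stays lock-free on common
64-bit targets without extra library support.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical head of a lock-free free list. The index stands in for a
// pointer; with a real pointer the pair would be 2*sizeof(void*) bytes.
struct alignas(8) TaggedIndex {
    std::uint32_t index;  // which pool slot is currently at the head
    std::uint32_t tag;    // bumped on every successful update
};

// Replace the head only if neither the index nor the tag has changed
// since `expected` was read. On failure, `expected` is refreshed.
inline bool replace_head(std::atomic<TaggedIndex>& head,
                         TaggedIndex& expected, std::uint32_t new_index) {
    TaggedIndex desired{new_index, expected.tag + 1};
    return head.compare_exchange_strong(expected, desired);
}
```

If the head goes from slot 1 to slot 2 and back to slot 1 while a thread
holds a stale copy, the index matches again but the tag does not, so the
stale compare exchange fails instead of silently succeeding.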

> The goal was that architectures could add new atomic
> instructions without forcing an ABI change. Changing the size of
> atomic<FiveFloats> would be an ABI change, so we should try to
> plan ahead to avoid it.  All the existing atomics have required
> alignments equal to their sizes, and whole-cacheline cmpxchg
> seems like a plausible future instruction and would also require
> alignment equal to the size, so that's what I've been suggesting.

To permit such implementations, both the C and C++ standards are
careful to state that the alignment and size of an atomic type need
not be the same as that of the base type.
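
A small sketch of what that latitude means for user code, using the
FiveFloats example from earlier in the thread.  The helper name
atomic_padding is invented here, and the static_asserts assume an
implementation that stores a T inside the atomic object, which is the
usual arrangement but not something this sketch can promise everywhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct FiveFloats { float values[5]; };  // the 20-byte example from the thread

// Extra bytes the implementation added when wrapping T in std::atomic.
// Zero on some implementations, nonzero on ones that pad.
template <typename T>
constexpr std::size_t atomic_padding() {
    return sizeof(std::atomic<T>) - sizeof(T);
}

// Code should expect "at least as large and as aligned", not equality.
static_assert(sizeof(std::atomic<FiveFloats>) >= sizeof(FiveFloats),
              "atomic wrapper is smaller than its base type");
static_assert(alignof(std::atomic<FiveFloats>) >= alignof(FiveFloats),
              "atomic wrapper is less aligned than its base type");
```

Code that overlays a FiveFloats on an atomic<FiveFloats>, or sizes a
buffer with sizeof(FiveFloats), breaks as soon as the two differ.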

> I suspect users won't use really large types with atomic<T> simply
> because every access requires a copy of the whole object. And
> when they switch to explicitly locked data, that'll avoid wasted
> space from the extra alignment on atomic types.

Pragmatically, I suspect that such atomics will arise as the result
of porting among different target platforms.

> The fact that it's difficult to allocate over-aligned types does
> cause problems.

The pending C and C++ standards have limited support for such
allocations.  Expect the C++ support to get better in the next
round.  (The C++ standard was in part waiting on C, which comes
out later.)

> > Honestly, I don't think future-proofing against arbitrary new
> > atomic instructions really makes any sense.  Even going up to 16
> > bytes (on architectures where that can't be done lock-free now)
> > worries me a bit.
>
> I wouldn't really mind having the compiler produce an error for
> types it can't make lock-free. Then users can't use atomics of a
> size that would need an external library. If we do allow users
> to use larger atomics, then I don't think it's a good idea to
> guarantee the need for an ABI change when processors increase
> the maximum atomic size.

Errors would cause problems for folks porting code.

In any event, IMHO, planning for atomics of size 2*sizeof(void*)
is necessary.  I think it would be prudent to plan for the next
size up as well because (a) they probably won't be used all that
often so the actual cost to programs is low and (b) when hardware
support does come, you don't want to cripple that support because
of compatibility with a small set of uses.

> On Oct 12, 2011 Andrew MacLeod <amacleod at redhat.com> wrote:
> > for objects where it matters, we can probably detect alignment
> > after the fact by looking at the pointer value... you should
> > be able to tell if a 32 byte object pointer is pointing to a
> > 32 byte boundary or not.
>
> Yep, for some reason I forgot about that possibility. I think
> it'll even work in cases where the compiler has the same pointer
> in some contexts where it knows the alignment, and other contexts
> where it doesn't.

Yes.  That code must be carefully coordinated, so at a minimum it
needs to be specified in the ABI.
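
The pointer check itself is simple; a minimal sketch follows.  The
function names and the max_lock_free parameter are assumptions made for
illustration, not part of any proposed ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p sits on an `alignment`-byte boundary.
// `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical runtime dispatch: take the lock-free path only when the
// size is a power of two the hardware supports and the object is aligned
// to its own size. Otherwise the caller falls back to a locked path.
inline bool can_use_lock_free(const void* p, std::size_t size,
                              std::size_t max_lock_free) {
    const bool power_of_two = size != 0 && (size & (size - 1)) == 0;
    return power_of_two && size <= max_lock_free && is_aligned(p, size);
}
```

The coordination problem is that every translation unit and the library
must reach the same answer for the same address, or one side takes a
lock while the other uses the instruction.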

> On Oct 12, 2011 John McCall <rjmccall at apple.com> wrote:
> > On Oct 12, 2011, at 1:08 PM, Andrew MacLeod wrote:
> > > If the padding is under control of the 'atomic' keyword for
> > > the type, then we have complete control over those padding
> > > bytes. Regardless of what junk might be in them, they are
> > > part of the atomic data structure and the only way to access
> > > them is through a full atomic access. It's like adding another
> > > user field to the structure and not setting it. It shouldn't
> > > cause spurious failures.
> >
> > Well, we at least have to make sure that our atomic operations
> > always zero-pad their operands.  For example, if we do an
> > atomic store into a 5-byte struct that we've padded to 8 bytes,
> > we have to make sure we store a zero pattern into the pad.
> > That's feasible, but it's complexity that we should at least
> > acknowledge before committing to it.
> >
> > I can also see this exposing lots of what are, admittedly,
> > source bugs, like only zero'ing the first sizeof(T) bytes of
> > an _Atomic(T).

The only atomic operation for which padding is an issue is compare
exchange.  There are at least two implementation choices here.
First, you can pad on every operation.  Second, you can pad only on
compare exchange.  For the latter, you will need to copy out the pad
bits from the atomic value before initiating the atomic instruction.
I suspect that choice two will yield better overall performance,
but only some experiments will be definitive.
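
Here is one way the second choice might look, as a sketch only.  It
assumes a 5-byte value kept in an 8-byte atomic word; the names Five
and PaddedAtomicFive are hypothetical, and the `junk` argument merely
simulates whatever bytes happen to sit in the pad.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Five { unsigned char bytes[5]; };  // hypothetical 5-byte payload

// Hypothetical wrapper: a 5-byte value stored in an 8-byte atomic word.
class PaddedAtomicFive {
    std::atomic<std::uint64_t> word_{0};

    // Overwrite the payload bytes of `pad_source`, keeping its 3 pad bytes.
    static std::uint64_t pack(const Five& v, std::uint64_t pad_source) {
        std::memcpy(&pad_source, &v, sizeof(Five));
        return pad_source;
    }

public:
    // Plain stores do not normalize the pad bytes.
    void store(const Five& v, std::uint64_t junk = 0) {
        word_.store(pack(v, junk));
    }

    Five load() const {
        const std::uint64_t w = word_.load();
        Five v;
        std::memcpy(&v, &w, sizeof(Five));
        return v;
    }

    bool compare_exchange_strong(Five& expected, const Five& desired) {
        std::uint64_t current = word_.load();
        for (;;) {
            // Copy the pad bits out of the atomic value into both operands.
            std::uint64_t want = pack(expected, current);
            const std::uint64_t next = pack(desired, current);
            if (word_.compare_exchange_strong(want, next)) return true;
            // `want` now holds the word that was actually observed.
            if (std::memcmp(&want, &expected, sizeof(Five)) != 0) {
                std::memcpy(&expected, &want, sizeof(Five));
                return false;  // payload differs: a genuine failure
            }
            current = want;    // only the pad changed: retry with fresh pad
        }
    }
};
```

The retry loop is the price of this choice: a failure caused only by the
pad bytes changing must not be reported to the caller.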

The C++ standard, section 29.6.5 paragraph 26 talks about the
distinction between the notional memcmp results and the equality
comparison results.  In essence, if you have padding, the strong
compare exchange becomes unhelpful.  The ABI should, at a minimum,
state whether the compare exchange is sensitive to that padding.

> You're thinking of code that tries to initialize an _Atomic(T)
> with memset(&at, 0, sizeof(T))? I think code's only supposed
> to initialize _Atomic types with ATOMIC_VAR_INIT(value) or
> atomic_init(&at, value), so the source bug should be in the
> memset()'s pointer argument in addition to its size.

Do not use memset to initialize an atomic.  First, on IBM systems,
those types will have internal locks for which memset is entirely
inappropriate.  Second, atomic types are non-trivial and using
memset on a non-trivial type is undefined behavior.
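
For the C++ side, a minimal sketch of the alternative (the Counter type
and reset function are made up for this example; C code would use
ATOMIC_VAR_INIT or atomic_init as noted above):

```cpp
#include <atomic>
#include <cassert>

struct Counter {
    // Initialize through the atomic's own constructor, never memset,
    // so any internal state the implementation keeps is set up properly.
    std::atomic<int> hits{0};
};

// Later resets also go through the atomic interface.
inline void reset(Counter& c) {
    c.hits.store(0);
}
```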

-- 
Lawrence Crowl