[cfe-commits] [PATCH] atomic operation builtins, part 1

Fri Feb 24 18:24:02 PST 2012

After some internal discussion within Intel, here is what we would recommend for atomic objects:

0. The sizes in the statements below refer to the size of the object, including any fields added to implement atomicity. (Per Jeffrey, though, no current implementation is considering adding extra fields.)
1. For atomic objects of size < 32 bytes, pad the size if necessary so that it is a power of 2.
2. For sizes of 2, 4, 8, 16, and 32 bytes, align the object to its size.
3. For size > 32 bytes, pad the object so that its size is a multiple of 64 bytes.
4. For size > 32 bytes, align the object to a 64-byte boundary.
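
As a rough illustration of rules 1-4, here is a minimal C++14 sketch that computes the padded size and alignment for a given object size. It assumes all sizes are in bytes and that the input is at least 1. The names AtomicLayout, next_pow2, and recommended_layout are hypothetical and do not come from any existing compiler or library.

```cpp
#include <cstddef>

// Hypothetical helper type: padded size and alignment, both in bytes.
struct AtomicLayout {
    std::size_t size;
    std::size_t alignment;
};

// Smallest power of two that is >= n (assumes n >= 1).
constexpr std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Sketch of the recommendation above, assuming sizes are in bytes.
constexpr AtomicLayout recommended_layout(std::size_t size) {
    if (size <= 32) {
        // Rules 1 and 2: pad to a power of two and align to the padded size.
        // A 32-byte object is already a power of two, so it is unchanged.
        const std::size_t padded = next_pow2(size);
        return {padded, padded};
    }
    // Rules 3 and 4: round up to a multiple of 64 and align to 64 bytes.
    const std::size_t padded = (size + 63) / 64 * 64;
    return {padded, 64};
}
```

Under these assumptions, the 20-byte struct of five floats discussed in the quoted thread below would get a size and alignment of 32 bytes, and a 33-byte object would get a size and alignment of 64 bytes.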

We think this would work well for most current and future Intel processors, although it is hard to make a blanket recommendation for all applications, because the memory wasted on padding and alignment has to be weighed against possible contention between threads.

This does make the size and alignment of the atomic type different from those of the unqualified type. Can somebody highlight the disadvantages of this difference?

Also, is there any effort under way to implement C's _Atomic in GCC?

- Milind

-----Original Message-----
From: Jeffrey Yasskin [mailto:jyasskin at googlers.com] 
Sent: Friday, February 24, 2012 1:52 PM
To: John McCall; Girkar, Milind
Cc: Andrew MacLeod; cfe-commits at cs.uiuc.edu Commits; Lawrence Crowl
Subject: Re: [cfe-commits] [PATCH] atomic operation builtins, part 1

I managed to get in touch with some folks at Intel to get their
thoughts on this. Milind has some recommendations.

On Sat, Feb 11, 2012 at 11:53 PM, Jeffrey Yasskin <jyasskin at googlers.com> wrote:
> On Wed, Oct 12, 2011 at 11:55 AM, Jeffrey Yasskin <jyasskin at google.com> wrote:
>> [+ Lawrence who's been driving the ABI-compatibility design. Context
>> at http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20111010/047614.html]
>>
>> On Wed, Oct 12, 2011 at 10:57 AM, John McCall <rjmccall at apple.com> wrote:
>>> On Oct 12, 2011, at 9:03 AM, Jeffrey Yasskin wrote:
>>>> On Wed, Oct 12, 2011 at 6:31 AM, Andrew MacLeod <amacleod at redhat.com> wrote:
>>>>> - language atomic types up to 16 bytes should be padded to an appropriate
>>>>> size, and aligned properly.
>>>>> - if memory matching one of the 5 'optimized' sizes isn't aligned properly,
>>>>> results are undefined.
>>>>> - if the size does not match one of the 5 specific routines, then the
>>>>> library generic ABI can handle it.  There's no alignment guarantees, so I
>>>>> presume it would end up being a locked implementation using hash tables and
>>>>> addresses or something.
>>>>
>>>> The ABI library needs to demand alignment guarantees, or have them
>>>> passed in, or it won't be able to support larger lock-free sizes on
>>>> new architectures.
>>>
>>> How aggressive are you suggesting we be about this?  If I make this type atomic:
>>>  struct { float values[5]; };
>>> do we really increase its size and alignment up to 32 bytes in the wild hope that the architecture will add 32-byte atomics someday?  If so, what's the limit?  If not, why is 16 the limit?
>>>
>>
>> The goal was that architectures could add new atomic instructions
>> without forcing an ABI change. Changing the size of atomic<FiveFloats>
>> would be an ABI change, so we should try to plan ahead to avoid it.
>> All the existing atomics have required alignments equal to their
>> sizes, and whole-cacheline cmpxchg seems like a plausible future
>> instruction and would also require alignment equal to the size, so
>> that's what I've been suggesting.
>
> I think the recent announcement at
> http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/,
> that Intel plans to implement hardware transactions by making locked
> regions cheaper, undermines my and Lawrence's position here. If these
> new instructions work like they appear to, it'll be possible to
> implement types with arbitrary sizes and alignments as cheaply as the
> current lock-free operations, and it seems unlikely to me that Intel
> would add larger lock-free operations once they have these
> transactional instructions.
>
> Jeffrey