[cfe-commits] [PATCH] atomic operation builtins, part 1

Mon Feb 13 17:58:57 PST 2012

On Mon, Feb 13, 2012 at 5:36 PM, Lawrence Crowl <crowl at google.com> wrote:
> On 2/11/12, Jeffrey Yasskin <jyasskin at googlers.com> wrote:
>> On Wed, Oct 12, 2011 at 11:55 AM, Jeffrey Yasskin <jyasskin at google.com> wrote:
>>> [+ Lawrence who's been driving the ABI-compatibility design. Context at
>>> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20111010/047614.html]
>>>
>>> On Wed, Oct 12, 2011 at 10:57 AM, John McCall <rjmccall at apple.com> wrote:
>>>> On Oct 12, 2011, at 9:03 AM, Jeffrey Yasskin wrote:
>>>>> On Wed, Oct 12, 2011 at 6:31 AM, Andrew MacLeod <amacleod at redhat.com> wrote:
>>>>>> - language atomic types up to 16 bytes should be padded to an appropriate
>>>>>> size, and aligned properly.
>>>>>> - if memory matching one of the 5 'optimized' sizes isn't aligned properly,
>>>>>> results are undefined.
>>>>>> - if the size does not match one of the 5 specific routines, then the
>>>>>> library generic ABI can handle it.  There's no alignment guarantees, so I
>>>>>> presume it would end up being a locked implementation using hash tables and
>>>>>> addresses or something.
>>>>>
>>>>> The ABI library needs to demand alignment guarantees, or have them
>>>>> passed in, or it won't be able to support larger lock-free sizes on
>>>>> new architectures.
>>>>
>>>> How aggressive are you suggesting we be about this?  If I make this type
>>>> atomic:
>>>>  struct { float values[5]; };
>>>> do we really increase its size and alignment up to 32 bytes in the wild
>>>> hope that the architecture will add 32-byte atomics someday?  If so,
>>>> what's the limit?  If not, why is 16 the limit?
>>>>
>>>
>>> The goal was that architectures could add new atomic instructions
>>> without forcing an ABI change. Changing the size of atomic<FiveFloats>
>>> would be an ABI change, so we should try to plan ahead to avoid it.
>>> All the existing atomics have required alignments equal to their
>>> sizes, and whole-cacheline cmpxchg seems like a plausible future
>>> instruction and would also require alignment equal to the size, so
>>> that's what I've been suggesting.
>>
>> I think the recent announcement at
>> http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/,
>> that Intel plans to implement hardware transactions by making locked
>> regions cheaper, undermines my and Lawrence's position here. If these
>> new instructions work like they appear to, it'll be possible to
>> implement types with arbitrary sizes and alignments as cheaply as the
>> current lock-free operations, and it seems unlikely to me that Intel
>> would add larger lock-free operations once they have these
>> transactional instructions.
>
> My guess is that they are exploiting cache line ownership.  I expect
> there is a limit on the number of lines, but not small enough to
> affect 'reasonable' atomic types.
>
> Crossing a cache boundary will require holding both lines.  If there
> is any false sharing on those lines, the performance could suffer
> badly.  One advantage to super-aligning is that the probability of
> false sharing goes down.
>
> I suppose we could pass that problem back to the user, which in
> general they must deal with anyway.  However, there is presently
> no C++ standard mechanism to respect cache line size and alignment.
> Forcing a bunch of platform-dependent code to address the performance
> doesn't seem like a good thing to do.  Standardizing cache line
> size queries seems like a good way to unproductively spend lots of
> committee time.  Grumble.

I think we have to make the user deal with the false-sharing problem
rather than guessing at the solution they want in the _Atomic
implementation. The committee's going to have to talk about aligned
allocation at some point anyway, since aligned_alloc() is only in C11,
not C++11, and operator new currently ignores the alignment of types
it's allocating. And since cache-line-aligned allocation is important
even for non-atomic data structures, I think it'll be important to find
some interface for it.
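
To make the "user deals with it" option concrete, here is a rough sketch
of what that could look like in C++11. This is only an illustration, not
part of the patch: the 64-byte line size is an assumption rather than a
queried value, and kAssumedCacheLine, PaddedCounter, AlignedBlock,
make_padded_counter and destroy_padded_counter are hypothetical names.
Because plain operator new may ignore the extended alignment, the sketch
over-allocates and aligns by hand with std::align.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Assumption: 64 bytes is a common cache line size. It is hard-coded
// here because there is no standard way to query it.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical user type: the counter gets a cache line to itself so
// that unrelated data cannot share the line.
struct alignas(kAssumedCacheLine) PaddedCounter {
    std::atomic<int> value{0};
};

// Keeps the raw allocation next to the aligned object, since only the
// raw pointer may be passed back to operator delete.
struct AlignedBlock {
    void* raw;
    PaddedCounter* obj;
};

inline AlignedBlock make_padded_counter() {
    // Over-allocate by one full alignment so an aligned slot always fits.
    std::size_t space = sizeof(PaddedCounter) + alignof(PaddedCounter);
    void* raw = ::operator new(space);
    void* cursor = raw;
    void* aligned = std::align(alignof(PaddedCounter), sizeof(PaddedCounter),
                               cursor, space);
    assert(aligned != nullptr);
    // Placement-new constructs the counter in the aligned slot.
    return AlignedBlock{raw, new (aligned) PaddedCounter()};
}

inline void destroy_padded_counter(AlignedBlock block) {
    block.obj->~PaddedCounter();
    ::operator delete(block.raw);
}
```

A caller would pair make_padded_counter() with destroy_padded_counter()
and touch the counter only through block.obj. That much bookkeeping for
one aligned object is the kind of thing a standard aligned-allocation
interface would hide.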

Jeffrey