[cfe-commits] [PATCH] atomic operation builtins, part 1

Fri Oct 14 15:03:56 PDT 2011

Resending due to list auto-discard.

On 10/13/11, Lawrence Crowl <crowl at google.com> wrote:
> On 10/12/11, John McCall <rjmccall at apple.com> wrote:
>> On Oct 12, 2011, at 7:37 PM, Lawrence Crowl wrote:
>> > Applications drive architecture extension, not the reverse.
>> > What applications will want to use larger atomics now that there
>> > are languages for specifying them?  We have obvious need for atomic
>> > types that are twice the size of a pointer.  We do not presently
>> > have a compelling case to implement even larger types efficiently.
>> > We may have that evidence in the future.  However, given ABI
>> > compatibility constraints, we must choose forever very soon.
>>
>> Yes, I understand.  I think you and Jeffrey Yasskin are approaching
>> this from the perspective that nobody will ever actually use these
>> types when they're not lockless, and I am assuming that of course
>> some people will, because they're attracted by the word "atomic",
>> and they'll do so with full knowledge that a cheap lock might be
>> involved, and those people will be very surprised to find out that
>> the implementation is substantially less efficient than possible
>> simply because we wanted to future-proof against a grossly
>> improbable outcome.
>
> We are weighing the probability of various events differently.
>
>> > > We're not talking about a radically different ABI.  We're talking
>> > > about basically -msse.
>> >
>> > No, we are not.  The -msse flag does not affect the layout of doubles
>> > in memory.  How successful would -msse have been if it required
>> > changing the size of a double and recompiling every library that
>> > went into the program?  Every device driver?  Not very.
>>
>> You're right;  my example was poorly chosen.  Of course -msse is
>> purely local to a translation unit.
>>
>> Across GCC and Clang, there are dozens of examples of
>> command-line options which do impact the ABI, all of them much
>> more invasively than this.  If I were designing this properly, though,
>> I would suggest some sort of attribute((lockless)) to opt in a specific
>> object to a variant lockless implementation.
>
> True, but such facilities tend to get used only infrequently,
> so the performance of the defaults matters (both ways).
>
>> > > Bloating structures to meet 64-byte alignment requirements also
>> > > has a real impact on system performance.
>> >
>> > True enough.  However, I expect the performance tradeoff to weigh
>> > in favor of reducing synchronization costs over saving space for
>> > atomics.
>>
>> I'm not sure I would agree even if I saw any likelihood whatsoever
>> of these performance gains actually being realized.  We live in an
>> era where memory is precious and CPU cycles are cheap, and
>> that doesn't seem to be changing.
>
> Actually, the memory itself is cheap.  What is expensive is
> the memory cycles.  Atomics consume those just as regular memory
> accesses do.  The problem we have here is that the number of
> memory cycles spent on contention can be very high.  Unfortunately,
> that cost tends not to show up in any static analysis of the code.
> My working assumption is that programmers will move from locks to
> atomics when they have a contention/performance problem.  I would
> like to have some performance there for them to get!
>
> --
> Lawrence Crowl
>
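A minimal sketch of the space side of the tradeoff argued above, not taken from the patch itself. The 64-byte figure comes from the thread; the struct and function names are hypothetical and exist only for illustration. It compares two atomic counters laid out back to back with the same two counters each forced to 64-byte alignment. It shows only the size cost; it does not measure contention.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: two atomics packed next to each other.
// Small, but both counters are likely to share one cache line.
struct PackedCounters {
    std::atomic<std::uint32_t> a;
    std::atomic<std::uint32_t> b;
};

// Hypothetical layout: each counter is forced to 64-byte alignment,
// so sizeof(PaddedCounter) is rounded up to 64.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint32_t> value;
};

struct PaddedCounters {
    PaddedCounter a;
    PaddedCounter b;
};

// Size of the packed layout (typically 8 bytes on mainstream targets).
constexpr std::size_t packed_size() { return sizeof(PackedCounters); }

// Size of the padded layout: two 64-byte slots.
constexpr std::size_t padded_size() { return sizeof(PaddedCounters); }

static_assert(alignof(PaddedCounter) == 64, "alignment request honored");
static_assert(sizeof(PaddedCounter) == 64, "size rounds up to alignment");
```

The padded layout spends 128 bytes on two 32-bit counters. Whether that is a good trade depends on how contended the counters are, which is exactly the point in dispute.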

-- 
Lawrence Crowl