[cfe-commits] [PATCH] atomic operation builtins, part 1

Fri Oct 14 15:02:54 PDT 2011

Resending due to list auto-discard.

On 10/12/11, Lawrence Crowl <crowl at google.com> wrote:
> On 10/12/11, John McCall <rjmccall at apple.com> wrote:
>> On Oct 12, 2011, at 6:50 PM, Lawrence Crowl wrote:
>> > On 10/12/11, John McCall <rjmccall at apple.com> wrote:
>> > > So you want to pessimize code using large atomics just in case
>> > > it's eventually deployed to a theoretical future processor
>> > > which has burnt hardware on a use case that you've just
>> > > stipulated that you can't imagine a use for.
>> >
>> > What is the problem with pessimizing code that doesn't currently
>> > have a use?
>>
>> It's absurd to optimize based on the unsubstantiated idea that
>> future hardware is going to include instructions for something
>> that we don't have a use for.  x86-32 doesn't even have 16-byte
>> atomics.  In fact, what significant 32-bit architecture does,
>> or even plans to?
>
> Applications drive architecture extension, not the reverse.
> What applications will want to use larger atomics now that there
> are languages for specifying them?  We have obvious need for atomic
> types that are twice the size of a pointer.  We do not presently
> have a compelling case to implement even larger types efficiently.
> We may have that evidence in the future.  However, given ABI
> compatibility constraints, we must make a permanent choice very soon.
>
>> > > I think anyone who urgently needs to take advantage of 64-byte
>> > > atomics can opt into a lock-free implementation of them by
>> > > requesting a variant ABI.
>> >
>> > Variant ABIs are expensive.  The x32 psABI has obvious and
>> > broad performance benefits, and yet it is only now getting any
>> > traction, nearly 10 years after AMD64.  I have confidence that
>> > there will be no widely used variant ABI for large atomics.
>>
>> We're not talking about a radically different ABI.  We're talking
>> about basically -msse.
>
> No, we are not.  The -msse flag does not affect the layout of doubles
> in memory.  How successful would -msse have been if it required
> changing the size of a double and recompiling every library that
> went into the program?  Every device driver?  Not very.
>
> I can implement floating point operations in many different ways
> without affecting the ABI because only the memory layout affects
> the ABI of floats.  The issue under discussion is the memory layout
> of atomics.  Any decisions made in the near term are likely to
> be irrevocable.
>
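To make the layout point concrete, here is a minimal sketch, assuming
a C++11 compiler.  PtrPair, SharedRecord, Layout and the helper
functions are hypothetical names used only for illustration; nothing
here is taken from the patch under review.  It records the size and
alignment the compiler picked for an atomic that is twice the size of
a pointer, which are the numbers two separately compiled libraries
would have to agree on.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical payload: two pointer-sized words.
struct PtrPair {
    void*       ptr;
    std::size_t tag;
};

// Hypothetical record that crosses a library boundary.
struct SharedRecord {
    char                 flag;
    std::atomic<PtrPair> slot;
};

struct Layout {
    std::size_t size;
    std::size_t align;
};

// Size and alignment this compiler and ABI chose for the atomic member.
inline Layout atomic_slot_layout() {
    return Layout{sizeof(std::atomic<PtrPair>), alignof(std::atomic<PtrPair>)};
}

// Two builds can share a SharedRecord only if they agree on these numbers.
inline bool same_layout(Layout a, Layout b) {
    return a.size == b.size && a.align == b.align;
}

// Sanity checks that hold for any layout the compiler may choose.
inline void check_slot_holds_payload() {
    const Layout l = atomic_slot_layout();
    assert(l.size >= sizeof(PtrPair));   // the atomic must contain the payload
    assert(l.size % l.align == 0);       // size is a multiple of alignment
}
```

If one build reported an alignment of 8 and another an alignment of 16
for the same slot, same_layout() would return false, and the offset of
slot inside SharedRecord would differ between the two builds.
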
> Atomics are different from floating point in another crucial
> respect.  The ABI compatibility depends not only on the layout,
> but on the synchronization protocol used to access them.  If two
> atomic operations are implemented with different protocols,
> the synchronization will fail and the application will fail in
> mysterious ways.  I have participated in just such a mistake and
> it was not pretty.
>
> A consequence of avoiding the mistake is that one needs to have
> the system provide a single implementation of all potential
> non-lock-free atomics.  Effectively, that means a dynamic library
> provided to the applications by the system.
>
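A rough sketch of what such a single shared implementation might look
like, assuming a lock table indexed by object address.  The namespace
shared_atomics, the table size, and the function names are all
hypothetical; this is not the interface of any existing runtime
library.  The point is only that every caller must go through the same
lock table, or the locks protect nothing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

namespace shared_atomics {  // hypothetical stand-in for a system library

constexpr std::size_t kLockCount = 64;  // assumed table size

// Maps an object address to one lock.  In a real system this table
// would have to live in exactly one dynamic library so that every
// module in the process sees the same locks.
inline std::mutex& lock_for(const void* addr) {
    static std::array<std::mutex, kLockCount> locks;
    const auto bits = reinterpret_cast<std::uintptr_t>(addr);
    return locks[(bits >> 4) % kLockCount];
}

// Copies n bytes out of the object while holding its lock.
inline void load(const void* obj, void* out, std::size_t n) {
    assert(obj != nullptr && out != nullptr);
    std::lock_guard<std::mutex> guard(lock_for(obj));
    std::memcpy(out, obj, n);
}

// Copies n bytes into the object while holding its lock.
inline void store(void* obj, const void* value, std::size_t n) {
    assert(obj != nullptr && value != nullptr);
    std::lock_guard<std::mutex> guard(lock_for(obj));
    std::memcpy(obj, value, n);
}

// Bytewise compare-and-swap; on failure, expected receives the
// current contents of the object.
inline bool compare_exchange(void* obj, void* expected,
                             const void* desired, std::size_t n) {
    assert(obj != nullptr && expected != nullptr && desired != nullptr);
    std::lock_guard<std::mutex> guard(lock_for(obj));
    if (std::memcmp(obj, expected, n) == 0) {
        std::memcpy(obj, desired, n);
        return true;
    }
    std::memcpy(expected, obj, n);
    return false;
}

}  // namespace shared_atomics
```

If one module used this table while another accessed the same object
with a lock-free instruction or a private lock, neither would exclude
the other, which is the failure mode described above.
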
>> > > > If we do allow users to use larger atomics, then I don't
>> > > > think it's a good idea to guarantee the need for an ABI
>> > > > change when processors increase the maximum atomic size.
>> > >
>> > > Suddenly introducing a need for huge alignments into
>> > > environments that don't support them is much more worrying
>> > > to me than a small number of programmers needing to manually
>> > > request an optimization on weird code.
>> >
>> > Atomics often tend to be used in environments where performance
>> > is critical.  It may be weird code, and rare code, but it's
>> > the code that can make or break system performance.
>>
>> Great.  They can use -matomic-cachelines or, better yet,
>> _Atomic(int512_t).
>
> They can do that only if they coordinate in advance with other
> users of those same structures.
>
>> Bloating structures to meet 64-byte alignment requirements also
>> has a real impact on system performance.
>
> True enough.  However, I expect the performance tradeoff to weigh
> in favor of reducing synchronization costs over saving space for
> atomics.  After all, one uses atomics because one has a run-time
> problem.  If not, mutexes are a safer technology.
>
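The space side of this tradeoff can be illustrated with a small
sketch, assuming a C++11 compiler on a mainstream target with 8-bit
bytes.  Compact and CacheLineAligned are hypothetical types, and a
plain byte array stands in for a 64-byte atomic payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Payload placed with its natural (byte) alignment.
struct Compact {
    std::uint32_t id;
    unsigned char payload[64];
};

// Same fields, but the payload is forced onto a 64-byte boundary,
// as a cache-line-sized lock-free atomic would require.
struct CacheLineAligned {
    std::uint32_t id;
    alignas(64) unsigned char payload[64];
};

// Extra bytes per object paid for the stricter alignment.
inline std::size_t padding_cost() {
    assert(sizeof(CacheLineAligned) >= sizeof(Compact));
    return sizeof(CacheLineAligned) - sizeof(Compact);
}
```

Under those assumptions the aligned struct grows from 68 to 128 bytes,
so padding_cost() reports 60 bytes of padding per object.  Whether
that is worth paying depends on how much the stricter alignment saves
in synchronization cost.
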
> --
> Lawrence Crowl
>

-- 
Lawrence Crowl