[llvm-commits] [llvm] r137145 - /llvm/trunk/docs/Atomics.html

Wed Aug 10 13:17:14 PDT 2011

On Wed, Aug 10, 2011 at 10:05 AM, Jeffrey Yasskin <jyasskin at google.com> wrote:
> On Tue, Aug 9, 2011 at 2:07 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> +<!-- *********************************************************************** -->
>> +<h2>
>> +  <a name="ordering">Atomic orderings</a>
>> +</h2>
>> +<!-- *********************************************************************** -->
>> +
>> +<div>
>> +
>> +<p>In order to achieve a balance between performance and necessary guarantees,
>> +   there are six levels of atomicity. They are listed in order of strength;
>
> Do you want to mention that we've intentionally skipped
> memory_order_consume for now?

I'm adding a note to the discussion of Acquire.

>> +   each level includes all the guarantees of the previous level except for
>> +   Acquire/Release.</p>
>> +
>> +<p>Unordered is the lowest level of atomicity. It essentially guarantees that
>> +   races produce somewhat sane results instead of having undefined behavior.
>> +   This is intended to match the Java memory model for shared variables. It
>> +   cannot be used for synchronization, but is useful for Java and other
>> +   "safe" languages which need to guarantee that the generated code never
>> +   exhibits undefined behavior.  Note that this guarantee is cheap on common
>> +   platforms for loads of a native width, but can be expensive or unavailable
>> +   for wider loads, like a 64-bit load on ARM. (A frontend for a "safe"
>> +   language would normally split a 64-bit load on ARM into two 32-bit
>> +   unordered loads.) In terms of the optimizer, this prohibits any
>> +   transformation that transforms a single load into multiple loads,
>> +   transforms a store into multiple stores, narrows a store, or stores a
>> +   value which would not be stored otherwise.  Some examples of unsafe
>> +   optimizations are narrowing an assignment into a bitfield, rematerializing
>> +   a load, and turning loads and stores into a memcpy call. Reordering
>> +   unordered operations is safe, though, and optimizers should take
>> +   advantage of that because unordered operations are common in
>> +   languages that need them.</p>
>> +
>> +<p>Monotonic is the weakest level of atomicity that can be used in
>> +   synchronization primitives, although it does not provide any general
>> +   synchronization. It essentially guarantees that if you take all the
>> +   operations affecting a specific address, a consistent ordering exists.
>> +   This corresponds to the C++0x/C1x <code>memory_order_relaxed</code>; see
>> +   those standards for the exact definition.  If you are writing a frontend, do
>> +   not use the low-level synchronization primitives unless you are compiling
>> +   a language which requires it or are sure a given pattern is correct. In
>> +   terms of the optimizer, this can be treated as a read+write on the relevant
>> +   memory location (and alias analysis will take advantage of that).  In
>> +   addition, it is legal to reorder non-atomic and Unordered loads around
>> +   Monotonic loads. CSE/DSE and a few other optimizations are allowed, but
>
> It's also legal to reorder monotonic operations around each other as
> long as you can prove they don't alias. (Think 'load a; load b; load
> a' -> Normally it'd be fine to collapse the two 'load a's with no
> aliasing check, but with monotonic atomics, you can only do that if
> a!=b.)

I thought that was implied by "can be treated as a read+write on the relevant
memory location".
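
For concreteness, here is a rough C++0x sketch of the 'load a; load b; load a'
case as I read it. The names (Loads, load_p_q_p) are made up for illustration
and are not part of the patch; relaxed is the C++0x spelling of Monotonic.

```cpp
#include <atomic>
#include <cassert>

// Results of the three loads, in program order.
struct Loads { int first, mid, second; };

// 'load p; load q; load p', all with relaxed (LLVM Monotonic) ordering.
// If p and q may point at the same location, per-location coherence says
// 'second' must not be older than 'mid'. Reusing 'first' for 'second' is
// therefore only safe once p and q are known not to alias.
Loads load_p_q_p(const std::atomic<int>* p, const std::atomic<int>* q) {
    assert(p && q);  // hypothetical precondition for this sketch
    Loads r;
    r.first  = p->load(std::memory_order_relaxed);
    r.mid    = q->load(std::memory_order_relaxed);
    r.second = p->load(std::memory_order_relaxed);
    return r;
}
```

With non-atomic loads the first and third load could be merged without looking
at q at all; treating each Monotonic load as a read+write of its location is
what forces the aliasing check.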

>> +   Monotonic operations are unlikely to be used in ways which would make
>> +   those optimizations useful.</p>
>> +
>> +<p>Acquire provides a barrier of the sort necessary to acquire a lock to access
>> +   other memory with normal loads and stores.  This corresponds to the
>> +   C++0x/C1x <code>memory_order_acquire</code>.  This is a low-level
>> +   synchronization primitive. In general, optimizers should treat this like
>> +   a nothrow call.</p>
>> +
>> +<p>Release is similar to Acquire, but with a barrier of the sort necessary to
>> +   release a lock. This corresponds to the C++0x/C1x
>> +   <code>memory_order_release</code>.</p>
>
> Did you want to say, "In general, optimizers should treat this like a
> nothrow call." for Release too? Of course, optimizers might be able to
> do better by knowing it's a release, but doing that would probably
> take a lot more infrastructure work.

Done.
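
As a reminder of what the Acquire/Release pair is for, a minimal C++0x sketch
of the usual publish/consume pattern follows. The names (payload, ready,
producer, consumer) are hypothetical, and this is only an illustration of the
pairing, not anything taken from the patch.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

// Writer side: the release store keeps the payload write from moving below it.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load keeps the payload read from moving above it.
// Once the flag is seen as true, the write to payload is visible as well.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer has published
    }
    return payload;
}
```

Run producer() and consumer() on different threads and the consumer is
expected to return 42. Treating both operations like a nothrow call is
conservative: it also blocks motion in the direction each barrier would allow.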

>> +
>> +<p>AcquireRelease (<code>acq_rel</code> in IR) provides both an Acquire and a Release barrier.
>> +   This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>. In general,
>> +   optimizers should treat this like a nothrow call.</p>
>> +
>> +<p>SequentiallyConsistent (<code>seq_cst</code> in IR) provides Acquire and/or
>> +   Release semantics, and in addition guarantees a total ordering exists with
>> +   all other SequentiallyConsistent operations. This corresponds to the
>> +   C++0x/C1x <code>memory_order_seq_cst</code>, and Java volatile.  The intent
>> +   of this ordering level is to provide a programming model which is relatively
>> +   easy to understand. In general, optimizers should treat this like a
>> +   nothrow call.</p>
>> +
>> +</div>
>> +
>> +<!-- *********************************************************************** -->
>> +<h2>
>> +  <a name="otherinst">Other atomic instructions</a>
>> +</h2>
>> +<!-- *********************************************************************** -->
>> +
>> +<div>
>> +
>> +<p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
>> +   atomic load followed by an atomic store (where the store is conditional for
>> +   <code>cmpxchg</code>), but no other memory operation operation can happen
>
> duplicate "operation"
>
>> +   between the load and store.</p>
>
> Do you want to mention that we've intentionally skipped "weak"
> cmpxchgs and cmpxchgs with different success and failure ordering
> constraints?

Yes; added a note.
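
To illustrate the "load followed by a conditional store" description, here is
a small C++0x sketch. The helper names (cas, fetch_max) are invented for the
example. It uses only strong compare-exchange with a single ordering, which is
my approximation of what the cmpxchg instruction covers at this point.

```cpp
#include <atomic>
#include <cassert>

// Roughly a seq_cst cmpxchg: load the cell, and store 'desired' only if the
// loaded value equals 'expected'. No other access to the cell can happen
// between the load and the store.
bool cas(std::atomic<int>& cell, int expected, int desired) {
    return cell.compare_exchange_strong(expected, desired,
                                        std::memory_order_seq_cst);
}

// A read-modify-write built from a compare-exchange loop. Returns the value
// observed before the update, as atomicrmw does.
int fetch_max(std::atomic<int>& cell, int v) {
    int old = cell.load(std::memory_order_relaxed);
    // On failure compare_exchange_strong refreshes 'old' with the current value.
    while (old < v &&
           !cell.compare_exchange_strong(old, v, std::memory_order_seq_cst)) {
    }
    return old;
}
```

A frontend would normally emit atomicrmw directly for the operations it
supports; the loop form is only needed for operations it does not provide.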

>> +<!-- *********************************************************************** -->
>> +<h2>
>> +  <a name="iropt">Atomics and IR optimization</a>
>> +</h2>
>> +<!-- *********************************************************************** -->
>> +
>> +<div>
>> +
>> +<p>Predicates for optimizer writers to query:
>> +<ul>
>> +  <li>isSimple(): A load or store which is not volatile or atomic.  This is
>> +      what, for example, memcpyopt would check for operations it might
>> +      transform.
>> +  <li>isUnordered(): A load or store which is not volatile and at most
>> +      Unordered. This would be checked, for example, by LICM before hoisting
>> +      an operation.
>> +  <li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
>> +      that they returns true for any operation which is volatile or at least
>
> s/returns/return/

Done.

-Eli