[llvm-commits] [llvm] r137145 - /llvm/trunk/docs/Atomics.html
Jeffrey Yasskin
jyasskin at google.com
Wed Aug 10 10:05:47 PDT 2011
On Tue, Aug 9, 2011 at 2:07 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> +<!-- *********************************************************************** -->
> +<h2>
> + <a name="ordering">Atomic orderings</a>
> +</h2>
> +<!-- *********************************************************************** -->
> +
> +<div>
> +
> +<p>In order to achieve a balance between performance and necessary guarantees,
> + there are six levels of atomicity. They are listed in order of strength;
Do you want to mention that we've intentionally skipped
memory_order_consume for now?
> + each level includes all the guarantees of the previous level except for
> + Acquire/Release.</p>
> +
> +<p>Unordered is the lowest level of atomicity. It essentially guarantees that
> + races produce somewhat sane results instead of having undefined behavior.
> + This is intended to match the Java memory model for shared variables. It
> + cannot be used for synchronization, but is useful for Java and other
> + "safe" languages which need to guarantee that the generated code never
> + exhibits undefined behavior. Note that this guarantee is cheap on common
> + platforms for loads of a native width, but can be expensive or unavailable
> + for wider loads, like a 64-bit load on ARM. (A frontend for a "safe"
> + language would normally split a 64-bit load on ARM into two 32-bit
> + unordered loads.) In terms of the optimizer, this prohibits any
> + transformation that transforms a single load into multiple loads,
> + transforms a store into multiple stores, narrows a store, or stores a
> + value which would not be stored otherwise. Some examples of unsafe
> + optimizations are narrowing an assignment into a bitfield, rematerializing
> + a load, and turning loads and stores into a memcpy call. Reordering
> + unordered operations is safe, though, and optimizers should take
> + advantage of that because unordered operations are common in
> + languages that need them.</p>
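It might also help readers to see why, e.g., rematerializing a load is
forbidden here. Rough C++0x sketch of the idea (C++0x has nothing exactly
as weak as Unordered, so I'm spelling it with relaxed as the nearest,
slightly stronger, equivalent; the names are made up):

  #include <atomic>

  // Racy index written by another thread. Java would make this a plain
  // field; relaxed is used here only to stand in for Unordered.
  std::atomic<int> shared_index(0);
  int table[16] = {0};

  int lookup() {
    // The frontend emits exactly one load of shared_index.
    int i = shared_index.load(std::memory_order_relaxed);
    if (i >= 0 && i < 16)
      return table[i];  // Reloading shared_index here (rematerializing
                        // the load) could observe a different value and
                        // index out of bounds -- exactly the undefined
                        // behavior a "safe" language must exclude.
    return 0;
  }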
> +
> +<p>Monotonic is the weakest level of atomicity that can be used in
> + synchronization primitives, although it does not provide any general
> + synchronization. It essentially guarantees that if you take all the
> + operations affecting a specific address, a consistent ordering exists.
> + This corresponds to the C++0x/C1x <code>memory_order_relaxed</code>; see
> + those standards for the exact definition. If you are writing a frontend, do
> + not use the low-level synchronization primitives unless you are compiling
> + a language which requires it or are sure a given pattern is correct. In
> + terms of the optimizer, this can be treated as a read+write on the relevant
> + memory location (and alias analysis will take advantage of that). In
> + addition, it is legal to reorder non-atomic and Unordered loads around
> + Monotonic loads. CSE/DSE and a few other optimizations are allowed, but
It's also legal to reorder monotonic operations around each other as
long as you can prove they don't alias. (Think 'load a; load b; load
a' -> normally it'd be fine to collapse the two 'load a's with no
aliasing check, but with monotonic atomics, you can only do that if
a != b.)
> + Monotonic operations are unlikely to be used in ways which would make
> + those optimizations useful.</p>
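A concrete use case might make Monotonic clearer; the usual one is a
statistics counter, roughly (made-up names):

  #include <atomic>

  // Only the per-address ordering of the increments matters, so
  // Monotonic/relaxed is enough and no barriers are required.
  std::atomic<long> num_requests(0);

  void count_request() {
    num_requests.fetch_add(1, std::memory_order_relaxed);
  }

  long sample_requests() {
    return num_requests.load(std::memory_order_relaxed);
  }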
> +
> +<p>Acquire provides a barrier of the sort necessary to acquire a lock to access
> + other memory with normal loads and stores. This corresponds to the
> + C++0x/C1x <code>memory_order_acquire</code>. This is a low-level
> + synchronization primitive. In general, optimizers should treat this like
> + a nothrow call.</p>
> +
> +<p>Release is similar to Acquire, but with a barrier of the sort necessary to
> +   release a lock. This corresponds to the C++0x/C1x
> + <code>memory_order_release</code>.</p>
Did you want to say, "In general, optimizers should treat this like a
nothrow call." for Release too? Of course, optimizers might be able to
do better by knowing it's a release, but doing that would probably
take a lot more infrastructure work.
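Also, for Acquire and Release together, the standard flag-publication
pattern might be worth showing; a rough C++0x sketch (made-up names):

  #include <atomic>

  int payload = 0;                      // ordinary, non-atomic data
  std::atomic<bool> ready(false);

  void producer() {
    payload = 42;                       // plain store
    // Release: the plain store above may not sink below this store.
    ready.store(true, std::memory_order_release);
  }

  void consumer() {
    // Acquire: the plain load below may not hoist above this load.
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    int v = payload;                    // guaranteed to observe 42
    (void)v;
  }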
> +
> +<p>AcquireRelease (<code>acq_rel</code> in IR) provides both an Acquire and a Release barrier.
> + This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>. In general,
> + optimizers should treat this like a nothrow call.</p>
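Maybe worth a short example here too; a typical one is a combined
publish-and-consume RMW such as a reference-count decrement (rough
sketch, made-up names):

  #include <atomic>

  struct Node {
    int data;
    std::atomic<int> refcount;
  };

  void unref(Node *n) {
    // acq_rel on the RMW: the Release half publishes this thread's
    // writes to n->data before the decrement, and the Acquire half
    // ensures whoever drops the count to zero sees every other
    // thread's writes before deleting the node.
    if (n->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
      delete n;
  }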
> +
> +<p>SequentiallyConsistent (<code>seq_cst</code> in IR) provides Acquire and/or
> + Release semantics, and in addition guarantees a total ordering exists with
> + all other SequentiallyConsistent operations. This corresponds to the
> + C++0x/C1x <code>memory_order_seq_cst</code>, and Java volatile. The intent
> + of this ordering level is to provide a programming model which is relatively
> + easy to understand. In general, optimizers should treat this like a
> + nothrow call.</p>
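A small example of what the total order buys you over acquire/release
might help; the classic store-buffering case, roughly:

  #include <atomic>

  std::atomic<int> x(0), y(0);
  int r1, r2;

  void thread1() {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
  }

  void thread2() {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
  }
  // Because all four operations fall in one total order, r1 == 0 &&
  // r2 == 0 is impossible; with only acquire/release (or weaker),
  // both loads could observe 0.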
> +
> +</div>
> +
> +<!-- *********************************************************************** -->
> +<h2>
> + <a name="otherinst">Other atomic instructions</a>
> +</h2>
> +<!-- *********************************************************************** -->
> +
> +<div>
> +
> +<p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
> + atomic load followed by an atomic store (where the store is conditional for
> + <code>cmpxchg</code>), but no other memory operation operation can happen
duplicate "operation"
> + between the load and store.</p>
Do you want to mention that we've intentionally skipped "weak"
cmpxchgs and cmpxchgs with different success and failure ordering
constraints?
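A short example of the usual retry loop might also help readers; in
C++0x terms it'd look roughly like (made-up names):

  #include <atomic>

  std::atomic<int> value(0);

  // Atomically double 'value': the load-compute-store sequence behaves
  // as if no other memory operation touches 'value' between the load
  // and the (conditional) store.
  void atomic_double() {
    int old = value.load(std::memory_order_seq_cst);
    while (!value.compare_exchange_strong(old, old * 2,
                                          std::memory_order_seq_cst))
      ;  // on failure 'old' is refreshed with the current value; retry
  }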
> +<!-- *********************************************************************** -->
> +<h2>
> + <a name="iropt">Atomics and IR optimization</a>
> +</h2>
> +<!-- *********************************************************************** -->
> +
> +<div>
> +
> +<p>Predicates for optimizer writers to query:
> +<ul>
> + <li>isSimple(): A load or store which is not volatile or atomic. This is
> + what, for example, memcpyopt would check for operations it might
> + transform.
> + <li>isUnordered(): A load or store which is not volatile and at most
> + Unordered. This would be checked, for example, by LICM before hoisting
> + an operation.
> + <li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
> + that they returns true for any operation which is volatile or at least
s/returns/return/
> + Monotonic.
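One more thought: showing these predicates in use might help pass
authors. Rough, untested sketch (I'm assuming the usual
LoadInst/StoreInst interfaces and header path):

  #include "llvm/Instructions.h"
  using namespace llvm;

  // Only plain, non-atomic, non-volatile accesses may be split,
  // widened, or otherwise restructured (e.g. by memcpyopt).
  static bool canRestructure(Instruction *I) {
    if (LoadInst *LI = dyn_cast<LoadInst>(I))
      return LI->isSimple();
    if (StoreInst *SI = dyn_cast<StoreInst>(I))
      return SI->isSimple();
    return false;
  }

  // Unordered, non-volatile loads may still be hoisted or sunk
  // (e.g. by LICM); anything stronger needs much more care.
  static bool safeToHoist(LoadInst *LI) {
    return LI->isUnordered();
  }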