[LLVMdev] Reviving the new LLVM concurrency model

Eli Friedman eli.friedman at gmail.com
Mon Jul 18 17:22:19 PDT 2011


There was some discussion a while back about adding a C++0x-style
memory model and atomics to LLVM
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/31295), but
it stalled.  I'm going to try to restart progress on it.

Attached are two patches; the first adds a section to LangRef with
just the memory model, without directly changing the documentation or
implementation of atomic intrinsics.  This mostly comes from
https://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest,
but it's been modified a bit, so please look at the attached version.
The second fixes the one place I know of where LLVM violates that
proposed memory model.

I would appreciate any reviews (primarily for the LangRef bits; I'm
reasonably confident the patch to LICM is correct).

There was some discussion about allowing targets to implement
unaligned stores by widening, but I've left that out.  No current
backend uses such an implementation, so there isn't a substantial
benefit, and allowing it would break some current optimizations (like
transforming a memset to an unaligned store).  See also the FIXME in
the LangRef patch about architectures without a proper single-byte
store instruction.

-Eli
-------------- next part --------------
Index: docs/LangRef.html
===================================================================
--- docs/LangRef.html	(revision 135415)
+++ docs/LangRef.html	(working copy)
@@ -53,6 +53,7 @@
       <li><a href="#datalayout">Data Layout</a></li>
       <li><a href="#pointeraliasing">Pointer Aliasing Rules</a></li>
       <li><a href="#volatile">Volatile Memory Accesses</a></li>
+      <li><a href="#memmodel">Memory Model for Concurrent Operations</a></li>
     </ol>
   </li>
   <li><a href="#typesystem">Type System</a>
@@ -1470,8 +1471,93 @@
 
 </div>
 
+<!-- ======================================================================= -->
+<h3>
+  <a name="memmodel">Memory Model for Concurrent Operations</a>
+</h3>
+
+<div>
+
+<p>The LLVM IR does not define any way to start parallel threads of execution
+or to register signal handlers. Nonetheless, there are platform-specific
+ways to create them, and we define LLVM IR's behavior in their presence. This
+model is inspired by the C++0x memory model.</p>
+
+<p>We define a <i>happens-before</i> partial order as the least partial order
+that</p>
+<ul>
+  <li>Is a superset of single-thread program order, and</li>
+  <li>When <tt>a</tt> <i>synchronizes-with</i> <tt>b</tt>, includes an edge from
+      <tt>a</tt> to <tt>b</tt>. <i>Synchronizes-with</i> pairs are introduced
+      by platform-specific techniques, like pthread locks, thread
+      creation, thread joining, etc., and by the atomic operations described
+      in the <a href="#int_atomics">Atomic intrinsics</a> section.</li>
+</ul>
+
+<p>Note that program order does not introduce <i>happens-before</i> edges
+between a thread and signals executing inside that thread.</p>
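+
+<p>For example (a minimal sketch, not part of the model; the
+<i>synchronizes-with</i> edge comes from thread creation, and the names
+are hypothetical):</p>
+
+<pre>
+; Thread 1:
+store i8 1, i8* @x
+call i32 @pthread_create(...)   ; starts thread 2; synchronizes-with it
+
+; Thread 2:
+%v = load i8* @x   ; the store happens-before this load and no other
+                   ; write to @x exists, so %v must be 1
+</pre>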
+
+<p>Every (defined) read operation (load instructions, memcpy, atomic
+loads/read-modify-writes, etc.) <var>R</var> reads a series of bytes written by
+(defined) write operations (store instructions, atomic
+stores/read-modify-writes, memcpy, etc.). For each byte of <var>R</var>
+(written <var>R<sub>byte</sub></var> below), <var>R</var> reads the value
+written by some write that it <i>may see</i>, given any relevant
+<i>happens-before</i> constraints.  <var>R<sub>byte</sub></var> may
+see any write to the same byte, except:</p>
+
+<ul>
+  <li>If <var>write<sub>1</sub></var> happens before
+      <var>write<sub>2</sub></var>, and <var>write<sub>2</sub></var> happens
+      before <var>R<sub>byte</sub></var>, then <var>R<sub>byte</sub></var>
+      must not see <var>write<sub>1</sub></var>.</li>
+  <li>If <var>R<sub>byte</sub></var> happens before
+      <var>write<sub>3</sub></var>, then <var>R<sub>byte</sub></var> must not
+      see <var>write<sub>3</sub></var>.</li>
+</ul>
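+
+<p>For example (a sketch; <var>w<sub>3</sub></var> is a racing write with no
+ordering relative to the other operations):</p>
+
+<pre>
+w1 --happens-before--> w2 --happens-before--> R        w3 (unordered)
+</pre>
+
+<p>Here <var>R</var> may not see <var>w<sub>1</sub></var>, because it is
+shadowed by <var>w<sub>2</sub></var>; it may see either
+<var>w<sub>2</sub></var> or <var>w<sub>3</sub></var>.</p>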
+
+<p>Given that definition, <var>R<sub>byte</sub></var> is defined as
+follows:</p>
+<ul>
+  <li>If there is no write to the same byte that happens before
+      <var>R<sub>byte</sub></var>, <var>R<sub>byte</sub></var> returns
+      <tt>undef</tt> for that byte.</li>
+  <li>If <var>R<sub>byte</sub></var> may see exactly one write,
+      <var>R<sub>byte</sub></var> returns the value written by that
+      write.</li>
+  <li>If <var>R<sub>byte</sub></var> and all the writes it may see are
+      atomic, it chooses one of those writes and returns its value.
+      Given any two bytes of <var>R</var>, if the set of writes one byte
+      may see is the same as the set of writes the other may see, both
+      will choose the same write.</li>
+  <li>Otherwise <var>R<sub>byte</sub></var> returns <tt>undef</tt>.</li>
+</ul>
+
+<p><var>R</var> returns the value composed of the series of bytes it read.
+This implies that some bytes within the value may be <tt>undef</tt>
+<b>without</b> the entire value being <tt>undef</tt>. Note that this only
+defines the semantics of the operation; it doesn't mean that targets will
+emit more than one instruction to read the series of bytes.</p>
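+
+<p>For example (a sketch with hypothetical names; thread creation provides
+the only <i>happens-before</i> edge, and no access is atomic):</p>
+
+<pre>
+; Thread 1:
+store i32 1, i32* @x
+call i32 @pthread_create(...)   ; starts thread 2
+store i32 2, i32* @x            ; races with the load in thread 2
+
+; Thread 2:
+%v = load i32* @x   ; may see both stores, and they are not all atomic,
+                    ; so %v is undef
+</pre>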
+
+<p>Note that in cases where none of the atomic intrinsics are used, this model
+places only one restriction on IR transformations on top of what is required
+for single-threaded execution: introducing a store to a byte which might not
+otherwise be stored to can introduce undefined behavior.</p>
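+
+<p>For example, the following transformation (sketched as C-like pseudocode)
+is illegal under this model, because it stores to <tt>x</tt> on executions
+where the original program never did; this is exactly the scalar-promotion
+case the accompanying LICM patch addresses:</p>
+
+<pre>
+// Before:                   // After (illegal):
+for (...)                    tmp = x;
+  if (b)                     for (...)
+    x = 1;                     if (b)
+                                 tmp = 1;
+                             x = tmp;  // store happens even if b is false
+</pre>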
+
+<!-- FIXME: This model assumes all targets where concurrency is relevant have
+a byte-size store which doesn't affect adjacent bytes.  As far as I can tell,
+none of the backends currently in the tree lack such a store; however,
+there might be out-of-tree targets which care.  If there are, we want a paragraph
+like the following:
+
+Targets may specify that stores narrower than a certain width are not
+available; on such a target, for the purposes of this model, treat any non-atomic
+write with an alignment or width less than the minimum width as if it writes
+to the relevant surrounding bytes.
+-->
+
 </div>
 
+</div>
+
 <!-- *********************************************************************** -->
 <h2><a name="typesystem">Type System</a></h2>
 <!-- *********************************************************************** -->
-------------- next part --------------
Index: test/Transforms/LICM/scalar-promote-memmodel.ll
===================================================================
--- test/Transforms/LICM/scalar-promote-memmodel.ll	(revision 0)
+++ test/Transforms/LICM/scalar-promote-memmodel.ll	(revision 0)
@@ -0,0 +1,37 @@
+; RUN: opt < %s -basicaa -licm -S | FileCheck %s
+
+; Make sure we don't hoist a conditionally-executed store out of the loop;
+; it would violate the concurrency memory model
+
+@g = common global i32 0, align 4
+
+define void @bar(i32 %n, i32 %b) nounwind uwtable ssp {
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %for.inc, %entry
+  %i.0 = phi i32 [ 0, %entry ], [ %inc5, %for.inc ]
+  %cmp = icmp slt i32 %i.0, %n
+  br i1 %cmp, label %for.body, label %for.end
+
+for.body:                                         ; preds = %for.cond
+  %tobool = icmp eq i32 %b, 0
+  br i1 %tobool, label %for.inc, label %if.then
+
+if.then:                                          ; preds = %for.body
+  %tmp3 = load i32* @g, align 4
+  %inc = add nsw i32 %tmp3, 1
+  store i32 %inc, i32* @g, align 4
+  br label %for.inc
+
+; CHECK: load i32*
+; CHECK-NEXT: add
+; CHECK-NEXT: store i32
+
+for.inc:                                          ; preds = %for.body, %if.then
+  %inc5 = add nsw i32 %i.0, 1
+  br label %for.cond
+
+for.end:                                          ; preds = %for.cond
+  ret void
+}
Index: lib/Transforms/Scalar/LICM.cpp
===================================================================
--- lib/Transforms/Scalar/LICM.cpp	(revision 135415)
+++ lib/Transforms/Scalar/LICM.cpp	(working copy)
@@ -151,6 +151,11 @@
     ///
     bool isSafeToExecuteUnconditionally(Instruction &I);
 
+    /// isGuaranteedToExecute - Check whether the instruction is guaranteed
+    /// to execute.
+    ///
+    bool isGuaranteedToExecute(Instruction &I);
+
     /// pointerInvalidatedByLoop - Return true if the body of this loop may
     /// store into the memory location pointed to by V.
     ///
@@ -577,6 +582,10 @@
   if (Inst.isSafeToSpeculativelyExecute())
     return true;
 
+  return isGuaranteedToExecute(Inst);
+}
+
+bool LICM::isGuaranteedToExecute(Instruction &Inst) {
   // Otherwise we have to check to make sure that the instruction dominates all
   // of the exit blocks.  If it doesn't, then there is a path out of the loop
   // which does not execute this instruction, so we can't hoist it.
@@ -713,34 +722,37 @@
 
      // If there is a non-load/store instruction in the loop, we can't promote
       // it.
-      unsigned InstAlignment;
-      if (LoadInst *load = dyn_cast<LoadInst>(Use)) {
+      if (isa<LoadInst>(Use)) {
         assert(!cast<LoadInst>(Use)->isVolatile() && "AST broken");
-        InstAlignment = load->getAlignment();
       } else if (StoreInst *store = dyn_cast<StoreInst>(Use)) {
         // Stores *of* the pointer are not interesting, only stores *to* the
         // pointer.
         if (Use->getOperand(1) != ASIV)
           continue;
-        InstAlignment = store->getAlignment();
+        unsigned InstAlignment = store->getAlignment();
         assert(!cast<StoreInst>(Use)->isVolatile() && "AST broken");
+
+        // Note that we only check GuaranteedToExecute inside the store case
+        // so that we do not introduce stores where they did not exist before
+        // (which would break the LLVM concurrency model).
+
+        // If the alignment of this instruction allows us to specify a more
+        // restrictive (and performant) alignment and if we are sure this
+        // instruction will be executed, update the alignment.
+        // Larger is better, with the exception of 0 being the best alignment.
+        if ((InstAlignment > Alignment || InstAlignment == 0)
+            && (Alignment != 0))
+          if (isGuaranteedToExecute(*Use)) {
+            GuaranteedToExecute = true;
+            Alignment = InstAlignment;
+          }
+
+        if (!GuaranteedToExecute)
+          GuaranteedToExecute = isGuaranteedToExecute(*Use);
+
       } else
         return; // Not a load or store.
 
-      // If the alignment of this instruction allows us to specify a more
-      // restrictive (and performant) alignment and if we are sure this
-      // instruction will be executed, update the alignment.
-      // Larger is better, with the exception of 0 being the best alignment.
-      if ((InstAlignment > Alignment || InstAlignment == 0)
-          && (Alignment != 0))
-        if (isSafeToExecuteUnconditionally(*Use)) {
-          GuaranteedToExecute = true;
-          Alignment = InstAlignment;
-        }
-
-      if (!GuaranteedToExecute)
-        GuaranteedToExecute = isSafeToExecuteUnconditionally(*Use);
-
       LoopUses.push_back(Use);
     }
   }

