[PATCH] [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.

Mon Dec 1 10:44:16 PST 2014

================
Comment at: include/clang/Basic/TargetInfo.h:379
@@ +378,3 @@
+           AtomicSizeInBits <= getMaxAtomicInlineWidth() &&
+           llvm::isPowerOf2_64(AtomicSizeInBits / getTypeWidth(SignedChar));
+  }
----------------
That's just getCharWidth() but more expensive.

================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:603
@@ +602,3 @@
+  switch (Kind) {
+  case OMPC_read: {
+    // v = x;
----------------
I think you missed this comment from my review — please make the OMPC_read case its own function.

================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:591
@@ +590,3 @@
+      llvm::Value *ScalarVal;
+      switch (CGF.getEvaluationKind(X->getType())) {
+      case TEK_Scalar:
----------------
ABataev wrote:
> rjmccall wrote:
> > I think you'll want this functionality for some of the other atomic ops anyway, so you might as well make this a separate function that you can call like this:
> >   llvm::Value *ScalarVal = convertToScalarValue(CGF, Res, X->getType(), V->getType());
> > 
> > It's kindof a shame that you have to redo all of this instead of just relying on the implicit casts that Sema already created when it analyzed the expression, but maybe this is better than messing around with OpaqueValueExprs, and it works because you don't have a requirement to handle custom operators.
> 1. Agree, I'll rework this.
> 2. Yes, I tried to implement it using some hooks/hacks in AST, but there was a lot of troubles with atomic ops.
> 3. We will have to support some custom operators in future (in OpenMP 4.0 reductions introduces user-defined reductions, which must be called as atomic ops and I plan to use global locks for them).
You can't mix locking and non-locking atomics on the same object:  they'll be atomic with respect to themselves, but not with respect to each other.  That is, assuming that atomic_start and atomic_end aren't implemented by halting all other OpenMP threads.

e.g. imagine that you have a *= that's implemented with a compare-and-swap loop and a custom reduction that you've implemented with the global lock.  The global lock doesn't keep the compare-and-swap from firing during the execution of the custom reduction, so (1) different parts of the custom reduction might see different values for the original l-value and (2) the custom reduction will completely overwrite the *= rather than appearing to execute before or after.

In fact, you have a similar problem with aggregates, where an atomic access to an aggregate (e.g. a std::pair<float,float>) cannot be made atomic with respect to an atomic access to a subobject unless the aggregate will be accessed locklessly.  (You could do *some* operations locklessly if you wrote them very carefully, e.g. reads and writes, but I don't know of any way to do an aggregate compare-and-swap that's atomic with a subobject compare-and-swap when the first requires a lock and the second doesn't.)  That's mostly defined away by the current requirement that the l-value have scalar type, except that you do support _Complex l-values right now, and 32-bit platforms generally can't make _Complex double lockless.  The reverse problem applies to vectors: a 16-byte vector will generally be sufficiently aligned that a 64-bit platform can access it locklessly, but you're implementing some accesses to subobjects with locks: specifically, vector projections that create compound l-values.

This is really a specification problem: OpenMP wants simple operations to be implementable with native atomics, but that really restricts the set of operations you can do unless you can assume at a high level that there are never partially overlapping atomic operations.   The easiest language solution is to say that, for the purposes of OpenMP's atomics, _Complex and vector types don't count as scalars.  But I don't know if that would fly — it might be a goal to loosen the restrictions about what types you can use in atomic operations.

http://reviews.llvm.org/D6431