[PATCH] [OPENMP] CodeGen for "omp atomic read [seq_cst]" directive.

Tue Dec 2 09:24:23 PST 2014

================
Comment at: lib/CodeGen/CGStmtOpenMP.cpp:591
@@ +590,3 @@
+      llvm::Value *ScalarVal;
+      switch (CGF.getEvaluationKind(X->getType())) {
+      case TEK_Scalar:
----------------
ABataev wrote:
> rjmccall wrote:
> > ABataev wrote:
> > > rjmccall wrote:
> > > > I think you'll want this functionality for some of the other atomic ops anyway, so you might as well make this a separate function that you can call like this:
> > > >   llvm::Value *ScalarVal = convertToScalarValue(CGF, Res, X->getType(), V->getType());
> > > > 
> > > > It's kindof a shame that you have to redo all of this instead of just relying on the implicit casts that Sema already created when it analyzed the expression, but maybe this is better than messing around with OpaqueValueExprs, and it works because you don't have a requirement to handle custom operators.
> > > 1. Agree, I'll rework this.
> > > 2. Yes, I tried to implement it using some hooks/hacks in AST, but there was a lot of troubles with atomic ops.
> > > 3. We will have to support some custom operators in future (in OpenMP 4.0 reductions introduces user-defined reductions, which must be called as atomic ops and I plan to use global locks for them).
> > You can't mix locking and non-locking atomics on the same object:  they'll be atomic with respect to themselves, but not with respect to each other.  That is, assuming that atomic_start and atomic_end aren't implemented by halting all other OpenMP threads.
> > 
> > e.g. imagine that you have a *= that's implemented with a compare-and-swap loop and a custom reduction that you've implemented with the global lock.  The global lock doesn't keep the compare-and-swap from firing during the execution of the custom reduction, so (1) different parts of the custom reduction might see different values for the original l-value and (2) the custom reduction will completely overwrite the *= rather than appearing to execute before or after.
> > 
> > In fact, you have a similar problem with aggregates, where an atomic access to an aggregate (e.g. a std::pair<float,float>) cannot be made atomic with respect to an atomic access to a subobject unless the aggregate will be accessed locklessly.  (You could do *some* operations locklessly if you wrote them very carefully, e.g. reads and writes, but I don't know of any way to do an aggregate compare-and-swap that's atomic with a subobject compare-and-swap when the first requires a lock and the second doesn't.)  That's mostly defined away by the current requirement that the l-value have scalar type, except that you do support _Complex l-values right now, and 32-bit platforms generally can't make _Complex double lockless.  The reverse problem applies to vectors: a 16-byte vector will generally be sufficiently aligned that a 64-bit platform can access it locklessly, but you're implementing some accesses to subobjects with locks: specifically, vector projections that create compound l-values.
> > 
> > This is really a specification problem: OpenMP wants simple operations to be implementable with native atomics, but that really restricts the set of operations you can do unless you can assume at a high level that there are never partially overlapping atomic operations.   The easiest language solution is to say that, for the purposes of OpenMP's atomics, _Complex and vector types don't count as scalars.  But I don't know if that would fly — it might be a goal to loosen the restrictions about what types you can use in atomic operations.
> I agree with you, currently the code is not quite correct. I think we can resolve this problem by emitting OpenMP specific locks for target supported lockfree atomic operations. But in this case we don't need target atomic ops at all, we can rely on runtime library interface only.
You mean, always grabbing a lock instead of ever using lock-free operations?  I agree that that's the most general solution, but it's really heavy-handed.

It's possible that you can still do user-defined reductions with a compare-and-swap loop, depending on how they're defined. (Specifically: you have to be allowed to try the reduction multiple times, and the reduction's only allowed to access a single atomic value.  The existing specification about the #pragma tries to enforce those conditions even for the simple cases allowed there, so I'm optimistic that whoever is writing your spec is at least thinking about these problems.)

What you really need here is a statement about subobjects.  The C11/C++11 atomic types are very nice because they tell you exactly at what level something is atomic: you can have an std::atomic<std::pair<int,int>>, but that type doesn't provide accessors for atomically accessing the individual members of the pair; you can't just project out a std::atomic<int>& from the first element.  That's really important, because it tells you locally that all atomic accesses are going to reference the full, 64-byte aggregate, which means you can agree very easily on a protocol for that access.

OpenMP doesn't tie into the type system that way, but you can still impose that rule at a high level by saying that it's undefined behavior for atomic accesses to an aggregate to race with atomic accesses to a subobject.  For example, you can have an atomic operation on a _Complex float, and you can have a simultaneous atomic operation on a float, but the fact that they're simultaneous means that we can assume they don't alias.  That still lets us decide on the atomic access pattern based purely on the type of the atomic l-value.

http://reviews.llvm.org/D6431