[PATCH] Further relax the constraint on atomics in MemoryDependencyAnalysis.cpp

Thu Aug 21 18:07:40 PDT 2014

I will update this patch with the clearer comment/documentation ASAP.

Comments inline about the other points of discussion.

================
Comment at: lib/Analysis/MemoryDependenceAnalysis.cpp:440
@@ -423,3 +439,3 @@
             return MemDepResult::getClobber(LI);
         if (auto *QuerySI = dyn_cast<StoreInst>(QueryInst))
           if (!QuerySI->isSimple())
----------------
Philip Reames wrote:
> Robin Morisset wrote:
> > Philip Reames wrote:
> > > This is a bit off topic for this review, but I may have spotted a bug here.  What if the query instruction is a RMW operation?  These can have ordering semantics, but the existing code (from your previous change) would say there's no dependence.  
> > Indeed, thank you very much for find this.. it is incredible how hard to get right these things appear to be. I will send another patch shortly fixing it.
> > I would suggest making isAtomic()/isSimple() methods of all instructions (just return false/true respectively for the instructions that do not override it) and just checking that. In this way There would be much less risk in the future of forgetting one such case (there is also CmpXchg for example). Does this sound reasonable ?
> This approach seems reasonable to me.  
> 
> Alternately, you could use mayReadMemory() as your generic fallback check.  
mayReadOrWriteMemory seems like the perfect fallback indeed, I will send a patch using it.

================
Comment at: test/Transforms/DeadStoreElimination/atomic.ll:149
@@ +148,3 @@
+; CHECK: store i32 1
+  store i32 0, i32* @x
+  %x = load atomic i32* @y seq_cst, align 4
----------------
Philip Reames wrote:
> Robin Morisset wrote:
> > Philip Reames wrote:
> > > As currently specified in the LangRef, I don't think this is legal.  If I'm reading the spec right, the load must be considered both an Acquire and a Release.  (This is possibly not intended, but seems to follow from the wording.)  As a result, this is not a acquire/release pair, but both a acquire/release and release/acquire pair.  
> > > 
> > > Finding a case where this is observable would be hard, but it seems possible.  If need be, I'll spend some time thinking about it.  
> > In the LangRef, just under the first occurence of memory_order_seq_cst I see:
> > "In addition to the guarantees of acq_rel (acquire for an operation which only reads, release for an operation which only writes)"
> > It seems pretty clear to me that it follows the C++11 standard, and that a load cannot be considered a release, even if it is seq_cst.
> > If you see a case where this optimisation would be observable, I am **extremely** interested in it.
> Thanks for pointing out that wording.  I was looking in acq_rel.  Any chance you could move that to be under acq_rel?  It seems odd to have a later order defining a previous one.
> 
> For this particular example, it's only valid because we know @x and @y are distinct locations.  If we didn't know that, we couldn't remove the first store without changing %x, and thus the observable value.  
It is not under the acq_rel section, because it is impossible to have an acq_rel load or store, only RMW/CmpXchg operations can be acq_rel.

For the non-aliasing condition I agree, but that is a completely unrelated check: it also applies even if y was non-atomic (and is correctly checked by MemoryDependency).

================
Comment at: test/Transforms/GVN/atomic.ll:46
@@ -45,3 +45,3 @@
 
-; GVN across acquire load (load after atomic load must not be removed)
+; GVN across acquire load (allowed as the original load was not atomic)
 define i32 @test4() nounwind uwtable ssp {
----------------
Philip Reames wrote:
> Robin Morisset wrote:
> > Philip Reames wrote:
> > > I don't believe this is valid.
> > For this optimisation to be invalid, @y would have to change between the two loads.
> > Such a store to @y would necessarily race with the first load (to %x) and make the whole program undefined.
> > So my argument is that this is correct because it can only be observed by racy programs.
> I had to think about this one a bit.  :)  I can't find an argument to refute your racy program one, but I find that answer disturbing.  From a practical software engineering perspective, the answer "oh, we corrupted all of the runtime state in hard to explain ways because you had one race somewhere" is a bit hard to swallow.  I will admit it's what the c++11 standard says though.  
> 
> I'm curious, do you believe the answer changes if both loads are monotonic?  If not, this would seem to imply that LLVM can't implement the Java memory model correctly.  The JMM does specify limited semantics for racy programs.  
About the practical software engineering issue, it could maybe be mitigated by warnings (although I am not sure how exactly to give helpful warnings for this) and by the use of thread sanitizer. Do you think a discussion on LLVM-dev would be warranted about the topic of how aggressively/non-intuitively we may optimize code ? I admit I was focusing only on the correctness with regards to the standard and not necessarily with regards to the programmer's expectations. On the other hand, the programmer is not supposed to use atomics if he does not know what he's doing... (and he will probably have a bad time otherwise even if the compiler is not aggressively optimizing)

This specific argument does not work if the loads are monotonic (as races involving relaxed accesses are well-defined in C++11). About the question of whether it is true for another reason, I don't know.
We have a paper under review (we = Viktor Vafeiadis and Thibault Balabonski mostly) about what kind of optimizations are possible on atomics themselves (such as would be the case here if the loads are monotonic). I do not remember what it says about this specific situation (and don't have a copy of it with me right now), but we generally found that almost every optimization (even seemingly "obviously true" ones) break horrendously in horrifyingly subtle and evil ways as soon as they involve monotonic/relaxed accesses. So I plan to avoid messing with these kinds of accesses, at least until/unless their semantics is modified/fixed by the C++ committee.

Java accesses are "unordered" from what I understand. It is because I have no idea what their exact semantics is (especially in the LLVM framework based on C11) that I picked isSimple() in the previous patch (although I had forgotten them in the beginning).

http://reviews.llvm.org/D4845