[PATCH] D15352: [EarlyCSE] DSE of atomic unordered stores

Thu Dec 10 12:54:58 PST 2015

jfb added inline comments.

================
Comment at: test/Transforms/EarlyCSE/atomics.ll:131
@@ +130,3 @@
+  store atomic i32 3, i32* %P1 unordered, align 4
+  store i32 0, i32* %P1, align 4
+  ret void
----------------
reames wrote:
> jfb wrote:
> > I think the store you keep still needs to be atomic, i.e. you're "sinking" the atomicity to all post-dominating stores. Let's ignore that this is `unordered` (silly example) and imagine it's a `store release`: in that case the writer could have been writing a non-zero value, then racing with itself and writing another non-zero value to the same location. The reader observes the transition to non-zero (yes it's racy *which* non-zero is seen, but one is seen). If you drop `release` then the algorithm becomes incorrect.
> > 
> > ```
> > void writer(Payload p, atomic<int> *flag) {
> >   p->update();
> >   flag->write(1, release);
> >   // ...
> >   *reinterpret_cast<int*>(flag) = 2; // WTF?
> > }
> > void reader(Payload p, atomic<int> *flag) {
> >   while (!flag->load(acquire)) ;
> >   p->process();
> > }
> > ```
> > 
> > In the above example `p->process()` isn't guaranteed to see the latest state from `p`. Yes the code is already UB, but we don't need to be silly.
> > 
> > Seen another way, replace `reinterpret_cast` with a `flag->store(2, relaxed)` and then we're well-defined, you can eliminate the `store release`, but you have to "sink" the `release` to the later store.
> Two problems with this:
> 1) Your use of a release store is substantially different than what's being modified and tested here.  In particular, a release store implies ordering where an unordered atomic does not.  As a result, the basis for your example collapses.  (If we were implementing ordered atomics here, it would be a problem.)  -- Note I actually agree with your point that we'd have to keep the release store in the modified example.  I believe I even said that in a comment in the code.
> 2) The execution of the non-atomic store to a racy location is undefined.  You are *not* guaranteed that it publishes a only non-zero value.  Consider the constant value 0x01 being stored over the value 0x10.  Pretend for a moment we see bit level tearing.  It's perfectly legal for the memory to transition from 0x10 to 0x00 to 0x01.  
Agreed. I think `unordered` is a silly example, but I'd like to make sure that if an enthusiastic person tries to optimize this code further then they know that they can't generalize the optimization as-is. I was suggesting that you keep the atomic ordering just because we don't really need to remove it, but your point about tearing leads me the change my mind (bad code is bad, no need to try too hard).

I'd be OK if you added a negative test as I illustrated (`release` store followed by `unordered` store) along with an explanation of which outcome is valid: you have to keep the `release` ordering, but which value gets stored is undefined since the `unordered` store can move above the `release` one.

================
Comment at: test/Transforms/EarlyCSE/atomics.ll:170
@@ +169,3 @@
+; CHECK-NEXT: store atomic
+; CHECK-DAG: release
+; CHECK-NEXT: ret
----------------
reames wrote:
> jfb wrote:
> > Why not just:
> > ```
> > ; CHECK-LABEL: @test27
> > ; CHECK-NEXT: store atomic i32 3, i32* %p1 release
> > ; CHECK-NEXT: ret
> > ```
> > ?
> Would also work.  I was trying to avoid testing irrelevant details so that if someone tweaked the test, it wouldn't spuriously fail.  e.g. I could increase the alignment or change the values.  
I'd like to make sure `3` is still the value stored by this.

http://reviews.llvm.org/D15352