[PATCH] D11434: Allow value forwarding past release fences in EarlyCSE

Fri Jul 24 00:04:21 PDT 2015

dvyukov added inline comments.

================
Comment at: test/Transforms/EarlyCSE/fence.ll:43
@@ +42,3 @@
+  fence acquire
+  %a2 = load i32, i32* %addr.i, align 4
+  %res = sub i32 %a, %a2
----------------
reames wrote:
> dvyukov wrote:
> > jfb wrote:
> > > You can't forward `%a` and eliminate `%a2`, but you could eliminate `%a`, and replace it by `%a2` if there are no intervening uses, no?
> > In this particular case we can do whatever we want with accesses to %addr.i. No other thread can access it without a race.
> @jfb - Yes, but that's not what the current optimization tries to do.  Nor does EarlyCSE have a mechanism to do that in general even for unordered loads.  
> 
> @dvyukov - Yes, but we need to actually prove that and my change to EarlyCSE is no where near that aggressive.  And frankly, I'm really not interested in exploiting the undefined behaviour part of this.  
> 
> Do you have suggestions for how to restructure the test in a way which still tests EarlyCSE but avoids disallowing the optimization you pointed out?  Would simply removing the first load from the check sequence be sufficient?
I can't answer your question. I don't understand what EarlyCSE does, change description talks only about release fences but this test contains an acquire fence. I am trying to figure out what this change does by reading the tests. And the tests says that we cannot do something that we actually can do...

================
Comment at: test/Transforms/EarlyCSE/fence.ll:57
@@ +56,3 @@
+  fence release
+  store i32 5, i32* %addr.i, align 4
+  ret void
----------------
reames wrote:
> dvyukov wrote:
> > Is it a full test? Or is it something that we don't handle yet?
> > I am asking because in this particular case we can do whatever we want with accesses to %addr. Since we store to %addr both before and after the fence, we can conclude that this fence is not used to synchronize accesses to %addr, so we can eliminate/combine/etc accesses to %addr.
> > 
> In general, I don't believe it is legal to eliminate the first store.  With all memory initialized to zero, consider these two threads:
> 
> T1:
> b = 2;
> a = 1;
> release fence
> b = 4;
> 
> T2:
> while (a == 0) {}
> if (b == 0) abort();
> 
> In this *particular case* I think you're reasoning is sound because we can tell the store to a doesn't exist. 
> 
> (Just to be clear, your concern here is an unnecessarily restricted optimization, not a correctness concern right?)
> 
> Would splitting this test into two - one with a store to 'a' which is a negative test and one as is which is marked as a missed optimization - satisfy you?
> 
I don't understand. Your example is a one large data race. Behavior is undefined.

http://reviews.llvm.org/D11434