[PATCH] D11434: Allow value forwarding past release fences in EarlyCSE
listmail at philipreames.com
Fri Jul 24 10:43:54 PDT 2015
reames added inline comments.
Comment at: test/Transforms/EarlyCSE/fence.ll:43
@@ +42,3 @@
+ fence acquire
+ %a2 = load i32, i32* %addr.i, align 4
+ %res = sub i32 %a, %a2
> dvyukov wrote:
> > reames wrote:
> > > dvyukov wrote:
> > > > jfb wrote:
> > > > > You can't forward `%a` and eliminate `%a2`, but you could eliminate `%a`, and replace it by `%a2` if there are no intervening uses, no?
> > > > In this particular case we can do whatever we want with accesses to %addr.i. No other thread can access it without a race.
> > > @jfb - Yes, but that's not what the current optimization tries to do. Nor does EarlyCSE have a mechanism to do that in general even for unordered loads.
> > >
> > > @dvyukov - Yes, but we need to actually prove that and my change to EarlyCSE is no where near that aggressive. And frankly, I'm really not interested in exploiting the undefined behaviour part of this.
> > >
> > > Do you have suggestions for how to restructure the test in a way which still tests EarlyCSE but avoids disallowing the optimization you pointed out? Would simply removing the first load from the check sequence be sufficient?
> > I can't answer your question. I don't understand what EarlyCSE does, change description talks only about release fences but this test contains an acquire fence. I am trying to figure out what this change does by reading the tests. And the tests says that we cannot do something that we actually can do...
> I think the main problem Dmitry and I have is the comment: it says the transformation can't happen but it in fact is allowed to happen and simply ins't implemented here. You're testing the implementation knowing what it does, but your comment hints to future contributors that doing this optimization isn't valid.
> I'm assuming that the test can't be restructured because it would require a loop, and that's something that the current implementation won't see through. You're testing a very minimal thing instead (an overly conservative on, since it's a race!), and ensuring that the current code can't optimize it.
Guys, would simply adding a comment to the test explaining the alternate path by which this could be optimized satisfy you? The code in earlycse needs to be specific to release fences. I don't know of an easy way to exercise the code to form a negative test without something of similar form.
p.s. I get that there is some optimization which can recognize this as a data race. However, simple load-load forwarding as implemented in EarlyCSE is not it. Nor is likely to ever be so.
More information about the llvm-commits