[PATCH] D11382: x86 atomic: optimize a.store(reg op a.load(acquire), release)

Wed Jul 22 01:57:42 PDT 2015

dvyukov added a comment.

In http://reviews.llvm.org/D11382#209066, @jfb wrote:

> In http://reviews.llvm.org/D11382#208803, @dvyukov wrote:
>
> > Will this optimization transform:
> >
> >   int foo() {
> >      int r = __atomic_load_n(&x, __ATOMIC_RELAXED);
> >      __atomic_store_n(&x, r+1, __ATOMIC_RELAXED);
> >      return r;
> >   }
> >   
> >
> > ? If yes, how?
>
>
> Good point, I added test `add_32r_self` to ensure that this doesn't happen, and that the pattern matching figures out dependencies properly.

I am glad the comment was useful, but I was actually asking about something different :)
My example does not contain a self-add. It contains two uses of a load result, and one of those uses could potentially be folded into the store. My concern is that the code could be compiled to:

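  MOV r, [addr]      ; first load of x
  ADD [addr], 1      ; read-modify-write: loads x a second time
  MOV rax, r
  RET

or to:

  ADD [addr], 1      ; read-modify-write: first load of x
  MOV rax, [addr]    ; second load of x
  RET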

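Both would be incorrect transformations, because x is loaded twice instead of once: another thread can store to x between the two loads, so the returned value and the stored value are no longer derived from the same load.
I guess this transformation should require that the folded store is the only use of the load result.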
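
To make the concern concrete, here is a minimal single-threaded sketch in C, assuming the GCC/Clang `__atomic` builtins. It is not what LLVM emits; it only models the first miscompile above. The hook `between_accesses`, the helper `other_thread_stores_10`, and the names `foo_folded` and `check_outcomes` are hypothetical and exist only for this illustration; the hook stands in for another thread that writes x between the two loads.

```c
#include <assert.h>
#include <stddef.h>

/* Shared variable; the name x comes from the original example. */
static int x;

/* Hypothetical hook simulating another thread that runs at a fixed point. */
static void (*between_accesses)(void) = NULL;

static void other_thread_stores_10(void) {
    __atomic_store_n(&x, 10, __ATOMIC_RELAXED);
}

/* The source as written: one load, and both uses see the same r. */
int foo(void) {
    int r = __atomic_load_n(&x, __ATOMIC_RELAXED);
    if (between_accesses) between_accesses();
    __atomic_store_n(&x, r + 1, __ATOMIC_RELAXED);
    return r;
}

/* Model of the first miscompile: the store becomes a read-modify-write
   that reloads x, while the return value still uses the earlier load. */
int foo_folded(void) {
    int r = __atomic_load_n(&x, __ATOMIC_RELAXED);        /* first load */
    if (between_accesses) between_accesses();
    int reloaded = __atomic_load_n(&x, __ATOMIC_RELAXED); /* second load */
    __atomic_store_n(&x, reloaded + 1, __ATOMIC_RELAXED);
    return r;
}

/* Runs both versions with the same interleaving and checks the results. */
void check_outcomes(void) {
    between_accesses = other_thread_stores_10;

    x = 0;
    assert(foo() == 0);
    assert(x == 1);   /* the store of r+1 overwrites the other thread's 10 */

    x = 0;
    assert(foo_folded() == 0);
    assert(x == 11);  /* returned 0 but stored 10+1: not a result foo() can give */

    between_accesses = NULL;
}
```

With the original code, a return value of 0 always means the function stored 1, so x can only end up as 1, or as 10 if the other thread's store lands after it. The folded version can return 0 and leave x at 11, which is why the fold looks safe only when the store is the sole use of the loaded value.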

http://reviews.llvm.org/D11382