[PATCH] D11382: x86 atomic: optimize a.store(reg op a.load(acquire), release)

Wed Jul 22 11:23:54 PDT 2015

jfb added a comment.

In http://reviews.llvm.org/D11382#209576, @dvyukov wrote:

> In http://reviews.llvm.org/D11382#209066, @jfb wrote:
>
> > In http://reviews.llvm.org/D11382#208803, @dvyukov wrote:
> >
> > > Will this optimization transform:
> > >
> > >   int foo() {
> > >      int r = __atomic_load_n(&x, __ATOMIC_RELAXED);
> > >      __atomic_store_n(&x, r+1, __ATOMIC_RELAXED);
> > >      return r;
> > >   }
> > >   
> > >
> > > ? If yes, how?
> >
> >
> > Good point, I added test `add_32r_self` to ensure that this doesn't happen, and that the pattern matching figures out dependencies properly.
>
>
> I am glad that the comment was useful, but I actually asked about something different :)
> My example does not contain a self-add. It contains two uses of the load result, one of which could potentially be folded. My concern was that the code could be compiled as:
>
>   MOV r, [addr]
>   ADD [addr], 1
>   MOV rax, r
>   RET
>   
>
> or to:
>
>   ADD [addr], 1
>   MOV rax, [addr]
>   RET
>   
>
> Both would be incorrect transformations: two loads instead of one.
> I guess this transformation should require that the folded store is the only use of the load result.

Oh sorry, I totally misunderstood you! I added a test for this. IIUC it can't happen because the entire matched pattern is replaced with a pseudo-instruction, so an escaping intermediate result wouldn't have a def anymore.
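
The legality argument can be illustrated with a toy check. This is a hypothetical, simplified model: the `Node` struct and `can_fold_to_rmw` are made up for illustration and are not LLVM's SelectionDAG matcher. A `store(add(load, reg))` chain on one address is replaced by a single read-modify-write pseudo-instruction only if neither the load nor the add has any other user, since an extra user would be left without a def once the chain is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR node; not LLVM's SDNode. */
enum Op { OP_LOAD, OP_ADD, OP_STORE };

struct Node {
    enum Op op;
    const struct Node *operand; /* ADD: the LOAD it consumes; STORE: the value stored */
    int num_uses;               /* number of nodes consuming this node's value */
    int addr;                   /* memory location for LOAD/STORE; unused for ADD */
};

/* True if store(add(load(addr), reg), addr) may be replaced by one RMW
 * pseudo-instruction. The whole chain disappears, so every intermediate
 * value must be used only inside the chain. */
bool can_fold_to_rmw(const struct Node *store) {
    if (store == NULL || store->op != OP_STORE)
        return false;
    const struct Node *add = store->operand;
    if (add == NULL || add->op != OP_ADD || add->num_uses != 1)
        return false;
    const struct Node *load = add->operand;
    if (load == NULL || load->op != OP_LOAD || load->num_uses != 1)
        return false;
    /* The load and the store must touch the same location. */
    return load->addr == store->addr;
}
```

In dvyukov's `foo`, the load has two users (the add and the return), so this check rejects the fold.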

http://reviews.llvm.org/D11382