[PATCH] D52807: [COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic

Eli Friedman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Oct 5 12:52:58 PDT 2018


efriedma added inline comments.


================
Comment at: lib/CodeGen/CGBuiltin.cpp:3003
+  case Builtin::BI_InterlockedCompareExchangePointer:
+  case Builtin::BI_InterlockedCompareExchangePointer_nf: {
     llvm::Type *RTy;
----------------
efriedma wrote:
> efriedma wrote:
> > rnk wrote:
> > > mgrang wrote:
> > > > rnk wrote:
> > > > > Is the no fence version really equivalent to this seq_cst version here?
> > > > I don't see InterlockedCompareExchangePointer creating a fence, but it should (since it is the fence version). So maybe that needs to be fixed (and possibly other fence functions).
> > > > 
> > > > InterlockedCompareExchangePointer_nf should not create a fence so the current implementation seems correct.
> > > Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the atomicrmw orderings were enough, but that shows what I know about the C++ memory model. =p
> > > 
> > > We don't generate a fence for this intrinsic when MSVC does:
> > > ```
> > > unsigned char test_arm(long *base, long idx) {
> > >   return _interlockedbittestandreset_acq(base, idx);
> > > }
> > > ```
> > > 
> > > @efriedma, is MSVC over-emitting extra fences, or are we under-emitting? If this is their "bug", should we be bug-for-bug compatible with it so we get the same observable memory states?
> > "create a fence" is a little confusing because the AArch64 ldaxr/stlxr have builtin fences... but presumably the "nf" version should use ldxr/stxr instead.  At the IR level, that corresponds to "monotonic".
> In terms of emitting fences, maybe we are actually missing a fence.
> 
> An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops loads from being hoisted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk.  That said, a stronger fence isn't necessary to model C++11 atomics; C++11 only requires sequential consistency with other sequentially consistent operations, plus an acquire/release barrier.  And a program that isn't using C++11 relaxed atomics can't depend on a stronger barrier without triggering a data race.
> 
> A program written before C++11 atomics were generally available could theoretically depend on the barrier through a series of operations that cause a data race under the C++11 rules.  Since there were no clear rules before C++11, programs did things like emulate relaxed atomics on top of volatile pointers; using such a construct, a program could depend on the implied barrier.  For this reason, gcc emits a fence on AArch64 after `__sync_*` (but not `__atomic_*`).  And I guess MSVC does something similar for `_Interlocked*`.
> 
> That said, LLVM has never emitted a dmb for `__sync_*` operations on AArch64, and it hasn't led to any practical problems as far as I know.
> it stops loads from being hoisted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk.

Looking over it again, this bit is a really bad description of what actually happens. Really there are two issues: one, an operation from after an ldaxr/stlxr pair could end up between the ldaxr and stlxr, and two, a cmpxchg doesn't always execute the stlxr.


Repository:
  rC Clang

https://reviews.llvm.org/D52807




