[PATCH] D52807: [COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic
Eli Friedman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Oct 4 15:28:01 PDT 2018
efriedma added inline comments.
================
Comment at: lib/CodeGen/CGBuiltin.cpp:3003
+ case Builtin::BI_InterlockedCompareExchangePointer:
+ case Builtin::BI_InterlockedCompareExchangePointer_nf: {
llvm::Type *RTy;
----------------
efriedma wrote:
> rnk wrote:
> > mgrang wrote:
> > > rnk wrote:
> > > > Is the no fence version really equivalent to this seq_cst version here?
> > > I don't see InterlockedCompareExchangePointer creating a fence but it should. (since it is the fence version). So maybe that needs to be fixed (and possibly other fence functions).
> > >
> > > InterlockedCompareExchangePointer_nf should not create a fence so the current implementation seems correct.
> > Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the atomicrmw orderings were enough, but that shows what I know about the C++ memory model. =p
> >
> > We don't generate a fence for this intrinsic when MSVC does:
> > ```
> > unsigned char test_arm(long *base, long idx) {
> > return _interlockedbittestandreset_acq(base, idx);
> > }
> > ```
> >
> > @efriedma, is MSVC over-emitting extra fences, or are we under-emitting? If this is their "bug", should we be bug-for-bug compatible with it so we get the same observable memory states?
> "create a fence" is a little confusing because the AArch64 ldaxr/stlxr have builtin fences... but presumably the "nf" version should use ldxr/stxr instead. At the IR level, that corresponds to "monotonic".
In terms of emitting fences, maybe we are actually missing a fence.
An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops loads from being hosted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk. That said, a stronger fence isn't necessary to model C++11 atomics; C++11 only requires sequential consistency with other sequentially consistent operations, plus an acquire/release barrier. And a program that isn't using C++11 relaxed atomics can't depend on a stronger barrier without triggering a data race.
A program written before C++11 atomics were generally available could theoretically depend on the barrier through a series of operations that cause a data race under the C++11 rules. Since there were no clear rules before C++11, programs did things like emulate relaxed atomics on top of volatile pointers; using such a construct, a program could depend on the implied barrier. For this reason, gcc emits a fence on AArch64 after `__sync_*` (but not `__atomic_*`). And I guess MSVC does something similar for `_Interlocked*`.
That said, LLVM has never emitted a dmb for `__sync_*` operations on AArch64, and it hasn't led to any practical problems as far as I know.
Repository:
rC Clang
https://reviews.llvm.org/D52807
More information about the cfe-commits
mailing list