RFC: Atomicity guarantees for __sync_* builtins

Fri Apr 10 08:19:44 PDT 2015

Hi,

The __sync_* builtins are currently implemented using an atomicrmw
instruction with the seq_cst ordering. A bug has been raised against GCC
about this, and I think it applies to LLVM too. However my
memory-ordering-fu is incredibly weak, so please bear with me if I make
mistakes explaining.
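
To make the starting point concrete, here is a minimal sketch in C of the
two spellings that, as far as I understand, end up as the same seq_cst
atomicrmw in the IR. The names counter, sync_style_add and atomic_style_add
are made up for illustration; only the two builtins are real.

```c
/* Hypothetical shared counter, used only for this illustration. */
int counter;

/* Legacy spelling: no memory-order argument at all. */
int sync_style_add(int v)
{
	return __sync_fetch_and_add(&counter, v);
}

/* C11-style builtin with the ordering stated explicitly as seq_cst. */
int atomic_style_add(int v)
{
	return __atomic_fetch_add(&counter, v, __ATOMIC_SEQ_CST);
}
```

Both return the value held before the addition. The question in this mail
is whether the first spelling ought to promise more than the second.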

Consider the following code:

#include <stdio.h>

int foo, bar;	/* shared, both initially 0 */

void thread1(void)
{
	__sync_fetch_and_add(&foo, 1);
	printf("bar = %d\n", bar);
}

void thread2(void)
{
	__sync_fetch_and_add(&bar, 1);
	printf("foo = %d\n", foo);
}

The user expected that the output "bar = 0\nfoo = 0" was impossible, i.e.
that at least one thread would always observe the other's increment. Note
that this is in C90/C99 mode - in C11 the plain reads of foo and bar are a
data race, so it is the user's problem.

The problem is that a sequentially-consistent fetch and add can allow a
later unordered load to be reordered into the middle of it. A seq_cst fetch
and add could be lowered as:

ld.acq.ex x0, [foo]
add x0, #1
st.rel.ex x0, [foo]

Now consider a subsequent load from [bar]. That load is unordered and may be
speculated before the store, because the two accesses are to different
memory locations and the release only orders accesses that come before it:

ld.acq.ex x0, [foo]
add x0, #1
ld.unordered x1, [bar] # Not the intention in using __sync_!
st.rel.ex x0, [foo]

So, I think the __sync_* builtins, at least in non-C11 mode, need a
stronger guarantee than seq_cst - perhaps an extra "fence" IR instruction?
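
As a source-level illustration only, the effect I have in mind resembles
following the builtin with an explicit full barrier. The sketch below uses
the existing __sync_synchronize() builtin for that; the wrapper name
fetch_and_add_full_barrier is hypothetical, and this is not a proposal for
how the lowering should actually be implemented.

```c
/*
 * Hypothetical wrapper: a fetch-and-add followed by a full memory barrier,
 * so that later loads cannot be hoisted above the completed update.
 */
int fetch_and_add_full_barrier(int *p, int v)
{
	int old = __sync_fetch_and_add(p, v);	/* the existing seq_cst RMW */
	__sync_synchronize();			/* explicit full barrier */
	return old;
}
```

With thread1 and thread2 from the example written against this wrapper, the
load of the other variable would be ordered after the increment, which is
what the user assumed the plain builtin already gave them.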

What are the experts' thoughts?

Cheers,

James