RFC: Atomicity guarantees for __sync_* builtins
James Molloy
james at jamesmolloy.co.uk
Fri Apr 10 08:19:44 PDT 2015
Hi,
The __sync_* builtins are currently implemented using an atomicrmw
instruction with seq_cst ordering. A bug has been raised against GCC about
this, and I think it applies to LLVM too. However, my memory-ordering-fu is
incredibly weak, so please bear with me if I make mistakes in the explanation.
Consider the following code:
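
    #include <stdio.h>

    int foo, bar;  /* assumed: shared globals, not shown in the original report */

    void thread1(void)
    {
        __sync_fetch_and_add(&foo, 1);
        printf("bar = %d\n", bar);   /* plain, non-atomic load */
    }

    void thread2(void)
    {
        __sync_fetch_and_add(&bar, 1);
        printf("foo = %d\n", foo);   /* plain, non-atomic load */
    }
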
The user expected the output "bar = 0\nfoo = 0" to be impossible: at least
one thread should see the other's increment. Note that this is in C90/C99
mode - in C11 the plain reads of foo and bar race with the atomic updates,
so it is the user's problem.
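
A minimal harness for experimenting with this might look like the following sketch. It assumes POSIX threads are available; run_trial, seen_bar and seen_foo are hypothetical names introduced here for illustration, and the observed values are stored rather than printed so that they can be checked after both threads have finished.

```c
#include <assert.h>
#include <pthread.h>

static int foo, bar;            /* shared counters, as in the example above */
static int seen_bar, seen_foo;  /* hypothetical: value each thread observed */

static void *thread1(void *arg)
{
    (void)arg;
    __sync_fetch_and_add(&foo, 1);
    seen_bar = bar;             /* plain (non-atomic) load, as in the report */
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    __sync_fetch_and_add(&bar, 1);
    seen_foo = foo;             /* plain (non-atomic) load, as in the report */
    return NULL;
}

/* Runs both threads once; returns 1 if both of them observed 0. */
static int run_trial(void)
{
    pthread_t t1, t2;
    int rc;

    foo = bar = 0;
    rc = pthread_create(&t1, NULL, thread1, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, thread2, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    assert(foo == 1 && bar == 1);   /* both increments always land */
    return seen_bar == 0 && seen_foo == 0;
}
```

Calling run_trial() in a loop and counting how often it returns 1 gives a rough idea of whether the outcome is reachable on a given target. On x86 it should not show up, since the lock-prefixed read-modify-write acts as a full barrier; a weakly ordered target such as AArch64 is where it would be expected, if at all.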
The problem is that a sequentially-consistent fetch-and-add can allow a
later unordered load to jump into the middle of it. A seq_cst fetch-and-add
could be lowered as:

    ld.acq.ex x0, [foo]
    add x0, #1
    st.rel.ex x0, [foo]

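Now consider a subsequent load from [bar]. That load is unordered and may be
speculated ahead of the store, because the two accesses are to different
memory locations and a store-release only orders earlier accesses before
itself, not later ones after it.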

    ld.acq.ex x0, [foo]
    add x0, #1
    ld.unordered x1, [bar]   # Not the intention in using __sync_!
    st.rel.ex x0, [foo]

So, I think the __sync_* builtins, at least in non-C11 mode, need a
stronger guarantee than seq_cst - perhaps an extra "fence" IR instruction?
What are experts' thoughts?
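
At the source level, one reading of the "extra fence" idea can be sketched as below. This is only an illustration, not a proposed implementation: fetch_and_add_full_barrier is a hypothetical helper name, and the sketch assumes that __sync_synchronize() (the existing full-barrier builtin) is strong enough to keep the following plain load from moving ahead of the store.

```c
/* Hypothetical helper: a fetch-and-add followed by an explicit full
 * barrier, so that later plain accesses cannot be hoisted above the
 * store half of the read-modify-write. */
static inline int fetch_and_add_full_barrier(int *p, int v)
{
    int old = __sync_fetch_and_add(p, v);  /* atomic read-modify-write */
    __sync_synchronize();                  /* full memory barrier */
    return old;                            /* value before the add */
}
```

With this helper, thread1 would call fetch_and_add_full_barrier(&foo, 1) before reading bar. The cost is an extra barrier on every call, which is presumably why doing it only for the __sync_* builtins, rather than for all seq_cst operations, is the question being asked.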
Cheers,
James