[PATCH] [X86] replace (atomic fetch_add of 0) by (mfence; mov)

Wed Aug 27 15:01:10 PDT 2014

Hi jfb,

This is mostly useful for implementing seqlocks in C11/C++11, as explained in
http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf
In particular, it can avoid cache-line bouncing between readers (a locked RMW
needs the line in exclusive state, while mfence; mov only needs a shared copy),
which brings massive scalability improvements in the micro-benchmarks of the
paper.
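
For context, here is a rough sketch of the kind of seqlock reader this helps,
loosely following the "read-don't-modify-write" variant from the paper. The
names (SeqLock, seq, data1, data2, read, write) are made up for illustration
and none of this is code from the patch itself; it only assumes a C++11
compiler.

```cpp
#include <atomic>
#include <utility>

// Hypothetical seqlock sketch; all names here are illustrative only.
struct SeqLock {
    std::atomic<unsigned> seq{0};   // even: stable, odd: write in progress
    std::atomic<int> data1{0};
    std::atomic<int> data2{0};

    void write(int a, int b) {
        unsigned s = seq.load(std::memory_order_relaxed);
        for (;;) {
            if (s & 1) {            // another writer is active, re-read
                s = seq.load(std::memory_order_relaxed);
                continue;
            }
            // Default seq_cst CAS: makes the counter odd and synchronizes
            // with any reader whose fetch_add(0, release) it reads from.
            if (seq.compare_exchange_weak(s, s + 1))
                break;
        }
        data1.store(a, std::memory_order_relaxed);
        data2.store(b, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);  // even again
    }

    std::pair<int, int> read() {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            int a = data1.load(std::memory_order_relaxed);
            int b = data2.load(std::memory_order_relaxed);
            // The fetch_add of 0 in question: semantically an RMW, so it
            // keeps the data loads above ordered before it.
            unsigned s1 = seq.fetch_add(0, std::memory_order_release);
            if (s0 == s1 && (s0 & 1) == 0)
                return std::make_pair(a, b);
            // Otherwise a writer interfered; retry.
        }
    }
};
```

With a locked RMW in read(), every reader takes the cache line of seq in
exclusive state, so concurrent readers fight over it. Lowering that one
operation to mfence; mov lets readers keep the line shared.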

This cannot be done as a target-independent pass, because in the C11/C++11
memory model it is unsound to turn a fetch_add(&x, 0, release) into
fence(seq_cst); load(&x, seq_cst), as shown by the following example (from the
paper above):
atomic<int> x = 0, y = 0;
Thread 0:
    x.store(1, mo_relaxed);
    r1 = y.fetch_add(0, mo_release);
Thread 1:
    y.fetch_add(1, mo_acquire);
    r2 = x.load(mo_relaxed);
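r1 == r2 == 0 is not possible in the above code: if r1 == 0, the fetch_add of
thread 1 reads from the fetch_add of thread 0 and synchronizes with it, so
r2 == 1. It becomes possible if the fetch_add of thread 0 is turned into a
fence followed by a load, even if they are both seq_cst, because a load writes
nothing for thread 1 to synchronize with.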
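
In case it helps to play with the example, here is a compilable version of the
original (untransformed) code as a small harness. The helper names
run_litmus_once and forbidden_outcome_never_seen are hypothetical, and a run
like this is only a smoke test: it cannot prove that an outcome is forbidden,
it can only fail to observe it.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Runs the two-thread example once and returns {r1, r2}.
// Hypothetical helper, not part of the patch.
std::pair<int, int> run_litmus_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t0([&] {
        x.store(1, std::memory_order_relaxed);
        r1 = y.fetch_add(0, std::memory_order_release);
    });
    std::thread t1([&] {
        y.fetch_add(1, std::memory_order_acquire);
        r2 = x.load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();
    return std::make_pair(r1, r2);
}

// Returns true if r1 == r2 == 0 was not observed in any iteration.
bool forbidden_outcome_never_seen(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::pair<int, int> r = run_litmus_once();
        if (r.first == 0 && r.second == 0)
            return false;
    }
    return true;
}
```

Replacing the fetch_add in t0 with std::atomic_thread_fence(seq_cst) followed
by y.load(seq_cst) gives the transformed program, for which the C++11 model
allows r1 == r2 == 0.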

http://reviews.llvm.org/D5091

Files:
  lib/Target/X86/X86ISelDAGToDAG.cpp
  test/CodeGen/X86/atomic_add_zero.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5091.13005.patch
Type: text/x-patch
Size: 5378 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140827/c98ec5ed/attachment.bin>