[PATCH] [X86] replace (atomic fetch_add of 0) by (mfence; mov)

Sat Aug 30 10:29:35 PDT 2014

The code and tests don't really discuss the atomic ordering. It would be good to provide an intuition for why your code works even with different orderings.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:1749
@@ +1748,3 @@
+// On x86 an atomic load-add of the constant 0 can be replaced by an mfence
+// followed by a mov. A detailed explanation of this (and exemple of why the
+// mfence is required) is available at
----------------
s/exemple/example/

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:1756
@@ +1755,3 @@
+// being sunk out of the critical section. Replacing the last load by a
+// fetch_add(0, release) accomplishes just that.. but requires this
+// optimization to preserve the desirable property of seqlocks that readers
----------------
s/../,/

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:1758
@@ +1757,3 @@
+// optimization to preserve the desirable property of seqlocks that readers
+// do not cause cache line bouncing.
+// The mfence is required because otherwise the load could be hoisted before
----------------
I think it might be clearer to add that the cache line goes from shared to modified state with the fetch_add, whereas it remains shared with the load.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:1762
@@ +1761,3 @@
+// fetch_add could not do (since only store-load can be reordered in
+// load-store) on X86.
+SDNode *X86DAGToDAGISel::SelectAtomicAddZero(SDNode *Node, MVT NVT) {
----------------
I think it would be good to explain that the mfence is required because the *hardware* can reorder load/store. This code is in a compiler, the reader may assume that "hoisting" is done by the compiler.

================
Comment at: lib/Target/X86/X86ISelDAGToDAG.cpp:2191
@@ -2111,5 +2190,3 @@
   case ISD::ATOMIC_LOAD_AND:
-  case ISD::ATOMIC_LOAD_OR:
-  case ISD::ATOMIC_LOAD_ADD: {
-    SDNode *RetVal = SelectAtomicLoadArith(Node, NVT);
-    if (RetVal)
+  case ISD::ATOMIC_LOAD_OR: {
+    if (SDNode *RetVal = SelectAtomicLoadArith(Node, NVT))
----------------
Would it make sense to also select atomic or zero in the same way? I can't think of a reason for someone to use atomic or zero instead of add zero, but they are strictly equivalent so it seems silly to not select both with the same code.

http://reviews.llvm.org/D5091