[PATCH] D27133: Introduce element-wise atomic memcpy and memmove intrinsics

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 30 18:48:32 PST 2016


reames added inline comments.


================
Comment at: docs/LangRef.rst:12684
+specifying the number of elements to copy, the fourth argument is size of
+the single element in bytes, the fifths argument is the alignment of the source
+and destination locations, and the sixths is a boolean indicating a volatile
----------------
It might be good to specify the alignment of source and destination separately.
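One possible shape, sketched as IR (the two separate alignment operands are the suggestion here, not what the patch proposes; the mangled intrinsic name and concrete values are illustrative):

```llvm
; Hypothetical variant with separate destination/source alignments:
; dst, src, number of elements, element size, dst align, src align, volatile
call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* %dst, i8* %src,
                                                i64 %len, i32 4,
                                                i32 8, i32 4, i1 false)
```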


================
Comment at: docs/LangRef.rst:12704
+the destination location. These locations are not allowed to overlap. Memory copy
+is performed as a sequence of memory accesses where each access is guaranteed to 
+be at least ``element_size`` wide unordered atomic.
----------------
Hm, the wording isn't quite right.  As written, we could have six bytes to copy with an element size of 2 and be legally allowed to copy two three-byte chunks.

Memory copy is performed as a sequence of memory accesses where each access is an exact multiple of the element size and aligned at an element-size boundary.  (i.e. each element is accessed atomically in both the source and destination buffers)

Also, add: 
The order of the copy is unspecified.  The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per element.  It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are at least unordered atomic.  

This intrinsic does not provide ordering with respect to other memory accesses in adjacent code, so it cannot be used on its own to implement a fence-like construct.
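The per-element guarantee described above can be sketched in C++ (a hedged illustration only: C++ has no "unordered" ordering, so relaxed atomics stand in for it, and `atomic_copy_elements` is a made-up name, not the proposed lowering):

```cpp
#include <atomic>
#include <cstdint>
#include <cstddef>

// Copy `count` 4-byte elements.  Each element is read from the source and
// written to the destination with a single atomic access, so a concurrent
// atomic reader never observes a torn element.  The iteration order here is
// just one of the orders the "order of the copy is unspecified" wording
// permits.
void atomic_copy_elements(std::atomic<uint32_t> *dst,
                          const std::atomic<uint32_t> *src, size_t count) {
  for (size_t i = 0; i != count; ++i)
    dst[i].store(src[i].load(std::memory_order_relaxed),
                 std::memory_order_relaxed);
}
```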


================
Comment at: docs/LangRef.rst:12711
+Call to the '``llvm.memcpy.element.atomic.*``' is lowered to call to the symbol
+``__llvm_memcpy_element_atomic``. Only first four arguments are passed to this
+call.
----------------
If we're just going to ignore the volatile argument, should we have it?

Also mention that the optimizer may inline the copy if profitable.
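A sketch of the proposed lowering (argument order follows the LangRef text quoted above; the exact mangled intrinsic name and the constants are illustrative):

```llvm
; Before lowering: dst, src, element count, element size, alignment, volatile
call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* %dst, i8* %src,
                                                i64 %count, i32 4,
                                                i32 4, i1 false)

; After lowering, only the first four arguments reach the library call:
call void @__llvm_memcpy_element_atomic(i8* %dst, i8* %src, i64 %count, i32 4)
```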


================
Comment at: docs/LangRef.rst:12714
+
+'``llvm.memmove.element.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------
Not sure we really need memmove.  I might start with just atomic_memcpy and see where that gets us.


================
Comment at: include/llvm/IR/Intrinsics.td:766
+def int_memcpy_element_atomic  : Intrinsic<[],
+                                           [llvm_anyptr_ty, llvm_anyptr_ty,
+                                            llvm_anyint_ty, llvm_i32_ty,
----------------
The first argument is the return type right?  Shouldn't that be void?


================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4906
+    SDValue NumElements = getValue(I.getArgOperand(2));
+    SDValue ElementSize = getValue(I.getArgOperand(3));
+
----------------
We'll want to inline small copies here eventually, but this is a fine starting place.


================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4913
+    Entry.Node = Dst;
+    Args.push_back(Entry);
+
----------------
If this can be written as:
Args.push_back(Entry(Ty, Node))

please do.  The stateful code is slightly confusing.
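The suggested style, sketched with stand-in types (the in-tree ArgListEntry has no such constructor, so this assumes one is added; `Node`, `Entry`, and `buildArgs` are minimal mock-ups, not the real SelectionDAG types):

```cpp
#include <vector>

struct Node { int Id; };                   // stand-in for SDValue

struct Entry {                             // stand-in for ArgListEntry
  int Ty;
  Node N;
  Entry(int Ty, Node N) : Ty(Ty), N(N) {}  // assumed constructor
};

// Each argument is built and pushed in one self-contained expression,
// instead of mutating a single Entry object between push_backs.
std::vector<Entry> buildArgs(int Ty, Node Dst, Node Src) {
  std::vector<Entry> Args;
  Args.push_back(Entry(Ty, Dst));
  Args.push_back(Entry(Ty, Src));
  return Args;
}
```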


================
Comment at: test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll:6
+  ret i8* %P
+  ; CHECK: __llvm_memcpy_element_atomic
+}
----------------
Check the argument passing as well.
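Checking the arguments might look something like this (x86-64 SysV calling convention; the registers and constants here are illustrative, not taken from the patch):

```llvm
; Element size in %ecx, element count in %edx, pointers in %rdi/%rsi:
; CHECK-DAG: movl $4, %ecx
; CHECK-DAG: movl $16, %edx
; CHECK: callq __llvm_memcpy_element_atomic
```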


https://reviews.llvm.org/D27133




