[PATCH] D27133: Introduce element-wise atomic memcpy and memmove intrinsics

Thu Dec 1 12:26:20 PST 2016

reames added inline comments.

================
Comment at: docs/LangRef.rst:12684
+specifying the number of elements to copy, the fourth argument is size of
+the single element in bytes, the fifths argument is the alignment of the source
+and destination locations, and the sixths is a boolean indicating a volatile
----------------
igor-laevsky wrote:
> reames wrote:
> > It might be good to specify the alignment of source and destination separately.
> Why do you think this will be useful? This will definitely add some complexity to the possible transforms with this intrinsics.
I don't have a strong view here.  I just vaguely remembering someone wanting to do that to the memcpy intrinsic.

================
Comment at: docs/LangRef.rst:12704
+the destination location. These locations are not allowed to overlap. Memory copy
+is performed as a sequence of memory accesses where each access is guaranteed to 
+be at least ``element_size`` wide unordered atomic.
----------------
igor-laevsky wrote:
> reames wrote:
> > Hm, the wording isn't quite right.  As written, we could have six bytes to copy with an element size of 2 and be legally allowed to copy two three byte chunks.
> > 
> > Memory copy is performed as a sequence of memory accesses where each access is an even multiple of element size and aligned at an element size boundary.  (e.g. each element is accessed atomicly in source and destination buffer)
> > 
> > Also, add: 
> > The order of the copy is unspecified.  The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per element.  It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are at least unordered atomic.  
> > 
> > This intrinsic does not provide ordering with respect to other memory accesses in adjacent code.  It can be used to implement a fence-like construct.
> Right. Thanks for catching this!
> 
> > Memory copy is performed as a sequence of memory accesses where each access is an even multiple of element size
> 
> Not sure, why it should be even multiplier? I think it's fine to copy any number of elements in a single operation, only importance is to never copy them partially. 
> 
> > This intrinsic does not provide ordering with respect to other memory accesses in adjacent code. It can be used to implement a fence-like construct.
> 
> Since there is no ordering, why it can be used as a fence-like construct?
w.r.t. "even multiple" you're right.  The even can be dropped.  The key word is "multiple".

w.r.t. the fence instruction, that's spell check dropping a word.  It should have been "It can NOT be used..."

================
Comment at: docs/LangRef.rst:12711
+Call to the '``llvm.memcpy.element.atomic.*``' is lowered to call to the symbol
+``__llvm_memcpy_element_atomic``. Only first four arguments are passed to this
+call.
----------------
igor-laevsky wrote:
> reames wrote:
> > If we're just going to ignore the volatile argument, should we have it?
> > 
> > Also mention that the optimizer may inline the copy if profitable.
> Plan is to use volatile to disable all optimizations involving this intrinsic. Much in a same way as original memcpy does.
Ok.

================
Comment at: include/llvm/IR/Intrinsics.td:766
+def int_memcpy_element_atomic  : Intrinsic<[],
+                                           [llvm_anyptr_ty, llvm_anyptr_ty,
+                                            llvm_anyint_ty, llvm_i32_ty,
----------------
igor-laevsky wrote:
> reames wrote:
> > The first argument is the return type right?  Shouldn't that be void?
> I'm not sure if there is any semantic difference between specifying empty list as a return value or specifying list with a single element which is llvm_void_ty. However I see that all other intrinsic definitions describe void return type as an empty list, so anyway it's better to be in sync with the rest of the code.
I think I just misread the code.  I thought you had three llvm_anyptr_ty arguments.  I now see the third is an llvm_anyint_ty.

https://reviews.llvm.org/D27133