[PATCH] D27133: Introduce element-wise atomic memcpy and memmove intrinsics

Wed Dec 21 18:09:16 PST 2016

reames added a comment.

This looks like it's basically ready to go in, we're down to minor wordsmithing at this point.

I'd really prefer not to sign off on this myself.  Igor and I work together and I arguably have a conflict of interest here.  Can I get another reviewer to speak up and LGTM the patch?  If no one else has taken a look in a couple of days, I will LGTM to let us move forward, but I'd really prefer not to do that.

================
Comment at: docs/LangRef.rst:12667
+                                              i1 <isvolatile>)
+      declare void @llvm.memcpy.element.atomic.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
+                                              i64 <num_elements>, i32 <element_size>
----------------
sanjoy wrote:
> Why not just have the `i64` variant?
Yeah, I'd just drop the i32 version.  It's redundant.  I think this means that argument can just be an i64 rather than an anyint as well.

================
Comment at: docs/LangRef.rst:12710
+
+The order of the copy is unspecified. The same value may be read from the source
+buffer many times, but only one write is issued to the destination buffer per
----------------
sanjoy wrote:
> sanjoy wrote:
> > "The same value may be read from the source buffer many times" >> why do you care about this?
> "The order of the copy is unspecified" -- I'd be a bit more explicit -- "The order in which the  ``num_elements`` elements are copied is unspecified"
The ability to read multiple times let's you restart after inspecting the value.  Not sure why we'd want this, but let's leave the possibly available.  

(Maybe we can fastpath zero initialize if all the source values are zero or something?)

================
Comment at: docs/LangRef.rst:12716
+This intrinsic does not provide ordering with respect to other memory accesses
+in adjacent code. It can never be used to implement a fence-like construct.
+
----------------
sanjoy wrote:
> I'd expect for this to at least participate with reordering constraints due to other memory operations.  That is, I'd expect reordering
> 
> ```
> %val = acquire_load
> atomic_memcpy(...)
> ```
> 
> to
> 
> ```
> atomic_memcpy(...)
> %val = acquire_load
> ```
> 
> to be illegal, even if the memory locations involved in memcpy is disjoint with the location in the load.
Hm, good point.  This clearly needs reworded.  Possibly:
This intrinsic does not provide any additional ordering guarantees over those provided by a set of unordered loads from the source location and stores to the destination.

https://reviews.llvm.org/D27133