[PATCH] D27133: Introduce element-wise atomic memcpy and memmove intrinsics

Thu Dec 1 05:11:51 PST 2016

igor-laevsky added inline comments.

================
Comment at: docs/LangRef.rst:12684
+specifying the number of elements to copy, the fourth argument is size of
+the single element in bytes, the fifths argument is the alignment of the source
+and destination locations, and the sixths is a boolean indicating a volatile
----------------
reames wrote:
> It might be good to specify the alignment of source and destination separately.
Why do you think this will be useful? This will definitely add some complexity to the possible transforms with this intrinsics.

================
Comment at: docs/LangRef.rst:12704
+the destination location. These locations are not allowed to overlap. Memory copy
+is performed as a sequence of memory accesses where each access is guaranteed to 
+be at least ``element_size`` wide unordered atomic.
----------------
reames wrote:
> Hm, the wording isn't quite right.  As written, we could have six bytes to copy with an element size of 2 and be legally allowed to copy two three byte chunks.
> 
> Memory copy is performed as a sequence of memory accesses where each access is an even multiple of element size and aligned at an element size boundary.  (e.g. each element is accessed atomicly in source and destination buffer)
> 
> Also, add: 
> The order of the copy is unspecified.  The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per element.  It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are at least unordered atomic.  
> 
> This intrinsic does not provide ordering with respect to other memory accesses in adjacent code.  It can be used to implement a fence-like construct.
Right. Thanks for catching this!

> Memory copy is performed as a sequence of memory accesses where each access is an even multiple of element size

Not sure, why it should be even multiplier? I think it's fine to copy any number of elements in a single operation, only importance is to never copy them partially. 

> This intrinsic does not provide ordering with respect to other memory accesses in adjacent code. It can be used to implement a fence-like construct.

Since there is no ordering, why it can be used as a fence-like construct?

================
Comment at: docs/LangRef.rst:12711
+Call to the '``llvm.memcpy.element.atomic.*``' is lowered to call to the symbol
+``__llvm_memcpy_element_atomic``. Only first four arguments are passed to this
+call.
----------------
reames wrote:
> If we're just going to ignore the volatile argument, should we have it?
> 
> Also mention that the optimizer may inline the copy if profitable.
Plan is to use volatile to disable all optimizations involving this intrinsic. Much in a same way as original memcpy does.

================
Comment at: docs/LangRef.rst:12714
+
+'``llvm.memmove.element.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------
reames wrote:
> Not sure we really need memmove.  I might start with just atomic_memcpy and see where that gets us.
My thought was that since it looks very much like memcpy it would be easy to add them both together. However you're right and for the sake of simplifying review process I removed memmove from this change.

================
Comment at: include/llvm/IR/Intrinsics.td:766
+def int_memcpy_element_atomic  : Intrinsic<[],
+                                           [llvm_anyptr_ty, llvm_anyptr_ty,
+                                            llvm_anyint_ty, llvm_i32_ty,
----------------
reames wrote:
> The first argument is the return type right?  Shouldn't that be void?
I'm not sure if there is any semantic difference between specifying empty list as a return value or specifying list with a single element which is llvm_void_ty. However I see that all other intrinsic definitions describe void return type as an empty list, so anyway it's better to be in sync with the rest of the code.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4913
+    Entry.Node = Dst;
+    Args.push_back(Entry);
+
----------------
reames wrote:
> If this can be written as:
> Args.push_back(Entry(Ty, Node))
> 
> please do.  the stateful code is slightly confusing. 
Unfortunately TargetLowering::ArgListEntry doesn't have suitable constructor. Probably it worth adding it, but in a different change.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:4932
+        return RTLIB::MEMMOVE_ELEMENT_ATOMIC;
+      default: llvm_unreachable("shouldn't happen");
+      }
----------------
anna wrote:
> do we need a switch case here? the main switch case on the intrinsic is only triggered for memcpy and memmove (line 4901). Maybe we can have it as a ternary with just `memcpy_element_atomic` and `memmove_element_atomic`?
Right, we don't need it. I completely removed memmove from the change, so now there is no conditional at all.

================
Comment at: lib/IR/Verifier.cpp:3963
+           CS);
+    break;
+  }
----------------
anna wrote:
> Don't we need verification that the source and dest are of same addrspace?
I didn't add this limitation to the intrinsic definition. Since address space semantics is target defined, we might be able to execute memory copy from one address space to another. I think it should be frontend's job to ensure that this is a valid operation.

https://reviews.llvm.org/D27133