[llvm] r305558 - [Atomics] Rename and change prototype for atomic memcpy intrinsic

Daniel Neilson via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 16 07:44:00 PDT 2017


Author: dneilson
Date: Fri Jun 16 09:43:59 2017
New Revision: 305558

URL: http://llvm.org/viewvc/llvm-project?rev=305558&view=rev
Log:
[Atomics] Rename and change prototype for atomic memcpy intrinsic

Summary:

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change alters the prototype of the atomic memcpy intrinsic so that it more closely resembles the semantics and parameters of the llvm.memcpy intrinsic -- easing a later merger of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the intrinsic is renamed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy that is unordered atomic.
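
For illustration, a minimal sketch of the same copy expressed in the old and
new forms (%dst and %src are placeholder values); note that the third argument
is now a byte length with an overloaded integer type, rather than an element
count:

    ; Old form: four elements of 4 bytes each
    call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %dst, i8* align 4 %src, i64 4, i32 4)

    ; New form: a 16-byte length (i32 here; the length type is overloaded)
    call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %dst, i8* align 4 %src, i32 16, i32 4)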

Reviewers: reames, sanjoy, efriedma

Reviewed By: reames

Subscribers: mzolotukhin, anna, llvm-commits, skatkov

Differential Revision: https://reviews.llvm.org/D33240

Modified:
    llvm/trunk/docs/LangRef.rst
    llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
    llvm/trunk/include/llvm/IR/IRBuilder.h
    llvm/trunk/include/llvm/IR/IntrinsicInst.h
    llvm/trunk/include/llvm/IR/Intrinsics.td
    llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
    llvm/trunk/lib/IR/IRBuilder.cpp
    llvm/trunk/lib/IR/Verifier.cpp
    llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
    llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h
    llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
    llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
    llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
    llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll
    llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll
    llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll

Modified: llvm/trunk/docs/LangRef.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/LangRef.rst?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/docs/LangRef.rst (original)
+++ llvm/trunk/docs/LangRef.rst Fri Jun 16 09:43:59 2017
@@ -14068,62 +14068,66 @@ Element Wise Atomic Memory Intrinsics
 These intrinsics are similar to the standard library memory intrinsics except
 that they perform memory transfer as a sequence of atomic memory accesses.
 
-.. _int_memcpy_element_atomic:
+.. _int_memcpy_element_unordered_atomic:
 
-'``llvm.memcpy.element.atomic``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.memcpy.element.unordered.atomic``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Syntax:
 """""""
 
-This is an overloaded intrinsic. You can use ``llvm.memcpy.element.atomic`` on
+This is an overloaded intrinsic. You can use ``llvm.memcpy.element.unordered.atomic`` on
 any integer bit width and for different address spaces. Not all targets
 support all bit widths however.
 
 ::
 
-      declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* <dest>, i8* <src>,
-                                              i64 <num_elements>, i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* <dest>,
+                                                                       i8* <src>,
+                                                                       i32 <len>,
+                                                                       i32 <element_size>)
+      declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* <dest>,
+                                                                       i8* <src>,
+                                                                       i64 <len>,
+                                                                       i32 <element_size>)
 
 Overview:
 """""""""
 
-The '``llvm.memcpy.element.atomic.*``' intrinsic performs copy of a block of 
-memory from the source location to the destination location as a sequence of
-unordered atomic memory accesses where each access is a multiple of
-``element_size`` bytes wide and aligned at an element size boundary. For example
-each element is accessed atomically in source and destination buffers.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic is a specialization of the
+'``llvm.memcpy.*``' intrinsic. It differs in that the ``dest`` and ``src`` are treated
+as arrays with elements that are exactly ``element_size`` bytes, and the copy between
+buffers uses a sequence of :ref:`unordered atomic <ordering>` load/store operations
+that are a positive integer multiple of the ``element_size`` in size.
 
 Arguments:
 """"""""""
 
-The first argument is a pointer to the destination, the second is a
-pointer to the source. The third argument is an integer argument
-specifying the number of elements to copy, the fourth argument is size of
-the single element in bytes.
-
-``element_size`` should be a power of two, greater than zero and less than
-a target-specific atomic access size limit.
-
-For each of the input pointers ``align`` parameter attribute must be specified.
-It must be a power of two and greater than or equal to the ``element_size``.
-Caller guarantees that both the source and destination pointers are aligned to
-that boundary.
+The first three arguments are the same as they are in the :ref:`@llvm.memcpy <int_memcpy>`
+intrinsic, with the added constraint that ``len`` is required to be a positive integer
+multiple of the ``element_size``. If ``len`` is not a positive integer multiple of
+``element_size``, then the behaviour of the intrinsic is undefined.
+
+``element_size`` must be a compile-time constant positive power of two no greater
+than a target-specific atomic access size limit.
+
+For each of the input pointers, the ``align`` parameter attribute must be
+specified. It must be a power of two no less than the ``element_size``. The caller
+guarantees that both the source and destination pointers are aligned to that boundary.
 
 Semantics:
 """"""""""
 
-The '``llvm.memcpy.element.atomic.*``' intrinsic copies
-'``num_elements`` * ``element_size``' bytes of memory from the source location to
-the destination location. These locations are not allowed to overlap. Memory copy
-is performed as a sequence of unordered atomic memory accesses where each access
-is guaranteed to be a multiple of ``element_size`` bytes wide and aligned at an
-element size boundary.
+The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
+memory from the source location to the destination location. These locations are not
+allowed to overlap. The memory copy is performed as a sequence of load/store operations
+where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
+aligned at an ``element_size`` boundary. 
 
 The order of the copy is unspecified. The same value may be read from the source
 buffer many times, but only one write is issued to the destination buffer per
-element. It is well defined to have concurrent reads and writes to both source
-and destination provided those reads and writes are at least unordered atomic.
+element. It is well defined to have concurrent reads and writes to both source and
+destination provided those reads and writes are unordered atomic when specified.
 
 This intrinsic does not provide any additional ordering guarantees over those
 provided by a set of unordered loads from the source location and stores to the
@@ -14132,8 +14136,8 @@ destination.
 Lowering:
 """""""""
 
-In the most general case call to the '``llvm.memcpy.element.atomic.*``' is lowered
-to a call to the symbol ``__llvm_memcpy_element_atomic_*``. Where '*' is replaced
-with an actual element size.
+In the most general case, a call to the '``llvm.memcpy.element.unordered.atomic.*``'
+intrinsic is lowered to a call to the symbol ``__llvm_memcpy_element_unordered_atomic_*``,
+where '*' is replaced with the actual element size.
 
-Optimizer is allowed to inline memory copy when it's profitable to do so.
+The optimizer is allowed to inline the memory copy when it's profitable to do so.
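
To make the documented constraints concrete, here is an illustrative well-formed
call under the new prototype: ``len`` (32) is a positive integer multiple of
``element_size`` (8), and both pointers carry an ``align`` attribute that is a
power of two no less than the element size. In the most general case it would
lower to a call to ``__llvm_memcpy_element_unordered_atomic_8``.

    ; Copies 32 bytes as a sequence of unordered-atomic accesses, each a
    ; multiple of 8 bytes wide; %dest and %src are placeholder values.
    call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 8 %dest, i8* align 8 %src, i64 32, i32 8)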

Modified: llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h (original)
+++ llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h Fri Jun 16 09:43:59 2017
@@ -333,12 +333,12 @@ namespace RTLIB {
     MEMSET,
     MEMMOVE,
 
-    // ELEMENT-WISE ATOMIC MEMORY
-    MEMCPY_ELEMENT_ATOMIC_1,
-    MEMCPY_ELEMENT_ATOMIC_2,
-    MEMCPY_ELEMENT_ATOMIC_4,
-    MEMCPY_ELEMENT_ATOMIC_8,
-    MEMCPY_ELEMENT_ATOMIC_16,
+    // ELEMENT-WISE UNORDERED-ATOMIC MEMORY of different element sizes
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_1,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_2,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_4,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_8,
+    MEMCPY_ELEMENT_UNORDERED_ATOMIC_16,
 
     // EXCEPTION HANDLING
     UNWIND_RESUME,
@@ -511,9 +511,10 @@ namespace RTLIB {
   /// UNKNOWN_LIBCALL if there is none.
   Libcall getSYNC(unsigned Opc, MVT VT);
 
-  /// getMEMCPY_ELEMENT_ATOMIC - Return MEMCPY_ELEMENT_ATOMIC_* value for the
-  /// given element size or UNKNOW_LIBCALL if there is none.
-  Libcall getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize);
+  /// getMEMCPY_ELEMENT_UNORDERED_ATOMIC - Return
+  /// MEMCPY_ELEMENT_UNORDERED_ATOMIC_* value for the given element size or
+  /// UNKNOWN_LIBCALL if there is none.
+  Libcall getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize);
 }
 }
 

Modified: llvm/trunk/include/llvm/IR/IRBuilder.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/IRBuilder.h?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/include/llvm/IR/IRBuilder.h (original)
+++ llvm/trunk/include/llvm/IR/IRBuilder.h Fri Jun 16 09:43:59 2017
@@ -435,27 +435,25 @@ public:
                          MDNode *ScopeTag = nullptr,
                          MDNode *NoAliasTag = nullptr);
 
-  /// \brief Create and insert an atomic memcpy between the specified
-  /// pointers.
+  /// \brief Create and insert an element unordered-atomic memcpy between the
+  /// specified pointers.
   ///
   /// If the pointers aren't i8*, they will be converted.  If a TBAA tag is
   /// specified, it will be added to the instruction. Likewise with alias.scope
   /// and noalias tags.
-  CallInst *CreateElementAtomicMemCpy(
-      Value *Dst, Value *Src, uint64_t NumElements, uint32_t ElementSize,
+  CallInst *CreateElementUnorderedAtomicMemCpy(
+      Value *Dst, Value *Src, uint64_t Size, uint32_t ElementSize,
       MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
       MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) {
-    return CreateElementAtomicMemCpy(Dst, Src, getInt64(NumElements),
-                                     ElementSize, TBAATag, TBAAStructTag,
-                                     ScopeTag, NoAliasTag);
+    return CreateElementUnorderedAtomicMemCpy(
+        Dst, Src, getInt64(Size), ElementSize, TBAATag, TBAAStructTag, ScopeTag,
+        NoAliasTag);
   }
 
-  CallInst *CreateElementAtomicMemCpy(Value *Dst, Value *Src,
-                                      Value *NumElements, uint32_t ElementSize,
-                                      MDNode *TBAATag = nullptr,
-                                      MDNode *TBAAStructTag = nullptr,
-                                      MDNode *ScopeTag = nullptr,
-                                      MDNode *NoAliasTag = nullptr);
+  CallInst *CreateElementUnorderedAtomicMemCpy(
+      Value *Dst, Value *Src, Value *Size, uint32_t ElementSize,
+      MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr,
+      MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr);
 
   /// \brief Create and insert a memmove between the specified
   /// pointers.

Modified: llvm/trunk/include/llvm/IR/IntrinsicInst.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/IntrinsicInst.h?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/include/llvm/IR/IntrinsicInst.h (original)
+++ llvm/trunk/include/llvm/IR/IntrinsicInst.h Fri Jun 16 09:43:59 2017
@@ -205,25 +205,91 @@ namespace llvm {
   };
 
   /// This class represents atomic memcpy intrinsic
-  /// TODO: Integrate this class into MemIntrinsic hierarchy.
-  class ElementAtomicMemCpyInst : public IntrinsicInst {
+  /// TODO: Integrate this class into MemIntrinsic hierarchy; for now this is
+  /// C&P of all methods from that hierarchy
+  class ElementUnorderedAtomicMemCpyInst : public IntrinsicInst {
+  private:
+    enum { ARG_DEST = 0, ARG_SOURCE = 1, ARG_LENGTH = 2, ARG_ELEMENTSIZE = 3 };
+
   public:
-    Value *getRawDest() const { return getArgOperand(0); }
-    Value *getRawSource() const { return getArgOperand(1); }
+    Value *getRawDest() const {
+      return const_cast<Value *>(getArgOperand(ARG_DEST));
+    }
+    const Use &getRawDestUse() const { return getArgOperandUse(ARG_DEST); }
+    Use &getRawDestUse() { return getArgOperandUse(ARG_DEST); }
+
+    /// Return the arguments to the instruction.
+    Value *getRawSource() const {
+      return const_cast<Value *>(getArgOperand(ARG_SOURCE));
+    }
+    const Use &getRawSourceUse() const { return getArgOperandUse(ARG_SOURCE); }
+    Use &getRawSourceUse() { return getArgOperandUse(ARG_SOURCE); }
+
+    Value *getLength() const {
+      return const_cast<Value *>(getArgOperand(ARG_LENGTH));
+    }
+    const Use &getLengthUse() const { return getArgOperandUse(ARG_LENGTH); }
+    Use &getLengthUse() { return getArgOperandUse(ARG_LENGTH); }
 
-    Value *getNumElements() const { return getArgOperand(2); }
-    void setNumElements(Value *V) { setArgOperand(2, V); }
+    bool isVolatile() const { return false; }
+
+    Value *getRawElementSizeInBytes() const {
+      return const_cast<Value *>(getArgOperand(ARG_ELEMENTSIZE));
+    }
+
+    ConstantInt *getElementSizeInBytesCst() const {
+      return cast<ConstantInt>(getRawElementSizeInBytes());
+    }
+
+    uint32_t getElementSizeInBytes() const {
+      return getElementSizeInBytesCst()->getZExtValue();
+    }
 
-    uint64_t getSrcAlignment() const { return getParamAlignment(0); }
-    uint64_t getDstAlignment() const { return getParamAlignment(1); }
+    /// This is just like getRawDest, but it strips off any cast
+    /// instructions that feed it, giving the original input.  The returned
+    /// value is guaranteed to be a pointer.
+    Value *getDest() const { return getRawDest()->stripPointerCasts(); }
+
+    /// This is just like getRawSource, but it strips off any cast
+    /// instructions that feed it, giving the original input.  The returned
+    /// value is guaranteed to be a pointer.
+    Value *getSource() const { return getRawSource()->stripPointerCasts(); }
+
+    unsigned getDestAddressSpace() const {
+      return cast<PointerType>(getRawDest()->getType())->getAddressSpace();
+    }
+
+    unsigned getSourceAddressSpace() const {
+      return cast<PointerType>(getRawSource()->getType())->getAddressSpace();
+    }
+
+    /// Set the specified arguments of the instruction.
+    void setDest(Value *Ptr) {
+      assert(getRawDest()->getType() == Ptr->getType() &&
+             "setDest called with pointer of wrong type!");
+      setArgOperand(ARG_DEST, Ptr);
+    }
+
+    void setSource(Value *Ptr) {
+      assert(getRawSource()->getType() == Ptr->getType() &&
+             "setSource called with pointer of wrong type!");
+      setArgOperand(ARG_SOURCE, Ptr);
+    }
+
+    void setLength(Value *L) {
+      assert(getLength()->getType() == L->getType() &&
+             "setLength called with value of wrong type!");
+      setArgOperand(ARG_LENGTH, L);
+    }
 
-    uint64_t getElementSizeInBytes() const {
-      Value *Arg = getArgOperand(3);
-      return cast<ConstantInt>(Arg)->getZExtValue();
+    void setElementSizeInBytes(Constant *V) {
+      assert(V->getType() == Type::getInt32Ty(getContext()) &&
+             "setElementSizeInBytes called with value of wrong type!");
+      setArgOperand(ARG_ELEMENTSIZE, V);
     }
 
     static inline bool classof(const IntrinsicInst *I) {
-      return I->getIntrinsicID() == Intrinsic::memcpy_element_atomic;
+      return I->getIntrinsicID() == Intrinsic::memcpy_element_unordered_atomic;
     }
     static inline bool classof(const Value *V) {
       return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));

Modified: llvm/trunk/include/llvm/IR/Intrinsics.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Intrinsics.td?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/include/llvm/IR/Intrinsics.td (original)
+++ llvm/trunk/include/llvm/IR/Intrinsics.td Fri Jun 16 09:43:59 2017
@@ -862,11 +862,16 @@ def int_xray_customevent : Intrinsic<[],
 //===------ Memory intrinsics with element-wise atomicity guarantees ------===//
 //
 
-def int_memcpy_element_atomic  : Intrinsic<[],
-                                           [llvm_anyptr_ty, llvm_anyptr_ty,
-                                            llvm_i64_ty, llvm_i32_ty],
-                                 [IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
-                                  WriteOnly<0>, ReadOnly<1>]>;
+// @llvm.memcpy.element.unordered.atomic.*(dest, src, length, elementsize)
+def int_memcpy_element_unordered_atomic
+    : Intrinsic<[],
+                [
+                  llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i32_ty
+                ],
+                [
+                  IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>,
+                  ReadOnly<1>
+                ]>;
 
 //===------------------------ Reduction Intrinsics ------------------------===//
 //

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp Fri Jun 16 09:43:59 2017
@@ -4943,11 +4943,12 @@ SelectionDAGBuilder::visitIntrinsicCall(
     updateDAGForMaybeTailCall(MM);
     return nullptr;
   }
-  case Intrinsic::memcpy_element_atomic: {
-    SDValue Dst = getValue(I.getArgOperand(0));
-    SDValue Src = getValue(I.getArgOperand(1));
-    SDValue NumElements = getValue(I.getArgOperand(2));
-    SDValue ElementSize = getValue(I.getArgOperand(3));
+  case Intrinsic::memcpy_element_unordered_atomic: {
+    const ElementUnorderedAtomicMemCpyInst &MI =
+        cast<ElementUnorderedAtomicMemCpyInst>(I);
+    SDValue Dst = getValue(MI.getRawDest());
+    SDValue Src = getValue(MI.getRawSource());
+    SDValue Length = getValue(MI.getLength());
 
     // Emit a library call.
     TargetLowering::ArgListTy Args;
@@ -4959,18 +4960,13 @@ SelectionDAGBuilder::visitIntrinsicCall(
     Entry.Node = Src;
     Args.push_back(Entry);
 
-    Entry.Ty = I.getArgOperand(2)->getType();
-    Entry.Node = NumElements;
+    Entry.Ty = MI.getLength()->getType();
+    Entry.Node = Length;
     Args.push_back(Entry);
 
-    Entry.Ty = Type::getInt32Ty(*DAG.getContext());
-    Entry.Node = ElementSize;
-    Args.push_back(Entry);
-
-    uint64_t ElementSizeConstant =
-        cast<ConstantInt>(I.getArgOperand(3))->getZExtValue();
+    uint64_t ElementSizeConstant = MI.getElementSizeInBytes();
     RTLIB::Libcall LibraryCall =
-        RTLIB::getMEMCPY_ELEMENT_ATOMIC(ElementSizeConstant);
+        RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(ElementSizeConstant);
     if (LibraryCall == RTLIB::UNKNOWN_LIBCALL)
       report_fatal_error("Unsupported element size");
 

Modified: llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp (original)
+++ llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp Fri Jun 16 09:43:59 2017
@@ -374,11 +374,16 @@ static void InitLibcallNames(const char
   Names[RTLIB::MEMCPY] = "memcpy";
   Names[RTLIB::MEMMOVE] = "memmove";
   Names[RTLIB::MEMSET] = "memset";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_1] = "__llvm_memcpy_element_atomic_1";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_2] = "__llvm_memcpy_element_atomic_2";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_4] = "__llvm_memcpy_element_atomic_4";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_8] = "__llvm_memcpy_element_atomic_8";
-  Names[RTLIB::MEMCPY_ELEMENT_ATOMIC_16] = "__llvm_memcpy_element_atomic_16";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_1] =
+      "__llvm_memcpy_element_unordered_atomic_1";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_2] =
+      "__llvm_memcpy_element_unordered_atomic_2";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_4] =
+      "__llvm_memcpy_element_unordered_atomic_4";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_8] =
+      "__llvm_memcpy_element_unordered_atomic_8";
+  Names[RTLIB::MEMCPY_ELEMENT_UNORDERED_ATOMIC_16] =
+      "__llvm_memcpy_element_unordered_atomic_16";
   Names[RTLIB::UNWIND_RESUME] = "_Unwind_Resume";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_1] = "__sync_val_compare_and_swap_1";
   Names[RTLIB::SYNC_VAL_COMPARE_AND_SWAP_2] = "__sync_val_compare_and_swap_2";
@@ -781,22 +786,21 @@ RTLIB::Libcall RTLIB::getSYNC(unsigned O
   return UNKNOWN_LIBCALL;
 }
 
-RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_ATOMIC(uint64_t ElementSize) {
+RTLIB::Libcall RTLIB::getMEMCPY_ELEMENT_UNORDERED_ATOMIC(uint64_t ElementSize) {
   switch (ElementSize) {
   case 1:
-    return MEMCPY_ELEMENT_ATOMIC_1;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_1;
   case 2:
-    return MEMCPY_ELEMENT_ATOMIC_2;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_2;
   case 4:
-    return MEMCPY_ELEMENT_ATOMIC_4;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_4;
   case 8:
-    return MEMCPY_ELEMENT_ATOMIC_8;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_8;
   case 16:
-    return MEMCPY_ELEMENT_ATOMIC_16;
+    return MEMCPY_ELEMENT_UNORDERED_ATOMIC_16;
   default:
     return UNKNOWN_LIBCALL;
   }
-
 }
 
 /// InitCmpLibcallCCs - Set default comparison libcall CC.

Modified: llvm/trunk/lib/IR/IRBuilder.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/IR/IRBuilder.cpp?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/IR/IRBuilder.cpp (original)
+++ llvm/trunk/lib/IR/IRBuilder.cpp Fri Jun 16 09:43:59 2017
@@ -134,18 +134,17 @@ CreateMemCpy(Value *Dst, Value *Src, Val
   return CI;  
 }
 
-CallInst *IRBuilderBase::CreateElementAtomicMemCpy(
-    Value *Dst, Value *Src, Value *NumElements, uint32_t ElementSize,
-    MDNode *TBAATag, MDNode *TBAAStructTag, MDNode *ScopeTag,
-    MDNode *NoAliasTag) {
+CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(
+    Value *Dst, Value *Src, Value *Size, uint32_t ElementSize, MDNode *TBAATag,
+    MDNode *TBAAStructTag, MDNode *ScopeTag, MDNode *NoAliasTag) {
   Dst = getCastedInt8PtrValue(Dst);
   Src = getCastedInt8PtrValue(Src);
 
-  Value *Ops[] = {Dst, Src, NumElements, getInt32(ElementSize)};
-  Type *Tys[] = {Dst->getType(), Src->getType()};
+  Value *Ops[] = {Dst, Src, Size, getInt32(ElementSize)};
+  Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()};
   Module *M = BB->getParent()->getParent();
-  Value *TheFn =
-      Intrinsic::getDeclaration(M, Intrinsic::memcpy_element_atomic, Tys);
+  Value *TheFn = Intrinsic::getDeclaration(
+      M, Intrinsic::memcpy_element_unordered_atomic, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
 

Modified: llvm/trunk/lib/IR/Verifier.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/IR/Verifier.cpp?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/IR/Verifier.cpp (original)
+++ llvm/trunk/lib/IR/Verifier.cpp Fri Jun 16 09:43:59 2017
@@ -4012,10 +4012,16 @@ void Verifier::visitIntrinsicCallSite(In
            CS);
     break;
   }
-  case Intrinsic::memcpy_element_atomic: {
-    ConstantInt *ElementSizeCI = dyn_cast<ConstantInt>(CS.getArgOperand(3));
-    Assert(ElementSizeCI, "element size of the element-wise atomic memory "
-                          "intrinsic must be a constant int",
+  case Intrinsic::memcpy_element_unordered_atomic: {
+    const ElementUnorderedAtomicMemCpyInst *MI =
+        cast<ElementUnorderedAtomicMemCpyInst>(CS.getInstruction());
+
+    ConstantInt *ElementSizeCI =
+        dyn_cast<ConstantInt>(MI->getRawElementSizeInBytes());
+    Assert(ElementSizeCI,
+           "element size of the element-wise unordered atomic memory "
+           "intrinsic must be a constant int",
            CS);
     const APInt &ElementSizeVal = ElementSizeCI->getValue();
     Assert(ElementSizeVal.isPowerOf2(),
@@ -4023,19 +4029,24 @@ void Verifier::visitIntrinsicCallSite(In
            "must be a power of 2",
            CS);
 
+    if (auto *LengthCI = dyn_cast<ConstantInt>(MI->getLength())) {
+      uint64_t Length = LengthCI->getZExtValue();
+      uint64_t ElementSize = MI->getElementSizeInBytes();
+      Assert((Length % ElementSize) == 0,
+             "constant length must be a multiple of the element size in the "
+             "element-wise atomic memory intrinsic",
+             CS);
+    }
+
     auto IsValidAlignment = [&](uint64_t Alignment) {
       return isPowerOf2_64(Alignment) && ElementSizeVal.ule(Alignment);
     };
-
     uint64_t DstAlignment = CS.getParamAlignment(0),
              SrcAlignment = CS.getParamAlignment(1);
-
     Assert(IsValidAlignment(DstAlignment),
-           "incorrect alignment of the destination argument",
-           CS);
+           "incorrect alignment of the destination argument", CS);
     Assert(IsValidAlignment(SrcAlignment),
-           "incorrect alignment of the source argument",
-           CS);
+           "incorrect alignment of the source argument", CS);
     break;
   }
   case Intrinsic::gcroot:

Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp (original)
+++ llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp Fri Jun 16 09:43:59 2017
@@ -94,75 +94,80 @@ static Constant *getNegativeIsTrueBoolVe
   return ConstantVector::get(BoolVec);
 }
 
-Instruction *
-InstCombiner::SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI) {
+Instruction *InstCombiner::SimplifyElementUnorderedAtomicMemCpy(
+    ElementUnorderedAtomicMemCpyInst *AMI) {
   // Try to unfold this intrinsic into sequence of explicit atomic loads and
   // stores.
   // First check that number of elements is compile time constant.
-  auto *NumElementsCI = dyn_cast<ConstantInt>(AMI->getNumElements());
-  if (!NumElementsCI)
+  auto *LengthCI = dyn_cast<ConstantInt>(AMI->getLength());
+  if (!LengthCI)
     return nullptr;
 
   // Check that there are not too many elements.
-  uint64_t NumElements = NumElementsCI->getZExtValue();
+  uint64_t LengthInBytes = LengthCI->getZExtValue();
+  uint32_t ElementSizeInBytes = AMI->getElementSizeInBytes();
+  uint64_t NumElements = LengthInBytes / ElementSizeInBytes;
   if (NumElements >= UnfoldElementAtomicMemcpyMaxElements)
     return nullptr;
 
-  // Don't unfold into illegal integers
-  uint64_t ElementSizeInBytes = AMI->getElementSizeInBytes() * 8;
-  if (!getDataLayout().isLegalInteger(ElementSizeInBytes))
-    return nullptr;
+  // Only expand if there are elements to copy.
+  if (NumElements > 0) {
+    // Don't unfold into illegal integers
+    uint64_t ElementSizeInBits = ElementSizeInBytes * 8;
+    if (!getDataLayout().isLegalInteger(ElementSizeInBits))
+      return nullptr;
 
-  // Cast source and destination to the correct type. Intrinsic input arguments
-  // are usually represented as i8*.
-  // Often operands will be explicitly casted to i8* and we can just strip
-  // those casts instead of inserting new ones. However it's easier to rely on
-  // other InstCombine rules which will cover trivial cases anyway.
-  Value *Src = AMI->getRawSource();
-  Value *Dst = AMI->getRawDest();
-  Type *ElementPointerType = Type::getIntNPtrTy(
-      AMI->getContext(), ElementSizeInBytes, Src->getType()->getPointerAddressSpace());
-
-  Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
-                                                "memcpy_unfold.src_casted");
-  Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
-                                                "memcpy_unfold.dst_casted");
-
-  for (uint64_t i = 0; i < NumElements; ++i) {
-    // Get current element addresses
-    ConstantInt *ElementIdxCI =
-        ConstantInt::get(AMI->getContext(), APInt(64, i));
-    Value *SrcElementAddr =
-        Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
-    Value *DstElementAddr =
-        Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");
-
-    // Load from the source. Transfer alignment information and mark load as
-    // unordered atomic.
-    LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
-    Load->setOrdering(AtomicOrdering::Unordered);
-    // We know alignment of the first element. It is also guaranteed by the
-    // verifier that element size is less or equal than first element alignment
-    // and both of this values are powers of two.
-    // This means that all subsequent accesses are at least element size
-    // aligned.
-    // TODO: We can infer better alignment but there is no evidence that this
-    // will matter.
-    Load->setAlignment(i == 0 ? AMI->getSrcAlignment()
-                              : AMI->getElementSizeInBytes());
-    Load->setDebugLoc(AMI->getDebugLoc());
-
-    // Store loaded value via unordered atomic store.
-    StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
-    Store->setOrdering(AtomicOrdering::Unordered);
-    Store->setAlignment(i == 0 ? AMI->getDstAlignment()
-                               : AMI->getElementSizeInBytes());
-    Store->setDebugLoc(AMI->getDebugLoc());
+    // Cast source and destination to the correct type. Intrinsic input
+    // arguments are usually represented as i8*. Often operands will be
+    // explicitly casted to i8* and we can just strip those casts instead of
+    // inserting new ones. However it's easier to rely on other InstCombine
+    // rules which will cover trivial cases anyway.
+    Value *Src = AMI->getRawSource();
+    Value *Dst = AMI->getRawDest();
+    Type *ElementPointerType =
+        Type::getIntNPtrTy(AMI->getContext(), ElementSizeInBits,
+                           Src->getType()->getPointerAddressSpace());
+
+    Value *SrcCasted = Builder->CreatePointerCast(Src, ElementPointerType,
+                                                  "memcpy_unfold.src_casted");
+    Value *DstCasted = Builder->CreatePointerCast(Dst, ElementPointerType,
+                                                  "memcpy_unfold.dst_casted");
+
+    for (uint64_t i = 0; i < NumElements; ++i) {
+      // Get current element addresses
+      ConstantInt *ElementIdxCI =
+          ConstantInt::get(AMI->getContext(), APInt(64, i));
+      Value *SrcElementAddr =
+          Builder->CreateGEP(SrcCasted, ElementIdxCI, "memcpy_unfold.src_addr");
+      Value *DstElementAddr =
+          Builder->CreateGEP(DstCasted, ElementIdxCI, "memcpy_unfold.dst_addr");
+
+      // Load from the source. Transfer alignment information and mark load as
+      // unordered atomic.
+      LoadInst *Load = Builder->CreateLoad(SrcElementAddr, "memcpy_unfold.val");
+      Load->setOrdering(AtomicOrdering::Unordered);
+      // We know the alignment of the first element. It is also guaranteed by
+      // the verifier that the element size is less than or equal to the first
+      // element's alignment, and that both values are powers of two. This means
+      // that all subsequent accesses are at least element size aligned.
+      // TODO: We can infer better alignment but there is no evidence that this
+      // will matter.
+      Load->setAlignment(i == 0 ? AMI->getParamAlignment(1)
+                                : ElementSizeInBytes);
+      Load->setDebugLoc(AMI->getDebugLoc());
+
+      // Store loaded value via unordered atomic store.
+      StoreInst *Store = Builder->CreateStore(Load, DstElementAddr);
+      Store->setOrdering(AtomicOrdering::Unordered);
+      Store->setAlignment(i == 0 ? AMI->getParamAlignment(0)
+                                 : ElementSizeInBytes);
+      Store->setDebugLoc(AMI->getDebugLoc());
+    }
   }
 
   // Set the length of the copy to 0; the instruction will be deleted on the
   // next iteration.
-  AMI->setNumElements(Constant::getNullValue(NumElementsCI->getType()));
+  AMI->setLength(Constant::getNullValue(LengthCI->getType()));
   return AMI;
 }
 
@@ -1888,12 +1893,12 @@ Instruction *InstCombiner::visitCallInst
     if (Changed) return II;
   }
 
-  if (auto *AMI = dyn_cast<ElementAtomicMemCpyInst>(II)) {
-    if (Constant *C = dyn_cast<Constant>(AMI->getNumElements()))
+  if (auto *AMI = dyn_cast<ElementUnorderedAtomicMemCpyInst>(II)) {
+    if (Constant *C = dyn_cast<Constant>(AMI->getLength()))
       if (C->isNullValue())
         return eraseInstFromFunction(*AMI);
 
-    if (Instruction *I = SimplifyElementAtomicMemCpy(AMI))
+    if (Instruction *I = SimplifyElementUnorderedAtomicMemCpy(AMI))
       return I;
   }
 

Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h (original)
+++ llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h Fri Jun 16 09:43:59 2017
@@ -726,7 +726,8 @@ private:
   Instruction *MatchBSwap(BinaryOperator &I);
   bool SimplifyStoreAtEndOfBlock(StoreInst &SI);
 
-  Instruction *SimplifyElementAtomicMemCpy(ElementAtomicMemCpyInst *AMI);
+  Instruction *
+  SimplifyElementUnorderedAtomicMemCpy(ElementUnorderedAtomicMemCpyInst *AMI);
   Instruction *SimplifyMemTransfer(MemIntrinsic *MI);
   Instruction *SimplifyMemSet(MemSetInst *MI);
 

Modified: llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (original)
+++ llvm/trunk/lib/Transforms/Scalar/LoopIdiomRecognize.cpp Fri Jun 16 09:43:59 2017
@@ -983,21 +983,21 @@ bool LoopIdiomRecognize::processLoopStor
   const SCEV *NumBytesS =
       SE->getAddExpr(BECount, SE->getOne(IntPtrTy), SCEV::FlagNUW);
 
+  if (StoreSize != 1)
+    NumBytesS = SE->getMulExpr(NumBytesS, SE->getConstant(IntPtrTy, StoreSize),
+                               SCEV::FlagNUW);
+
+  Value *NumBytes =
+      Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
+
   unsigned Align = std::min(SI->getAlignment(), LI->getAlignment());
   CallInst *NewCall = nullptr;
   // Check whether to generate an unordered atomic memcpy:
   //  If the load or store is atomic, then it must necessarily be unordered
   //  by previous checks.
-  if (!SI->isAtomic() && !LI->isAtomic()) {
-    if (StoreSize != 1)
-      NumBytesS = SE->getMulExpr(
-          NumBytesS, SE->getConstant(IntPtrTy, StoreSize), SCEV::FlagNUW);
-
-    Value *NumBytes =
-        Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
-
+  if (!SI->isAtomic() && !LI->isAtomic())
     NewCall = Builder.CreateMemCpy(StoreBasePtr, LoadBasePtr, NumBytes, Align);
-  } else {
+  else {
     // We cannot allow unaligned ops for unordered load/store, so reject
     // anything where the alignment isn't at least the element size.
     if (Align < StoreSize)
@@ -1010,11 +1010,9 @@ bool LoopIdiomRecognize::processLoopStor
     if (StoreSize > TTI->getAtomicMemIntrinsicMaxElementSize())
       return false;
 
-    Value *NumElements =
-        Expander.expandCodeFor(NumBytesS, IntPtrTy, Preheader->getTerminator());
+    NewCall = Builder.CreateElementUnorderedAtomicMemCpy(
+        StoreBasePtr, LoadBasePtr, NumBytes, StoreSize);
 
-    NewCall = Builder.CreateElementAtomicMemCpy(StoreBasePtr, LoadBasePtr,
-                                                NumElements, StoreSize);
     // Propagate alignment info onto the pointer args. Note that unordered
     // atomic loads/stores are *required* by the spec to have an alignment
     // but non-atomic loads/stores may not.

Modified: llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll (original)
+++ llvm/trunk/test/CodeGen/X86/element-wise-atomic-memory-intrinsics.ll Fri Jun 16 09:43:59 2017
@@ -2,47 +2,47 @@
 
 define i8* @test_memcpy1(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 1, i32 1)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 1)
   ret i8* %P
+  ; 3rd arg (%edx) -- length
   ; CHECK-DAG: movl $1, %edx
-  ; CHECK-DAG: movl $1, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_1
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_1
 }
 
 define i8* @test_memcpy2(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy2
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 2, i32 2)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 2, i32 2)
   ret i8* %P
+  ; 3rd arg (%edx) -- length
   ; CHECK-DAG: movl $2, %edx
-  ; CHECK-DAG: movl $2, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_2
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_2
 }
 
 define i8* @test_memcpy4(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy4
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 4 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 4, i32 4)
   ret i8* %P
+  ; 3rd arg (%edx) -- length
   ; CHECK-DAG: movl $4, %edx
-  ; CHECK-DAG: movl $4, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_4
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_4
 }
 
 define i8* @test_memcpy8(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy8
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %P, i8* align 8 %Q, i64 8, i32 8)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %P, i8* align 8 %Q, i32 8, i32 8)
   ret i8* %P
+  ; 3rd arg (%edx) -- length
   ; CHECK-DAG: movl $8, %edx
-  ; CHECK-DAG: movl $8, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_8
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_8
 }
 
 define i8* @test_memcpy16(i8* %P, i8* %Q) {
   ; CHECK: test_memcpy16
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %P, i8* align 16 %Q, i64 16, i32 16)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %P, i8* align 16 %Q, i32 16, i32 16)
   ret i8* %P
+  ; 3rd arg (%edx) -- length
   ; CHECK-DAG: movl $16, %edx
-  ; CHECK-DAG: movl $16, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_16
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_16
 }
 
 define void @test_memcpy_args(i8** %Storage) {
@@ -51,18 +51,15 @@ define void @test_memcpy_args(i8** %Stor
   %Src.addr = getelementptr i8*, i8** %Storage, i64 1
   %Src = load i8*, i8** %Src.addr
 
-  ; First argument
+  ; 1st arg (%rdi)
   ; CHECK-DAG: movq (%rdi), [[REG1:%r.+]]
   ; CHECK-DAG: movq [[REG1]], %rdi
-  ; Second argument
+  ; 2nd arg (%rsi)
   ; CHECK-DAG: movq 8(%rdi), %rsi
-  ; Third argument
+  ; 3rd arg (%edx) -- length
   ; CHECK-DAG: movl $4, %edx
-  ; Fourth argument
-  ; CHECK-DAG: movl $4, %ecx
-  ; CHECK: __llvm_memcpy_element_atomic_4
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 4, i32 4)
-  ret void
+  ; CHECK: __llvm_memcpy_element_unordered_atomic_4
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %Dst, i8* align 4 %Src, i32 4, i32 4)
+  ret void
 }
 
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind

Modified: llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll (original)
+++ llvm/trunk/test/Transforms/InstCombine/element-atomic-memcpy-to-loads.ll Fri Jun 16 09:43:59 2017
@@ -1,10 +1,11 @@
 ; RUN: opt -instcombine -unfold-element-atomic-memcpy-max-elements=8 -S < %s | FileCheck %s
+; Temporarily an expected failure until inst combine is updated in the next patch
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 
-; Test basic unfolding
-define void @test1(i8* %Src, i8* %Dst) {
-; CHECK-LABEL: test1
-; CHECK-NOT: llvm.memcpy.element.atomic
+; Test basic unfolding -- unordered load & store
+define void @test1a(i8* %Src, i8* %Dst) {
+; CHECK-LABEL: test1a
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic
 
 ; CHECK-DAG: %memcpy_unfold.src_casted = bitcast i8* %Src to i32*
 ; CHECK-DAG: %memcpy_unfold.dst_casted = bitcast i8* %Dst to i32*
@@ -21,7 +22,7 @@ define void @test1(i8* %Src, i8* %Dst) {
 ; CHECK-DAG: [[VAL4:%[^\s]+]] =  load atomic i32, i32* %{{[^\s]+}} unordered, align 4
 ; CHECK-DAG: store atomic i32 [[VAL4]], i32* %{{[^\s]+}} unordered, align 4
 entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 8 %Src, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 16, i32 4)
   ret void
 }
 
@@ -31,9 +32,9 @@ define void @test2(i8* %Src, i8* %Dst) {
 
 ; CHECK-NOT: load
 ; CHECK-NOT: store
-; CHECK: llvm.memcpy.element.atomic
+; CHECK: llvm.memcpy.element.unordered.atomic
 entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dst, i8* align 4 %Src, i64 1000, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 8 %Dst, i8* align 4 %Src, i32 256, i32 4)
   ret void
 }
 
@@ -43,16 +44,16 @@ define void @test3(i8* %Src, i8* %Dst) {
 
 ; CHECK-NOT: load
 ; CHECK-NOT: store
-; CHECK: llvm.memcpy.element.atomic
+; CHECK: llvm.memcpy.element.unordered.atomic
 entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 4, i32 64)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 64, i32 64)
   ret void
 }
 
 ; Test that we will eliminate redundant bitcasts
 define void @test4(i64* %Src, i64* %Dst) {
 ; CHECK-LABEL: test4
-; CHECK-NOT: llvm.memcpy.element.atomic
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic
 
 ; CHECK-NOT: bitcast
 
@@ -76,17 +77,18 @@ define void @test4(i64* %Src, i64* %Dst)
 entry:
   %Src.casted = bitcast i64* %Src to i8*
   %Dst.casted = bitcast i64* %Dst to i8*
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i64 4, i32 8)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 16 %Dst.casted, i8* align 16 %Src.casted, i32 32, i32 8)
   ret void
 }
 
+; Test that 0-length unordered atomic memcpy gets removed.
 define void @test5(i8* %Src, i8* %Dst) {
 ; CHECK-LABEL: test5
 
-; CHECK-NOT: llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)
+; CHECK-NOT: llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
 entry:
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 64 %Dst, i8* align 64 %Src, i64 0, i32 64)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 64 %Dst, i8* align 64 %Src, i32 0, i32 8)
   ret void
 }
 
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32)
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind

Modified: llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll (original)
+++ llvm/trunk/test/Transforms/LoopIdiom/X86/unordered-atomic-memcpy.ll Fri Jun 16 09:43:59 2017
@@ -5,7 +5,7 @@ target triple = "x86_64-unknown-linux-gn
 ;; memcpy.atomic formation (atomic load & store)
 define void @test1(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test1(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -30,7 +30,7 @@ for.end:
 ;; memcpy.atomic formation (atomic store, normal load)
 define void @test2(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test2(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -55,7 +55,7 @@ for.end:
 ;; memcpy.atomic formation rejection (atomic store, normal load w/ no align)
 define void @test2b(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test2b(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -80,7 +80,7 @@ for.end:
 ;; memcpy.atomic formation rejection (atomic store, normal load w/ bad align)
 define void @test2c(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test2c(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -105,7 +105,7 @@ for.end:
 ;; memcpy.atomic formation rejection (atomic store w/ bad align, normal load)
 define void @test2d(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test2d(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -131,7 +131,7 @@ for.end:
 ;; memcpy.atomic formation (normal store, atomic load)
 define void @test3(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test3(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %Dest, i8* align 1 %Base, i64 %Size, i32 1)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -156,7 +156,7 @@ for.end:
 ;; memcpy.atomic formation rejection (normal store w/ no align, atomic load)
 define void @test3b(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test3b(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -181,7 +181,7 @@ for.end:
 ;; memcpy.atomic formation rejection (normal store, atomic load w/ bad align)
 define void @test3c(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test3c(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -206,7 +206,7 @@ for.end:
 ;; memcpy.atomic formation rejection (normal store w/ bad align, atomic load)
 define void @test3d(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test3d(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -232,7 +232,7 @@ for.end:
 ;; memcpy.atomic formation rejection (atomic load, ordered-atomic store)
 define void @test4(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test4(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -257,7 +257,7 @@ for.end:
 ;; memcpy.atomic formation rejection (ordered-atomic load, unordered-atomic store)
 define void @test5(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test5(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:
@@ -282,7 +282,8 @@ for.end:
 ;; memcpy.atomic formation (atomic load & store) -- element size 2
 define void @test6(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test6(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 %Size, i32 2)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 1
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 2 %Dest{{[0-9]*}}, i8* align 2 %Base{{[0-9]*}}, i64 [[Sz]], i32 2)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -307,7 +308,8 @@ for.end:
 ;; memcpy.atomic formation (atomic load & store) -- element size 4
 define void @test7(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test7(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 %Size, i32 4)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 2
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %Dest{{[0-9]*}}, i8* align 4 %Base{{[0-9]*}}, i64 [[Sz]], i32 4)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -332,7 +334,8 @@ for.end:
 ;; memcpy.atomic formation (atomic load & store) -- element size 8
 define void @test8(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test8(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 %Size, i32 8)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 3
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 8 %Dest{{[0-9]*}}, i8* align 8 %Base{{[0-9]*}}, i64 [[Sz]], i32 8)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -357,7 +360,8 @@ for.end:
 ;; memcpy.atomic formation rejection (atomic load & store) -- element size 16
 define void @test9(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test9(
-; CHECK: call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 %Size, i32 16)
+; CHECK: [[Sz:%[0-9]+]] = shl i64 %Size, 4
+; CHECK: call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 16 %Dest{{[0-9]*}}, i8* align 16 %Base{{[0-9]*}}, i64 [[Sz]], i32 16)
 ; CHECK-NOT: store
 ; CHECK: ret void
 bb.nph:
@@ -382,7 +386,7 @@ for.end:
 ;; memcpy.atomic formation rejection (atomic load & store) -- element size 32
 define void @test10(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test10(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:

Modified: llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll (original)
+++ llvm/trunk/test/Transforms/LoopIdiom/unordered-atomic-memcpy-noarch.ll Fri Jun 16 09:43:59 2017
@@ -5,7 +5,7 @@ target datalayout = "e-p:64:64:64-i1:8:8
 ;;  Will not create call due to a max element size of 0
 define void @test1(i64 %Size) nounwind ssp {
 ; CHECK-LABEL: @test1(
-; CHECK-NOT: call void @llvm.memcpy.element.atomic
+; CHECK-NOT: call void @llvm.memcpy.element.unordered.atomic
 ; CHECK: store
 ; CHECK: ret void
 bb.nph:

Modified: llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll?rev=305558&r1=305557&r2=305558&view=diff
==============================================================================
--- llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll (original)
+++ llvm/trunk/test/Verifier/element-wise-atomic-memory-intrinsics.ll Fri Jun 16 09:43:59 2017
@@ -1,17 +1,25 @@
 ; RUN: not opt -verify < %s 2>&1 | FileCheck %s
 
-define void @test_memcpy(i8* %P, i8* %Q) {
+define void @test_memcpy(i8* %P, i8* %Q, i32 %A, i32 %E) {
+  ; CHECK: element size of the element-wise unordered atomic memory intrinsic must be a constant int
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 %E)
   ; CHECK: element size of the element-wise atomic memory intrinsic must be a power of 2
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 2 %Q, i64 4, i32 3)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 1, i32 3)
 
+  ; CHECK: constant length must be a multiple of the element size in the element-wise atomic memory intrinsic
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 4 %Q, i32 7, i32 4)
+
+  ; CHECK: incorrect alignment of the destination argument
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* %P, i8* align 4 %Q, i32 1, i32 1)
   ; CHECK: incorrect alignment of the destination argument
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 2 %P, i8* align 4 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 1 %P, i8* align 4 %Q, i32 4, i32 4)
 
   ; CHECK: incorrect alignment of the source argument
-  call void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* align 4 %P, i8* align 2 %Q, i64 4, i32 4)
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* %Q, i32 1, i32 1)
+  ; CHECK: incorrect alignment of the source argument
+  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* align 4 %P, i8* align 1 %Q, i32 4, i32 4)
 
   ret void
 }
-declare void @llvm.memcpy.element.atomic.p0i8.p0i8(i8* nocapture, i8* nocapture, i64, i32) nounwind
-
+declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32) nounwind
 ; CHECK: input module is broken!



