Evan,

I am seeing an assert when I compile a program with llc (the test program, strcat.llvm.mips64el.ll, is attached to this email):

$ llc -march=mips64el -mcpu=mips64r2 -mattr=n64 -disable-mips-delay-filler -filetype=asm -relocation-model=pic -asm-verbose=false -O3 Output/strcat.llvm.mips64el.ll -o Output/strcat.llc.mips64r2.s

llc: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3588: llvm::SDValue getMemcpyLoadsAndStores(llvm::SelectionDAG&, llvm::DebugLoc, llvm::SDValue, llvm::SDValue, llvm::SDValue, uint64_t, unsigned int, bool, bool, llvm::MachinePointerInfo, llvm::MachinePointerInfo): Assertion `i == NumMemOps-1 && i != 0' failed.

The memcpy call that triggers the assert copies a 7-character array to an i8* destination:

(gdb) p I.dump()
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %endptr, i8* getelementptr inbounds ([7 x i8]* @.str2, i64 0, i64 0), i64 7, i32 1, i1 false)

I am not familiar with the pieces you touched in this commit, but llc terminates normally if I force the code to take the else clause of the new overlap check by setting *Fast = false inside MipsTargetLowering::allowsUnalignedMemoryAccesses.
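For reference, this is roughly the local hack I used (diagnostic only, not a proposed fix; it just flips the *Fast out-parameter that your Mips hunk below sets to true):

  // MipsISelLowering.cpp -- local experiment only. Reporting unaligned
  // i32/i64 accesses as "not fast" forces FindOptimalMemOpLowering down
  // the else branch of the overlap check.
  case MVT::i64:
  case MVT::i32:
    if (Fast)
      *Fast = false; // was: *Fast = true;
    return true;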

Here is the overlap check in question (SelectionDAG.cpp:3510):

+      // If the new VT cannot cover all of the remaining bits, then consider
+      // issuing a (or a pair of) unaligned and overlapping load / store.
+      // FIXME: Only does this for 64-bit or more since we don't have proper
+      // cost model for unaligned load / store.
+      bool Fast;
+      if (AllowOverlap && VTSize >= 8 && NewVTSize < Size &&
+          TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
+        VTSize = Size;
+      else {
+        VT = NewVT;
+        VTSize = NewVTSize;
+      }

I think overlapping shouldn't be allowed here when the source size (7B) is smaller than the load size (8B). As far as I can tell, the 7-byte copy takes the overlap path on its very first mem op, which widens it to a single 8-byte access: NumMemOps ends up being 1, the adjusted offset would place the store one byte before the start of the destination, and the `i != 0` half of the assert fires because there is no previous pair to overlap with.
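If that reading is right, one possible direction (an untested sketch on my side, not a patch) is to take the overlap path only when a previous mem op already exists, so the widened access stays inside the region being copied:

      // Sketch only: require a previous mem op to overlap with. NumMemOps
      // was pre-incremented for the current op at the top of the loop, so a
      // previous op exists iff NumMemOps > 1.
      bool Fast;
      if (AllowOverlap && NumMemOps > 1 && VTSize >= 8 && NewVTSize < Size &&
          TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
        VTSize = Size;
      else {
        VT = NewVT;
        VTSize = NewVTSize;
      }

With that guard the 7-byte case would fall back to narrower i32/i16/i8 stores, while trailing ops that do follow an earlier pair should still be able to use the overlapping load / store.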

Do you have any idea how this can be fixed?
<br><div class="gmail_quote">On Mon, Dec 10, 2012 at 3:21 PM, Evan Cheng <span dir="ltr"><<a href="mailto:evan.cheng@apple.com" target="_blank">evan.cheng@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: evancheng<br>
Date: Mon Dec 10 17:21:26 2012<br>
New Revision: 169791<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=169791&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=169791&view=rev</a><br>
Log:<br>
Some enhancements for memcpy / memset inline expansion.<br>
1. Teach it to use overlapping unaligned load / store to copy / set the trailing<br>
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.<br>
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.<br>
x86 and ARM.<br>
3. When memcpy from a constant string, do *not* replace the load with a constant<br>
if it's not possible to materialize an integer immediate with a single<br>
instruction (required a new target hook: TLI.isIntImmLegal()).<br>
4. Use unaligned load / stores more aggressively if target hooks indicates they<br>
are "fast".<br>
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.<br>
Also increase the threshold to something reasonable (8 for memset, 4 pairs<br>
for memcpy).<br>
<br>
This significantly improves Dhrystone, up to 50% on ARM iOS devices.<br>
<br>
rdar://12760078<br>
<br>
Added:<br>
llvm/trunk/test/CodeGen/ARM/memset-inline.ll<br>
Removed:<br>
llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll<br>
Modified:<br>
llvm/trunk/include/llvm/Target/TargetLowering.h<br>
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp<br>
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp<br>
llvm/trunk/lib/Target/ARM/ARMISelLowering.h<br>
llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td<br>
llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp<br>
llvm/trunk/lib/Target/Mips/MipsISelLowering.h<br>
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
llvm/trunk/lib/Target/X86/X86ISelLowering.h<br>
llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll<br>
llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll<br>
llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll<br>
llvm/trunk/test/CodeGen/X86/memcpy-2.ll<br>
<br>
Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
+++ llvm/trunk/include/llvm/Target/TargetLowering.h Mon Dec 10 17:21:26 2012
@@ -371,6 +371,16 @@
return false;
}

+ /// isIntImmLegal - Returns true if the target can instruction select the
+ /// specified integer immediate natively (that is, it's materialized with one
+ /// instruction). The current *assumption* in isel is all of integer
+ /// immediates are "legal" and only the memcpy / memset expansion code is
+ /// making use of this. The rest of isel doesn't have proper cost model for
+ /// immediate materialization.
+ virtual bool isIntImmLegal(const APInt &/*Imm*/, EVT /*VT*/) const {
+ return true;
+ }
+
/// isShuffleMaskLegal - Targets can use this to indicate that they only
/// support *some* VECTOR_SHUFFLE operations, those with specific masks.
/// By default, if a target supports the VECTOR_SHUFFLE node, all mask values
@@ -678,12 +688,14 @@
}

/// This function returns true if the target allows unaligned memory accesses.
- /// of the specified type. This is used, for example, in situations where an
- /// array copy/move/set is converted to a sequence of store operations. It's
- /// use helps to ensure that such replacements don't generate code that causes
- /// an alignment error (trap) on the target machine.
+ /// of the specified type. If true, it also returns whether the unaligned
+ /// memory access is "fast" in the second argument by reference. This is used,
+ /// for example, in situations where an array copy/move/set is converted to a
+ /// sequence of store operations. It's use helps to ensure that such
+ /// replacements don't generate code that causes an alignment error (trap) on
+ /// the target machine.
/// @brief Determine if the target supports unaligned memory accesses.
- virtual bool allowsUnalignedMemoryAccesses(EVT) const {
+ virtual bool allowsUnalignedMemoryAccesses(EVT, bool *Fast = 0) const {
return false;
}

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Mon Dec 10 17:21:26 2012
@@ -3373,7 +3373,7 @@
unsigned NumVTBytes = VT.getSizeInBits() / 8;
unsigned NumBytes = std::min(NumVTBytes, unsigned(Str.size()));

- uint64_t Val = 0;
+ APInt Val(NumBytes*8, 0);
if (TLI.isLittleEndian()) {
for (unsigned i = 0; i != NumBytes; ++i)
Val |= (uint64_t)(unsigned char)Str[i] << i*8;
@@ -3382,7 +3382,9 @@
Val |= (uint64_t)(unsigned char)Str[i] << (NumVTBytes-i-1)*8;
}

- return DAG.getConstant(Val, VT);
+ if (TLI.isIntImmLegal(Val, VT))
+ return DAG.getConstant(Val, VT);
+ return SDValue(0, 0);
}

/// getMemBasePlusOffset - Returns base and offset node for the
@@ -3422,6 +3424,7 @@
unsigned DstAlign, unsigned SrcAlign,
bool IsZeroVal,
bool MemcpyStrSrc,
+ bool AllowOverlap,
SelectionDAG &DAG,
const TargetLowering &TLI) {
assert((SrcAlign == 0 || SrcAlign >= DstAlign) &&
@@ -3461,24 +3464,47 @@

unsigned NumMemOps = 0;
while (Size != 0) {
+ if (++NumMemOps > Limit)
+ return false;
+
unsigned VTSize = VT.getSizeInBits() / 8;
while (VTSize > Size) {
// For now, only use non-vector load / store's for the left-over pieces.
+ EVT NewVT;
+ unsigned NewVTSize;
if (VT.isVector() || VT.isFloatingPoint()) {
- VT = MVT::i64;
- while (!TLI.isTypeLegal(VT))
- VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
- VTSize = VT.getSizeInBits() / 8;
+ NewVT = (VT.getSizeInBits() > 64) ? MVT::i64 : MVT::i32;
+ while (!TLI.isOperationLegalOrCustom(ISD::STORE, NewVT)) {
+ if (NewVT == MVT::i64 &&
+ TLI.isOperationLegalOrCustom(ISD::STORE, MVT::f64)) {
+ // i64 is usually not legal on 32-bit targets, but f64 may be.
+ NewVT = MVT::f64;
+ break;
+ }
+ NewVT = (MVT::SimpleValueType)(NewVT.getSimpleVT().SimpleTy - 1);
+ }
+ NewVTSize = NewVT.getSizeInBits() / 8;
} else {
// This can result in a type that is not legal on the target, e.g.
// 1 or 2 bytes on PPC.
- VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
- VTSize >>= 1;
+ NewVT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
+ NewVTSize = VTSize >> 1;
+ }
+
+ // If the new VT cannot cover all of the remaining bits, then consider
+ // issuing a (or a pair of) unaligned and overlapping load / store.
+ // FIXME: Only does this for 64-bit or more since we don't have proper
+ // cost model for unaligned load / store.
+ bool Fast;
+ if (AllowOverlap && VTSize >= 8 && NewVTSize < Size &&
+ TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
+ VTSize = Size;
+ else {
+ VT = NewVT;
+ VTSize = NewVTSize;
}
}

- if (++NumMemOps > Limit)
- return false;
MemOps.push_back(VT);
Size -= VTSize;
}
@@ -3523,7 +3549,7 @@
if (!FindOptimalMemOpLowering(MemOps, Limit, Size,
(DstAlignCanChange ? 0 : Align),
(isZeroStr ? 0 : SrcAlign),
- true, CopyFromStr, DAG, TLI))
+ true, CopyFromStr, true, DAG, TLI))
return SDValue();

if (DstAlignCanChange) {
@@ -3545,6 +3571,14 @@
unsigned VTSize = VT.getSizeInBits() / 8;
SDValue Value, Store;

+ if (VTSize > Size) {
+ // Issuing an unaligned load / store pair that overlaps with the previous
+ // pair. Adjust the offset accordingly.
+ assert(i == NumMemOps-1 && i != 0);
+ SrcOff -= VTSize - Size;
+ DstOff -= VTSize - Size;
+ }
+
if (CopyFromStr &&
(isZeroStr || (VT.isInteger() && !VT.isVector()))) {
// It's unlikely a store of a vector immediate can be done in a single
@@ -3553,11 +3587,14 @@
// FIXME: Handle other cases where store of vector immediate is done in
// a single instruction.
Value = getMemsetStringVal(VT, dl, DAG, TLI, Str.substr(SrcOff));
- Store = DAG.getStore(Chain, dl, Value,
- getMemBasePlusOffset(Dst, DstOff, DAG),
- DstPtrInfo.getWithOffset(DstOff), isVol,
- false, Align);
- } else {
+ if (Value.getNode())
+ Store = DAG.getStore(Chain, dl, Value,
+ getMemBasePlusOffset(Dst, DstOff, DAG),
+ DstPtrInfo.getWithOffset(DstOff), isVol,
+ false, Align);
+ }
+
+ if (!Store.getNode()) {
// The type might not be legal for the target. This should only happen
// if the type is smaller than a legal type, as on PPC, so the right
// thing to do is generate a LoadExt/StoreTrunc pair. These simplify
@@ -3577,6 +3614,7 @@
OutChains.push_back(Store);
SrcOff += VTSize;
DstOff += VTSize;
+ Size -= VTSize;
}

return DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
@@ -3613,7 +3651,7 @@

if (!FindOptimalMemOpLowering(MemOps, Limit, Size,
(DstAlignCanChange ? 0 : Align),
- SrcAlign, true, false, DAG, TLI))
+ SrcAlign, true, false, false, DAG, TLI))
return SDValue();

if (DstAlignCanChange) {
@@ -3689,7 +3727,7 @@
isa<ConstantSDNode>(Src) && cast<ConstantSDNode>(Src)->isNullValue();
if (!FindOptimalMemOpLowering(MemOps, TLI.getMaxStoresPerMemset(OptSize),
Size, (DstAlignCanChange ? 0 : Align), 0,
- IsZeroVal, false, DAG, TLI))
+ IsZeroVal, false, true, DAG, TLI))
return SDValue();

if (DstAlignCanChange) {
@@ -3716,6 +3754,13 @@

for (unsigned i = 0; i < NumMemOps; i++) {
EVT VT = MemOps[i];
+ unsigned VTSize = VT.getSizeInBits() / 8;
+ if (VTSize > Size) {
+ // Issuing an unaligned load / store pair that overlaps with the previous
+ // pair. Adjust the offset accordingly.
+ assert(i == NumMemOps-1 && i != 0);
+ DstOff -= VTSize - Size;
+ }

// If this store is smaller than the largest store see whether we can get
// the smaller value for free with a truncate.
@@ -3734,6 +3779,7 @@
isVol, false, Align);
OutChains.push_back(Store);
DstOff += VT.getSizeInBits() / 8;
+ Size -= VTSize;
}

return DAG.getNode(ISD::TokenFactor, dl, MVT::Other,

Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Mon Dec 10 17:21:26 2012
@@ -833,9 +833,12 @@
setSchedulingPreference(Sched::Hybrid);

//// temporary - rewrite interface to use type
- maxStoresPerMemcpy = maxStoresPerMemcpyOptSize = 1;
- maxStoresPerMemset = 16;
+ maxStoresPerMemset = 8;
maxStoresPerMemsetOptSize = Subtarget->isTargetDarwin() ? 8 : 4;
+ maxStoresPerMemcpy = 4; // For @llvm.memcpy -> sequence of stores
+ maxStoresPerMemcpyOptSize = Subtarget->isTargetDarwin() ? 4 : 2;
+ maxStoresPerMemmove = 4; // For @llvm.memmove -> sequence of stores
+ maxStoresPerMemmoveOptSize = Subtarget->isTargetDarwin() ? 4 : 2;

// On ARM arguments smaller than 4 bytes are extended, so all arguments
// are at least 4 bytes aligned.
@@ -9406,7 +9409,7 @@
return (VT == MVT::f32) && (Opc == ISD::LOAD || Opc == ISD::STORE);
}

-bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {
+bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
// The AllowsUnaliged flag models the SCTLR.A setting in ARM cpus
bool AllowsUnaligned = Subtarget->allowsUnalignedMem();

@@ -9415,15 +9418,27 @@
return false;
case MVT::i8:
case MVT::i16:
- case MVT::i32:
+ case MVT::i32: {
// Unaligned access can use (for example) LRDB, LRDH, LDR
- return AllowsUnaligned;
+ if (AllowsUnaligned) {
+ if (Fast)
+ *Fast = Subtarget->hasV7Ops();
+ return true;
+ }
+ return false;
+ }
case MVT::f64:
- case MVT::v2f64:
+ case MVT::v2f64: {
// For any little-endian targets with neon, we can support unaligned ld/st
// of D and Q (e.g. {D0,D1}) registers by using vld1.i8/vst1.i8.
// A big-endian target may also explictly support unaligned accesses
- return Subtarget->hasNEON() && (AllowsUnaligned || isLittleEndian());
+ if (Subtarget->hasNEON() && (AllowsUnaligned || isLittleEndian())) {
+ if (Fast)
+ *Fast = true;
+ return true;
+ }
+ return false;
+ }
}
}

@@ -9442,12 +9457,17 @@

// See if we can use NEON instructions for this...
if (IsZeroVal &&
- !F->getFnAttributes().hasAttribute(Attributes::NoImplicitFloat) &&
- Subtarget->hasNEON()) {
- if (memOpAlign(SrcAlign, DstAlign, 16) && Size >= 16) {
- return MVT::v4i32;
- } else if (memOpAlign(SrcAlign, DstAlign, 8) && Size >= 8) {
- return MVT::v2i32;
+ Subtarget->hasNEON() &&
+ !F->getFnAttributes().hasAttribute(Attributes::NoImplicitFloat)) {
+ bool Fast;
+ if (Size >= 16 && (memOpAlign(SrcAlign, DstAlign, 16) ||
+ (allowsUnalignedMemoryAccesses(MVT::v2f64, &Fast) &&
+ Fast))) {
+ return MVT::v2f64;
+ } else if (Size >= 8 && (memOpAlign(SrcAlign, DstAlign, 8) ||
+ (allowsUnalignedMemoryAccesses(MVT::f64, &Fast) &&
+ Fast))) {
+ return MVT::f64;
}
}

@@ -10241,6 +10261,24 @@
return false;
}

+bool ARMTargetLowering::isIntImmLegal(const APInt &Imm, EVT VT) const {
+ if (VT.getSizeInBits() > 32)
+ return false;
+
+ int32_t ImmVal = Imm.getSExtValue();
+ if (!Subtarget->isThumb()) {
+ return (ImmVal >= 0 && ImmVal < 65536) ||
+ (ARM_AM::getSOImmVal(ImmVal) != -1) ||
+ (ARM_AM::getSOImmVal(~ImmVal) != -1);
+ } else if (Subtarget->isThumb2()) {
+ return (ImmVal >= 0 && ImmVal < 65536) ||
+ (ARM_AM::getT2SOImmVal(ImmVal) != -1) ||
+ (ARM_AM::getT2SOImmVal(~ImmVal) != -1);
+ } else /*Thumb1*/ {
+ return (ImmVal >= 0 && ImmVal < 256);
+ }
+}
+
/// getTgtMemIntrinsic - Represent NEON load and store intrinsics as
/// MemIntrinsicNodes. The associated MachineMemOperands record the alignment
/// specified in the intrinsic calls.

Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMISelLowering.h (original)
+++ llvm/trunk/lib/Target/ARM/ARMISelLowering.h Mon Dec 10 17:21:26 2012
@@ -285,8 +285,9 @@
bool isDesirableToTransformToIntegerOp(unsigned Opc, EVT VT) const;

/// allowsUnalignedMemoryAccesses - Returns true if the target allows
- /// unaligned memory accesses. of the specified type.
- virtual bool allowsUnalignedMemoryAccesses(EVT VT) const;
+ /// unaligned memory accesses of the specified type. Returns whether it
+ /// is "fast" by reference in the second argument.
+ virtual bool allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const;

virtual EVT getOptimalMemOpType(uint64_t Size,
unsigned DstAlign, unsigned SrcAlign,
@@ -386,6 +387,8 @@
/// materialize the FP immediate as a load from a constant pool.
virtual bool isFPImmLegal(const APFloat &Imm, EVT VT) const;

+ virtual bool isIntImmLegal(const APInt &Imm, EVT VT) const;
+
virtual bool getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &I,
unsigned Intrinsic) const;

Modified: llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td (original)
+++ llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td Mon Dec 10 17:21:26 2012
@@ -2315,13 +2315,15 @@
/// changed to modify CPSR.
multiclass T2I_un_irs<bits<4> opcod, string opc,
InstrItinClass iii, InstrItinClass iir, InstrItinClass iis,
- PatFrag opnode, bit Cheap = 0, bit ReMat = 0> {
+ PatFrag opnode,
+ bit Cheap = 0, bit ReMat = 0, bit MoveImm = 0> {
// shifted imm
def i : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), iii,
opc, "\t$Rd, $imm",
[(set rGPR:$Rd, (opnode t2_so_imm:$imm))]> {
let isAsCheapAsAMove = Cheap;
let isReMaterializable = ReMat;
+ let isMoveImm = MoveImm;
let Inst{31-27} = 0b11110;
let Inst{25} = 0;
let Inst{24-21} = opcod;
@@ -2355,7 +2357,7 @@
let AddedComplexity = 1 in
defm t2MVN : T2I_un_irs <0b0011, "mvn",
IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi,
- UnOpFrag<(not node:$Src)>, 1, 1>;
+ UnOpFrag<(not node:$Src)>, 1, 1, 1>;

let AddedComplexity = 1 in
def : T2Pat<(and rGPR:$src, t2_so_imm_not:$imm),

Modified: llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp Mon Dec 10 17:21:26 2012
@@ -457,7 +457,8 @@
maxStoresPerMemcpy = 16;
}

-bool MipsTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {
+bool
+MipsTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
MVT::SimpleValueType SVT = VT.getSimpleVT().SimpleTy;

if (Subtarget->inMips16Mode())
@@ -466,6 +467,8 @@
switch (SVT) {
case MVT::i64:
case MVT::i32:
+ if (Fast)
+ *Fast = true;
return true;
default:
return false;

Modified: llvm/trunk/lib/Target/Mips/MipsISelLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/Mips/MipsISelLowering.h (original)
+++ llvm/trunk/lib/Target/Mips/MipsISelLowering.h Mon Dec 10 17:21:26 2012
@@ -149,7 +149,7 @@

virtual MVT getShiftAmountTy(EVT LHSTy) const { return MVT::i32; }

- virtual bool allowsUnalignedMemoryAccesses (EVT VT) const;
+ virtual bool allowsUnalignedMemoryAccesses (EVT VT, bool *Fast) const;

virtual void LowerOperationWrapper(SDNode *N,
SmallVectorImpl<SDValue> &Results,

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Mon Dec 10 17:21:26 2012
@@ -1412,6 +1412,13 @@
return MVT::i32;
}

+bool
+X86TargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
+ if (Fast)
+ *Fast = Subtarget->isUnalignedMemAccessFast();
+ return true;
+}
+
/// getJumpTableEncoding - Return the entry encoding for a jump table in the
/// current function. The returned value is a member of the
/// MachineJumpTableInfo::JTEntryKind enum.

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Mon Dec 10 17:21:26 2012
@@ -507,10 +507,9 @@
MachineFunction &MF) const;

/// allowsUnalignedMemoryAccesses - Returns true if the target allows
- /// unaligned memory accesses. of the specified type.
- virtual bool allowsUnalignedMemoryAccesses(EVT VT) const {
- return true;
- }
+ /// unaligned memory accesses. of the specified type. Returns whether it
+ /// is "fast" by reference in the second argument.
+ virtual bool allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const;

/// LowerOperation - Provide custom lowering hooks for some operations.
///

Modified: llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll Mon Dec 10 17:21:26 2012
@@ -1,13 +1,5 @@
; RUN: llc -march=arm -mcpu=cortex-a8 < %s | FileCheck %s

-; Should trigger a NEON store.
-; CHECK: vstr
-define void @f_0_12(i8* nocapture %c) nounwind optsize {
-entry:
- call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
- ret void
-}
-
; Trigger multiple NEON stores.
; CHECK: vst1.64
; CHECK-NEXT: vst1.64

Modified: llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll Mon Dec 10 17:21:26 2012
@@ -1,18 +1,115 @@
-; RUN: llc < %s -mtriple=thumbv7-apple-darwin -disable-post-ra | FileCheck %s
-
-; CHECK: ldrd
-; CHECK: strd
-; CHECK: ldrb
+; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -pre-RA-sched=source -disable-post-ra | FileCheck %s

%struct.x = type { i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8 }

@src = external global %struct.x
@dst = external global %struct.x

-define i32 @t() {
+@.str1 = private unnamed_addr constant [31 x i8] c"DHRYSTONE PROGRAM, SOME STRING\00", align 1
+@.str2 = private unnamed_addr constant [36 x i8] c"DHRYSTONE PROGRAM, SOME STRING BLAH\00", align 1
+@.str3 = private unnamed_addr constant [24 x i8] c"DHRYSTONE PROGRAM, SOME\00", align 1
+@.str4 = private unnamed_addr constant [18 x i8] c"DHRYSTONE PROGR \00", align 1
+@.str5 = private unnamed_addr constant [7 x i8] c"DHRYST\00", align 1
+@.str6 = private unnamed_addr constant [14 x i8] c"/tmp/rmXXXXXX\00", align 1
+@spool.splbuf = internal global [512 x i8] zeroinitializer, align 16
+
+define i32 @t0() {
entry:
+; CHECK: t0:
+; CHECK: vldr [[REG1:d[0-9]+]],
+; CHECK: vstr [[REG1]],
call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds (%struct.x* @dst, i32 0, i32 0), i8* getelementptr inbounds (%struct.x* @src, i32 0, i32 0), i32 11, i32 8, i1 false)
ret i32 0
}

+define void @t1(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t1:
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: adds r0, #15
+; CHECK: adds r1, #15
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([31 x i8]* @.str1, i64 0, i64 0), i64 31, i32 1, i1 false)
+ ret void
+}
+
+define void @t2(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t2:
+; CHECK: ldr [[REG2:r[0-9]+]], [r1, #32]
+; CHECK: str [[REG2]], [r0, #32]
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: adds r0, #16
+; CHECK: adds r1, #16
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([36 x i8]* @.str2, i64 0, i64 0), i64 36, i32 1, i1 false)
+ ret void
+}
+
+define void @t3(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t3:
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: adds r0, #16
+; CHECK: adds r1, #16
+; CHECK: vld1.8 {d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([24 x i8]* @.str3, i64 0, i64 0), i64 24, i32 1, i1 false)
+ ret void
+}
+
+define void @t4(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t4:
+; CHECK: vld1.8 {[[REG3:d[0-9]+]], [[REG4:d[0-9]+]]}, [r1]
+; CHECK: vst1.8 {[[REG3]], [[REG4]]}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([18 x i8]* @.str4, i64 0, i64 0), i64 18, i32 1, i1 false)
+ ret void
+}
+
+define void @t5(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t5:
+; CHECK: movs [[REG5:r[0-9]+]], #0
+; CHECK: strb [[REG5]], [r0, #6]
+; CHECK: movw [[REG6:r[0-9]+]], #21587
+; CHECK: strh [[REG6]], [r0, #4]
+; CHECK: ldr [[REG7:r[0-9]+]],
+; CHECK: str [[REG7]]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([7 x i8]* @.str5, i64 0, i64 0), i64 7, i32 1, i1 false)
+ ret void
+}
+
+define void @t6() nounwind {
+entry:
+; CHECK: t6:
+; CHECK: vld1.8 {[[REG8:d[0-9]+]]}, [r0]
+; CHECK: vstr [[REG8]], [r1]
+; CHECK: adds r1, #6
+; CHECK: adds r0, #6
+; CHECK: vld1.8
+; CHECK: vst1.16
+ call void @llvm.memcpy.p0i8.p0i8.i64(i8* getelementptr inbounds ([512 x i8]* @spool.splbuf, i64 0, i64 0), i8* getelementptr inbounds ([14 x i8]* @.str6, i64 0, i64 0), i64 14, i32 1, i1 false)
+ ret void
+}
+
+%struct.Foo = type { i32, i32, i32, i32 }
+
+define void @t7(%struct.Foo* nocapture %a, %struct.Foo* nocapture %b) nounwind {
+entry:
+; CHECK: t7
+; CHECK: vld1.32
+; CHECK: vst1.32
+ %0 = bitcast %struct.Foo* %a to i8*
+ %1 = bitcast %struct.Foo* %b to i8*
+ tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 16, i32 4, i1 false)
+ ret void
+}
+
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind

Added: llvm/trunk/test/CodeGen/ARM/memset-inline.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/memset-inline.ll?rev=169791&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/memset-inline.ll (added)
+++ llvm/trunk/test/CodeGen/ARM/memset-inline.ll Mon Dec 10 17:21:26 2012
@@ -0,0 +1,30 @@
+; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -pre-RA-sched=source -disable-post-ra | FileCheck %s
+
+define void @t1(i8* nocapture %c) nounwind optsize {
+entry:
+; CHECK: t1:
+; CHECK: movs r1, #0
+; CHECK: str r1, [r0]
+; CHECK: str r1, [r0, #4]
+; CHECK: str r1, [r0, #8]
+ call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
+ ret void
+}
+
+define void @t2() nounwind ssp {
+entry:
+; CHECK: t2:
+; CHECK: add.w r1, r0, #10
+; CHECK: vmov.i32 {{q[0-9]+}}, #0x0
+; CHECK: vst1.16 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.32 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+ %buf = alloca [26 x i8], align 1
+ %0 = getelementptr inbounds [26 x i8]* %buf, i32 0, i32 0
+ call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false)
+ call void @something(i8* %0) nounwind
+ ret void
+}
+
+declare void @something(i8*) nounwind
+declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind
+declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind

Removed: llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll?rev=169790&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll (removed)
@@ -1,16 +0,0 @@
-; RUN: llc < %s -march=arm -mcpu=cortex-a8 | FileCheck %s
-; Check that memcpy gets lowered to ldm/stm, at least in this very smple case.
-
-%struct.Foo = type { i32, i32, i32, i32 }
-
-define void @_Z10CopyStructP3FooS0_(%struct.Foo* nocapture %a, %struct.Foo* nocapture %b) nounwind {
-entry:
-;CHECK: ldm
-;CHECK: stm
- %0 = bitcast %struct.Foo* %a to i8*
- %1 = bitcast %struct.Foo* %b to i8*
- tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 16, i32 4, i1 false)
- ret void
-}
-
-declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind

Modified: llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll (original)
+++ llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll Mon Dec 10 17:21:26 2012
@@ -6,15 +6,16 @@
define void @t(i32 %count) ssp nounwind {
entry:
; CHECK: t:
-; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip)
-; CHECK: movups L_str(%rip), %xmm0
+; CHECK: movups L_str+12(%rip), %xmm0
+; CHECK: movups L_str(%rip), %xmm1
 %tmp0 = alloca [60 x i8], align 1
 %tmp1 = getelementptr inbounds [60 x i8]* %tmp0, i64 0, i64 0
 br label %bb1

bb1:
; CHECK: LBB0_1:
-; CHECK: movaps %xmm0, (%rsp)
+; CHECK: movups %xmm0, 12(%rsp)
+; CHECK: movaps %xmm1, (%rsp)
 %tmp2 = phi i32 [ %tmp3, %bb1 ], [ 0, %entry ]
 call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* getelementptr inbounds ([28 x i8]* @str, i64 0, i64 0), i64 28, i32 1, i1 false)
 %tmp3 = add i32 %tmp2, 1

Modified: llvm/trunk/test/CodeGen/X86/memcpy-2.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/memcpy-2.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/memcpy-2.ll (original)
+++ llvm/trunk/test/CodeGen/X86/memcpy-2.ll Mon Dec 10 17:21:26 2012
@@ -10,18 +10,18 @@
define void @t1(i32 %argc, i8** %argv) nounwind {
entry:
; SSE2: t1:
+; SSE2: movsd _.str+16, %xmm0
+; SSE2: movsd %xmm0, 16(%esp)
; SSE2: movaps _.str, %xmm0
; SSE2: movaps %xmm0
-; SSE2: movb $0
-; SSE2: movl $0
-; SSE2: movl $0
+; SSE2: movb $0, 24(%esp)

; SSE1: t1:
+; SSE1: fldl _.str+16
+; SSE1: fstpl 16(%esp)
; SSE1: movaps _.str, %xmm0
; SSE1: movaps %xmm0
-; SSE1: movb $0
-; SSE1: movl $0
-; SSE1: movl $0
+; SSE1: movb $0, 24(%esp)

; NOSSE: t1:
; NOSSE: movb $0
