Evan,

I am seeing an assert when I compile a program with llc (the test program, strcat.llvm.mips64el.ll, is attached to this email):

$ llc -march=mips64el -mcpu=mips64r2 -mattr=n64 -disable-mips-delay-filler -filetype=asm -relocation-model=pic -asm-verbose=false -O3 Output/strcat.llvm.mips64el.ll -o Output/strcat.llc.mips64r2.s

llc: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3588: llvm::SDValue getMemcpyLoadsAndStores(llvm::SelectionDAG&, llvm::DebugLoc, llvm::SDValue, llvm::SDValue, llvm::SDValue, uint64_t, unsigned int, bool, bool, llvm::MachinePointerInfo, llvm::MachinePointerInfo): Assertion `i == NumMemOps-1 && i != 0' failed.

The memcpy call that triggers the assert copies a 7-character array to an i8* destination:

(gdb) p I.dump()
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %endptr, i8* getelementptr inbounds ([7 x i8]* @.str2, i64 0, i64 0), i64 7, i32 1, i1 false)

I am not familiar with the pieces you touched in this commit, but llc terminates normally if I force the code to take the else clause of the new overlap check by setting *Fast = false inside MipsTargetLowering::allowsUnalignedMemoryAccesses.
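For reference, this is roughly the local hack I used (diagnostic only, not a proposed fix; it just flips the *Fast out-parameter that your Mips hunk below sets to true):

  // MipsISelLowering.cpp -- local experiment only. Reporting unaligned
  // i32/i64 accesses as "not fast" forces FindOptimalMemOpLowering down
  // the else branch of the overlap check.
  case MVT::i64:
  case MVT::i32:
    if (Fast)
      *Fast = false; // was: *Fast = true;
    return true;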

Here is the overlap check in question (SelectionDAG.cpp:3510):

+      // If the new VT cannot cover all of the remaining bits, then consider
+      // issuing a (or a pair of) unaligned and overlapping load / store.
+      // FIXME: Only does this for 64-bit or more since we don't have proper
+      // cost model for unaligned load / store.
+      bool Fast;
+      if (AllowOverlap && VTSize >= 8 && NewVTSize < Size &&
+          TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
+        VTSize = Size;
+      else {
+        VT = NewVT;
+        VTSize = NewVTSize;
+      }

I think overlapping shouldn't be allowed here when the source size (7B) is smaller than the load size (8B). As far as I can tell, the 7-byte copy takes the overlap path on its very first mem op, which widens it to a single 8-byte access: NumMemOps ends up being 1, the adjusted offset would place the store one byte before the start of the destination, and the `i != 0` half of the assert fires because there is no previous pair to overlap with.
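If that reading is right, one possible direction (an untested sketch on my side, not a patch) is to take the overlap path only when a previous mem op already exists, so the widened access stays inside the region being copied:

      // Sketch only: require a previous mem op to overlap with. NumMemOps
      // was pre-incremented for the current op at the top of the loop, so a
      // previous op exists iff NumMemOps > 1.
      bool Fast;
      if (AllowOverlap && NumMemOps > 1 && VTSize >= 8 && NewVTSize < Size &&
          TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
        VTSize = Size;
      else {
        VT = NewVT;
        VTSize = NewVTSize;
      }

With that guard the 7-byte case would fall back to narrower i32/i16/i8 stores, while trailing ops that do follow an earlier pair should still be able to use the overlapping load / store.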

Do you have any idea how this can be fixed?
<br><div class="gmail_quote">On Mon, Dec 10, 2012 at 3:21 PM, Evan Cheng <span dir="ltr"><<a href="mailto:evan.cheng@apple.com" target="_blank">evan.cheng@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: evancheng<br>
Date: Mon Dec 10 17:21:26 2012<br>
New Revision: 169791<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=169791&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=169791&view=rev</a><br>
Log:<br>
Some enhancements for memcpy / memset inline expansion.<br>
1. Teach it to use overlapping unaligned load / store to copy / set the trailing<br>
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.<br>
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.<br>
x86 and ARM.<br>
3. When memcpy from a constant string, do *not* replace the load with a constant<br>
if it's not possible to materialize an integer immediate with a single<br>
instruction (required a new target hook: TLI.isIntImmLegal()).<br>
4. Use unaligned load / stores more aggressively if target hooks indicates they<br>
are "fast".<br>
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.<br>
Also increase the threshold to something reasonable (8 for memset, 4 pairs<br>
for memcpy).<br>
<br>
This significantly improves Dhrystone, up to 50% on ARM iOS devices.<br>
<br>
rdar://12760078<br>
<br>
Added:<br>
llvm/trunk/test/CodeGen/ARM/memset-inline.ll<br>
Removed:<br>
llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll<br>
Modified:<br>
llvm/trunk/include/llvm/Target/TargetLowering.h<br>
llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp<br>
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp<br>
llvm/trunk/lib/Target/ARM/ARMISelLowering.h<br>
llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td<br>
llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp<br>
llvm/trunk/lib/Target/Mips/MipsISelLowering.h<br>
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
llvm/trunk/lib/Target/X86/X86ISelLowering.h<br>
llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll<br>
llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll<br>
llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll<br>
llvm/trunk/test/CodeGen/X86/memcpy-2.ll<br>
<br>
Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
+++ llvm/trunk/include/llvm/Target/TargetLowering.h Mon Dec 10 17:21:26 2012
@@ -371,6 +371,16 @@
return false;
}

+ /// isIntImmLegal - Returns true if the target can instruction select the
+ /// specified integer immediate natively (that is, it's materialized with one
+ /// instruction). The current *assumption* in isel is all of integer
+ /// immediates are "legal" and only the memcpy / memset expansion code is
+ /// making use of this. The rest of isel doesn't have proper cost model for
+ /// immediate materialization.
+ virtual bool isIntImmLegal(const APInt &/*Imm*/, EVT /*VT*/) const {
+ return true;
+ }
+
/// isShuffleMaskLegal - Targets can use this to indicate that they only
/// support *some* VECTOR_SHUFFLE operations, those with specific masks.
/// By default, if a target supports the VECTOR_SHUFFLE node, all mask values
@@ -678,12 +688,14 @@
}

/// This function returns true if the target allows unaligned memory accesses.
- /// of the specified type. This is used, for example, in situations where an
- /// array copy/move/set is converted to a sequence of store operations. It's
- /// use helps to ensure that such replacements don't generate code that causes
- /// an alignment error (trap) on the target machine.
+ /// of the specified type. If true, it also returns whether the unaligned
+ /// memory access is "fast" in the second argument by reference. This is used,
+ /// for example, in situations where an array copy/move/set is converted to a
+ /// sequence of store operations. It's use helps to ensure that such
+ /// replacements don't generate code that causes an alignment error (trap) on
+ /// the target machine.
/// @brief Determine if the target supports unaligned memory accesses.
- virtual bool allowsUnalignedMemoryAccesses(EVT) const {
+ virtual bool allowsUnalignedMemoryAccesses(EVT, bool *Fast = 0) const {
return false;
}

Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Mon Dec 10 17:21:26 2012
@@ -3373,7 +3373,7 @@
unsigned NumVTBytes = VT.getSizeInBits() / 8;
unsigned NumBytes = std::min(NumVTBytes, unsigned(Str.size()));

- uint64_t Val = 0;
+ APInt Val(NumBytes*8, 0);
if (TLI.isLittleEndian()) {
for (unsigned i = 0; i != NumBytes; ++i)
Val |= (uint64_t)(unsigned char)Str[i] << i*8;
@@ -3382,7 +3382,9 @@
Val |= (uint64_t)(unsigned char)Str[i] << (NumVTBytes-i-1)*8;
}

- return DAG.getConstant(Val, VT);
+ if (TLI.isIntImmLegal(Val, VT))
+ return DAG.getConstant(Val, VT);
+ return SDValue(0, 0);
}

/// getMemBasePlusOffset - Returns base and offset node for the
@@ -3422,6 +3424,7 @@
unsigned DstAlign, unsigned SrcAlign,
bool IsZeroVal,
bool MemcpyStrSrc,
+ bool AllowOverlap,
SelectionDAG &DAG,
const TargetLowering &TLI) {
assert((SrcAlign == 0 || SrcAlign >= DstAlign) &&
@@ -3461,24 +3464,47 @@

unsigned NumMemOps = 0;
while (Size != 0) {
+ if (++NumMemOps > Limit)
+ return false;
+
unsigned VTSize = VT.getSizeInBits() / 8;
while (VTSize > Size) {
// For now, only use non-vector load / store's for the left-over pieces.
+ EVT NewVT;
+ unsigned NewVTSize;
if (VT.isVector() || VT.isFloatingPoint()) {
- VT = MVT::i64;
- while (!TLI.isTypeLegal(VT))
- VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
- VTSize = VT.getSizeInBits() / 8;
+ NewVT = (VT.getSizeInBits() > 64) ? MVT::i64 : MVT::i32;
+ while (!TLI.isOperationLegalOrCustom(ISD::STORE, NewVT)) {
+ if (NewVT == MVT::i64 &&
+ TLI.isOperationLegalOrCustom(ISD::STORE, MVT::f64)) {
+ // i64 is usually not legal on 32-bit targets, but f64 may be.
+ NewVT = MVT::f64;
+ break;
+ }
+ NewVT = (MVT::SimpleValueType)(NewVT.getSimpleVT().SimpleTy - 1);
+ }
+ NewVTSize = NewVT.getSizeInBits() / 8;
} else {
// This can result in a type that is not legal on the target, e.g.
// 1 or 2 bytes on PPC.
- VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
- VTSize >>= 1;
+ NewVT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
+ NewVTSize = VTSize >> 1;
+ }
+
+ // If the new VT cannot cover all of the remaining bits, then consider
+ // issuing a (or a pair of) unaligned and overlapping load / store.
+ // FIXME: Only does this for 64-bit or more since we don't have proper
+ // cost model for unaligned load / store.
+ bool Fast;
+ if (AllowOverlap && VTSize >= 8 && NewVTSize < Size &&
+ TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
+ VTSize = Size;
+ else {
+ VT = NewVT;
+ VTSize = NewVTSize;
}
}

- if (++NumMemOps > Limit)
- return false;
MemOps.push_back(VT);
Size -= VTSize;
}
@@ -3523,7 +3549,7 @@
if (!FindOptimalMemOpLowering(MemOps, Limit, Size,
(DstAlignCanChange ? 0 : Align),
(isZeroStr ? 0 : SrcAlign),
- true, CopyFromStr, DAG, TLI))
+ true, CopyFromStr, true, DAG, TLI))
return SDValue();

if (DstAlignCanChange) {
@@ -3545,6 +3571,14 @@
unsigned VTSize = VT.getSizeInBits() / 8;
SDValue Value, Store;

+ if (VTSize > Size) {
+ // Issuing an unaligned load / store pair that overlaps with the previous
+ // pair. Adjust the offset accordingly.
+ assert(i == NumMemOps-1 && i != 0);
+ SrcOff -= VTSize - Size;
+ DstOff -= VTSize - Size;
+ }
+
if (CopyFromStr &&
(isZeroStr || (VT.isInteger() && !VT.isVector()))) {
// It's unlikely a store of a vector immediate can be done in a single
@@ -3553,11 +3587,14 @@
// FIXME: Handle other cases where store of vector immediate is done in
// a single instruction.
Value = getMemsetStringVal(VT, dl, DAG, TLI, Str.substr(SrcOff));
- Store = DAG.getStore(Chain, dl, Value,
- getMemBasePlusOffset(Dst, DstOff, DAG),
- DstPtrInfo.getWithOffset(DstOff), isVol,
- false, Align);
- } else {
+ if (Value.getNode())
+ Store = DAG.getStore(Chain, dl, Value,
+ getMemBasePlusOffset(Dst, DstOff, DAG),
+ DstPtrInfo.getWithOffset(DstOff), isVol,
+ false, Align);
+ }
+
+ if (!Store.getNode()) {
// The type might not be legal for the target. This should only happen
// if the type is smaller than a legal type, as on PPC, so the right
// thing to do is generate a LoadExt/StoreTrunc pair. These simplify
@@ -3577,6 +3614,7 @@
OutChains.push_back(Store);
SrcOff += VTSize;
DstOff += VTSize;
+ Size -= VTSize;
}

return DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
@@ -3613,7 +3651,7 @@

if (!FindOptimalMemOpLowering(MemOps, Limit, Size,
(DstAlignCanChange ? 0 : Align),
- SrcAlign, true, false, DAG, TLI))
+ SrcAlign, true, false, false, DAG, TLI))
return SDValue();

if (DstAlignCanChange) {
@@ -3689,7 +3727,7 @@
isa<ConstantSDNode>(Src) && cast<ConstantSDNode>(Src)->isNullValue();
if (!FindOptimalMemOpLowering(MemOps, TLI.getMaxStoresPerMemset(OptSize),
Size, (DstAlignCanChange ? 0 : Align), 0,
- IsZeroVal, false, DAG, TLI))
+ IsZeroVal, false, true, DAG, TLI))
return SDValue();

if (DstAlignCanChange) {
@@ -3716,6 +3754,13 @@

for (unsigned i = 0; i < NumMemOps; i++) {
EVT VT = MemOps[i];
+ unsigned VTSize = VT.getSizeInBits() / 8;
+ if (VTSize > Size) {
+ // Issuing an unaligned load / store pair that overlaps with the previous
+ // pair. Adjust the offset accordingly.
+ assert(i == NumMemOps-1 && i != 0);
+ DstOff -= VTSize - Size;
+ }

// If this store is smaller than the largest store see whether we can get
// the smaller value for free with a truncate.
@@ -3734,6 +3779,7 @@
isVol, false, Align);
OutChains.push_back(Store);
DstOff += VT.getSizeInBits() / 8;
+ Size -= VTSize;
}

return DAG.getNode(ISD::TokenFactor, dl, MVT::Other,

Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Mon Dec 10 17:21:26 2012
@@ -833,9 +833,12 @@
setSchedulingPreference(Sched::Hybrid);

//// temporary - rewrite interface to use type
- maxStoresPerMemcpy = maxStoresPerMemcpyOptSize = 1;
- maxStoresPerMemset = 16;
+ maxStoresPerMemset = 8;
maxStoresPerMemsetOptSize = Subtarget->isTargetDarwin() ? 8 : 4;
+ maxStoresPerMemcpy = 4; // For @llvm.memcpy -> sequence of stores
+ maxStoresPerMemcpyOptSize = Subtarget->isTargetDarwin() ? 4 : 2;
+ maxStoresPerMemmove = 4; // For @llvm.memmove -> sequence of stores
+ maxStoresPerMemmoveOptSize = Subtarget->isTargetDarwin() ? 4 : 2;

// On ARM arguments smaller than 4 bytes are extended, so all arguments
// are at least 4 bytes aligned.
@@ -9406,7 +9409,7 @@
return (VT == MVT::f32) && (Opc == ISD::LOAD || Opc == ISD::STORE);
}

-bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {
+bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
// The AllowsUnaliged flag models the SCTLR.A setting in ARM cpus
bool AllowsUnaligned = Subtarget->allowsUnalignedMem();

@@ -9415,15 +9418,27 @@
return false;
case MVT::i8:
case MVT::i16:
- case MVT::i32:
+ case MVT::i32: {
// Unaligned access can use (for example) LRDB, LRDH, LDR
- return AllowsUnaligned;
+ if (AllowsUnaligned) {
+ if (Fast)
+ *Fast = Subtarget->hasV7Ops();
+ return true;
+ }
+ return false;
+ }
case MVT::f64:
- case MVT::v2f64:
+ case MVT::v2f64: {
// For any little-endian targets with neon, we can support unaligned ld/st
// of D and Q (e.g. {D0,D1}) registers by using vld1.i8/vst1.i8.
// A big-endian target may also explictly support unaligned accesses
- return Subtarget->hasNEON() && (AllowsUnaligned || isLittleEndian());
+ if (Subtarget->hasNEON() && (AllowsUnaligned || isLittleEndian())) {
+ if (Fast)
+ *Fast = true;
+ return true;
+ }
+ return false;
+ }
}
}

@@ -9442,12 +9457,17 @@

// See if we can use NEON instructions for this...
if (IsZeroVal &&
- !F->getFnAttributes().hasAttribute(Attributes::NoImplicitFloat) &&
- Subtarget->hasNEON()) {
- if (memOpAlign(SrcAlign, DstAlign, 16) && Size >= 16) {
- return MVT::v4i32;
- } else if (memOpAlign(SrcAlign, DstAlign, 8) && Size >= 8) {
- return MVT::v2i32;
+ Subtarget->hasNEON() &&
+ !F->getFnAttributes().hasAttribute(Attributes::NoImplicitFloat)) {
+ bool Fast;
+ if (Size >= 16 && (memOpAlign(SrcAlign, DstAlign, 16) ||
+ (allowsUnalignedMemoryAccesses(MVT::v2f64, &Fast) &&
+ Fast))) {
+ return MVT::v2f64;
+ } else if (Size >= 8 && (memOpAlign(SrcAlign, DstAlign, 8) ||
+ (allowsUnalignedMemoryAccesses(MVT::f64, &Fast) &&
+ Fast))) {
+ return MVT::f64;
}
}

@@ -10241,6 +10261,24 @@
return false;
}

+bool ARMTargetLowering::isIntImmLegal(const APInt &Imm, EVT VT) const {
+ if (VT.getSizeInBits() > 32)
+ return false;
+
+ int32_t ImmVal = Imm.getSExtValue();
+ if (!Subtarget->isThumb()) {
+ return (ImmVal >= 0 && ImmVal < 65536) ||
+ (ARM_AM::getSOImmVal(ImmVal) != -1) ||
+ (ARM_AM::getSOImmVal(~ImmVal) != -1);
+ } else if (Subtarget->isThumb2()) {
+ return (ImmVal >= 0 && ImmVal < 65536) ||
+ (ARM_AM::getT2SOImmVal(ImmVal) != -1) ||
+ (ARM_AM::getT2SOImmVal(~ImmVal) != -1);
+ } else /*Thumb1*/ {
+ return (ImmVal >= 0 && ImmVal < 256);
+ }
+}
+
/// getTgtMemIntrinsic - Represent NEON load and store intrinsics as
/// MemIntrinsicNodes. The associated MachineMemOperands record the alignment
/// specified in the intrinsic calls.

Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMISelLowering.h (original)
+++ llvm/trunk/lib/Target/ARM/ARMISelLowering.h Mon Dec 10 17:21:26 2012
@@ -285,8 +285,9 @@
bool isDesirableToTransformToIntegerOp(unsigned Opc, EVT VT) const;

/// allowsUnalignedMemoryAccesses - Returns true if the target allows
- /// unaligned memory accesses. of the specified type.
- virtual bool allowsUnalignedMemoryAccesses(EVT VT) const;
+ /// unaligned memory accesses of the specified type. Returns whether it
+ /// is "fast" by reference in the second argument.
+ virtual bool allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const;

virtual EVT getOptimalMemOpType(uint64_t Size,
unsigned DstAlign, unsigned SrcAlign,
@@ -386,6 +387,8 @@
/// materialize the FP immediate as a load from a constant pool.
virtual bool isFPImmLegal(const APFloat &Imm, EVT VT) const;

+ virtual bool isIntImmLegal(const APInt &Imm, EVT VT) const;
+
virtual bool getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &I,
unsigned Intrinsic) const;

Modified: llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td (original)
+++ llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td Mon Dec 10 17:21:26 2012
@@ -2315,13 +2315,15 @@
/// changed to modify CPSR.
multiclass T2I_un_irs<bits<4> opcod, string opc,
InstrItinClass iii, InstrItinClass iir, InstrItinClass iis,
- PatFrag opnode, bit Cheap = 0, bit ReMat = 0> {
+ PatFrag opnode,
+ bit Cheap = 0, bit ReMat = 0, bit MoveImm = 0> {
// shifted imm
def i : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), iii,
opc, "\t$Rd, $imm",
[(set rGPR:$Rd, (opnode t2_so_imm:$imm))]> {
let isAsCheapAsAMove = Cheap;
let isReMaterializable = ReMat;
+ let isMoveImm = MoveImm;
let Inst{31-27} = 0b11110;
let Inst{25} = 0;
let Inst{24-21} = opcod;
@@ -2355,7 +2357,7 @@
let AddedComplexity = 1 in
defm t2MVN : T2I_un_irs <0b0011, "mvn",
IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi,
- UnOpFrag<(not node:$Src)>, 1, 1>;
+ UnOpFrag<(not node:$Src)>, 1, 1, 1>;

let AddedComplexity = 1 in
def : T2Pat<(and rGPR:$src, t2_so_imm_not:$imm),

Modified: llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp Mon Dec 10 17:21:26 2012
@@ -457,7 +457,8 @@
maxStoresPerMemcpy = 16;
}

-bool MipsTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {
+bool
+MipsTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
MVT::SimpleValueType SVT = VT.getSimpleVT().SimpleTy;

if (Subtarget->inMips16Mode())
@@ -466,6 +467,8 @@
switch (SVT) {
case MVT::i64:
case MVT::i32:
+ if (Fast)
+ *Fast = true;
return true;
default:
return false;

Modified: llvm/trunk/lib/Target/Mips/MipsISelLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/Mips/MipsISelLowering.h (original)
+++ llvm/trunk/lib/Target/Mips/MipsISelLowering.h Mon Dec 10 17:21:26 2012
@@ -149,7 +149,7 @@

virtual MVT getShiftAmountTy(EVT LHSTy) const { return MVT::i32; }

- virtual bool allowsUnalignedMemoryAccesses (EVT VT) const;
+ virtual bool allowsUnalignedMemoryAccesses (EVT VT, bool *Fast) const;

virtual void LowerOperationWrapper(SDNode *N,
SmallVectorImpl<SDValue> &Results,

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Mon Dec 10 17:21:26 2012
@@ -1412,6 +1412,13 @@
return MVT::i32;
}

+bool
+X86TargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
+ if (Fast)
+ *Fast = Subtarget->isUnalignedMemAccessFast();
+ return true;
+}
+
/// getJumpTableEncoding - Return the entry encoding for a jump table in the
/// current function. The returned value is a member of the
/// MachineJumpTableInfo::JTEntryKind enum.

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Mon Dec 10 17:21:26 2012
@@ -507,10 +507,9 @@
MachineFunction &MF) const;

/// allowsUnalignedMemoryAccesses - Returns true if the target allows
- /// unaligned memory accesses. of the specified type.
- virtual bool allowsUnalignedMemoryAccesses(EVT VT) const {
- return true;
- }
+ /// unaligned memory accesses. of the specified type. Returns whether it
+ /// is "fast" by reference in the second argument.
+ virtual bool allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const;

/// LowerOperation - Provide custom lowering hooks for some operations.
///

Modified: llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll Mon Dec 10 17:21:26 2012
@@ -1,13 +1,5 @@
; RUN: llc -march=arm -mcpu=cortex-a8 < %s | FileCheck %s

-; Should trigger a NEON store.
-; CHECK: vstr
-define void @f_0_12(i8* nocapture %c) nounwind optsize {
-entry:
- call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
- ret void
-}
-
; Trigger multiple NEON stores.
; CHECK: vst1.64
; CHECK-NEXT: vst1.64

Modified: llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll Mon Dec 10 17:21:26 2012
@@ -1,18 +1,115 @@
-; RUN: llc < %s -mtriple=thumbv7-apple-darwin -disable-post-ra | FileCheck %s
-
-; CHECK: ldrd
-; CHECK: strd
-; CHECK: ldrb
+; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -pre-RA-sched=source -disable-post-ra | FileCheck %s

%struct.x = type { i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8 }

@src = external global %struct.x
@dst = external global %struct.x

-define i32 @t() {
+@.str1 = private unnamed_addr constant [31 x i8] c"DHRYSTONE PROGRAM, SOME STRING\00", align 1
+@.str2 = private unnamed_addr constant [36 x i8] c"DHRYSTONE PROGRAM, SOME STRING BLAH\00", align 1
+@.str3 = private unnamed_addr constant [24 x i8] c"DHRYSTONE PROGRAM, SOME\00", align 1
+@.str4 = private unnamed_addr constant [18 x i8] c"DHRYSTONE PROGR \00", align 1
+@.str5 = private unnamed_addr constant [7 x i8] c"DHRYST\00", align 1
+@.str6 = private unnamed_addr constant [14 x i8] c"/tmp/rmXXXXXX\00", align 1
+@spool.splbuf = internal global [512 x i8] zeroinitializer, align 16
+
+define i32 @t0() {
entry:
+; CHECK: t0:
+; CHECK: vldr [[REG1:d[0-9]+]],
+; CHECK: vstr [[REG1]],
call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds (%struct.x* @dst, i32 0, i32 0), i8* getelementptr inbounds (%struct.x* @src, i32 0, i32 0), i32 11, i32 8, i1 false)
ret i32 0
}

+define void @t1(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t1:
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: adds r0, #15
+; CHECK: adds r1, #15
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([31 x i8]* @.str1, i64 0, i64 0), i64 31, i32 1, i1 false)
+ ret void
+}
+
+define void @t2(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t2:
+; CHECK: ldr [[REG2:r[0-9]+]], [r1, #32]
+; CHECK: str [[REG2]], [r0, #32]
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: adds r0, #16
+; CHECK: adds r1, #16
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([36 x i8]* @.str2, i64 0, i64 0), i64 36, i32 1, i1 false)
+ ret void
+}
+
+define void @t3(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t3:
+; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+; CHECK: adds r0, #16
+; CHECK: adds r1, #16
+; CHECK: vld1.8 {d{{[0-9]+}}}, [r1]
+; CHECK: vst1.8 {d{{[0-9]+}}}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([24 x i8]* @.str3, i64 0, i64 0), i64 24, i32 1, i1 false)
+ ret void
+}
+
+define void @t4(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t4:
+; CHECK: vld1.8 {[[REG3:d[0-9]+]], [[REG4:d[0-9]+]]}, [r1]
+; CHECK: vst1.8 {[[REG3]], [[REG4]]}, [r0]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([18 x i8]* @.str4, i64 0, i64 0), i64 18, i32 1, i1 false)
+ ret void
+}
+
+define void @t5(i8* nocapture %C) nounwind {
+entry:
+; CHECK: t5:
+; CHECK: movs [[REG5:r[0-9]+]], #0
+; CHECK: strb [[REG5]], [r0, #6]
+; CHECK: movw [[REG6:r[0-9]+]], #21587
+; CHECK: strh [[REG6]], [r0, #4]
+; CHECK: ldr [[REG7:r[0-9]+]],
+; CHECK: str [[REG7]]
+ tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([7 x i8]* @.str5, i64 0, i64 0), i64 7, i32 1, i1 false)
+ ret void
+}
+
+define void @t6() nounwind {
+entry:
+; CHECK: t6:
+; CHECK: vld1.8 {[[REG8:d[0-9]+]]}, [r0]
+; CHECK: vstr [[REG8]], [r1]
+; CHECK: adds r1, #6
+; CHECK: adds r0, #6
+; CHECK: vld1.8
+; CHECK: vst1.16
+ call void @llvm.memcpy.p0i8.p0i8.i64(i8* getelementptr inbounds ([512 x i8]* @spool.splbuf, i64 0, i64 0), i8* getelementptr inbounds ([14 x i8]* @.str6, i64 0, i64 0), i64 14, i32 1, i1 false)
+ ret void
+}
+
+%struct.Foo = type { i32, i32, i32, i32 }
+
+define void @t7(%struct.Foo* nocapture %a, %struct.Foo* nocapture %b) nounwind {
+entry:
+; CHECK: t7
+; CHECK: vld1.32
+; CHECK: vst1.32
+ %0 = bitcast %struct.Foo* %a to i8*
+ %1 = bitcast %struct.Foo* %b to i8*
+ tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 16, i32 4, i1 false)
+ ret void
+}
+
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind

Added: llvm/trunk/test/CodeGen/ARM/memset-inline.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/memset-inline.ll?rev=169791&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/memset-inline.ll (added)
+++ llvm/trunk/test/CodeGen/ARM/memset-inline.ll Mon Dec 10 17:21:26 2012
@@ -0,0 +1,30 @@
+; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -pre-RA-sched=source -disable-post-ra | FileCheck %s
+
+define void @t1(i8* nocapture %c) nounwind optsize {
+entry:
+; CHECK: t1:
+; CHECK: movs r1, #0
+; CHECK: str r1, [r0]
+; CHECK: str r1, [r0, #4]
+; CHECK: str r1, [r0, #8]
+ call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
+ ret void
+}
+
+define void @t2() nounwind ssp {
+entry:
+; CHECK: t2:
+; CHECK: add.w r1, r0, #10
+; CHECK: vmov.i32 {{q[0-9]+}}, #0x0
+; CHECK: vst1.16 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
+; CHECK: vst1.32 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
+ %buf = alloca [26 x i8], align 1
+ %0 = getelementptr inbounds [26 x i8]* %buf, i32 0, i32 0
+ call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false)
+ call void @something(i8* %0) nounwind
+ ret void
+}
+
+declare void @something(i8*) nounwind
+declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind
+declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind

Removed: llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll?rev=169790&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll (removed)
@@ -1,16 +0,0 @@
-; RUN: llc < %s -march=arm -mcpu=cortex-a8 | FileCheck %s
-; Check that memcpy gets lowered to ldm/stm, at least in this very smple case.
-
-%struct.Foo = type { i32, i32, i32, i32 }
-
-define void @_Z10CopyStructP3FooS0_(%struct.Foo* nocapture %a, %struct.Foo* nocapture %b) nounwind {
-entry:
-;CHECK: ldm
-;CHECK: stm
- %0 = bitcast %struct.Foo* %a to i8*
- %1 = bitcast %struct.Foo* %b to i8*
- tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 16, i32 4, i1 false)
- ret void
-}
-
-declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind

Modified: llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll (original)
+++ llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll Mon Dec 10 17:21:26 2012
@@ -6,15 +6,16 @@
define void @t(i32 %count) ssp nounwind {
entry:
; CHECK: t:
-; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip)
-; CHECK: movups L_str(%rip), %xmm0
+; CHECK: movups L_str+12(%rip), %xmm0
+; CHECK: movups L_str(%rip), %xmm1
 %tmp0 = alloca [60 x i8], align 1
 %tmp1 = getelementptr inbounds [60 x i8]* %tmp0, i64 0, i64 0
 br label %bb1

bb1:
; CHECK: LBB0_1:
-; CHECK: movaps %xmm0, (%rsp)
+; CHECK: movups %xmm0, 12(%rsp)
+; CHECK: movaps %xmm1, (%rsp)
 %tmp2 = phi i32 [ %tmp3, %bb1 ], [ 0, %entry ]
 call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* getelementptr inbounds ([28 x i8]* @str, i64 0, i64 0), i64 28, i32 1, i1 false)
 %tmp3 = add i32 %tmp2, 1

Modified: llvm/trunk/test/CodeGen/X86/memcpy-2.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/memcpy-2.ll?rev=169791&r1=169790&r2=169791&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/memcpy-2.ll (original)
+++ llvm/trunk/test/CodeGen/X86/memcpy-2.ll Mon Dec 10 17:21:26 2012
@@ -10,18 +10,18 @@
define void @t1(i32 %argc, i8** %argv) nounwind {
entry:
; SSE2: t1:
+; SSE2: movsd _.str+16, %xmm0
+; SSE2: movsd %xmm0, 16(%esp)
; SSE2: movaps _.str, %xmm0
; SSE2: movaps %xmm0
-; SSE2: movb $0
-; SSE2: movl $0
-; SSE2: movl $0
+; SSE2: movb $0, 24(%esp)

; SSE1: t1:
+; SSE1: fldl _.str+16
+; SSE1: fstpl 16(%esp)
; SSE1: movaps _.str, %xmm0
; SSE1: movaps %xmm0
-; SSE1: movb $0
-; SSE1: movl $0
-; SSE1: movl $0
+; SSE1: movb $0, 24(%esp)

; NOSSE: t1:
; NOSSE: movb $0
