[llvm-commits] [llvm] r169791 - in /llvm/trunk: include/llvm/Target/ lib/CodeGen/SelectionDAG/ lib/Target/ARM/ lib/Target/Mips/ lib/Target/X86/ test/CodeGen/ARM/ test/CodeGen/X86/

Wed Dec 12 12:44:33 PST 2012

r170018

Evan

On Dec 12, 2012, at 12:04 PM, Evan Cheng <evan.cheng at apple.com> wrote:

> I'm taking a look.
> 
> Evan
> 
> On Dec 12, 2012, at 11:58 AM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> 
>> Evan,
>> 
>> I am seeing an assert when I compile a program with llc (the test program, strcat.llvm.mips64el.ll, is attached to this email):
>> 
>> $ llc -march=mips64el -mcpu=mips64r2 -mattr=n64 -disable-mips-delay-filler -filetype=asm -relocation-model=pic -asm-verbose=false -O3 Output/strcat.llvm.mips64el.ll -o Output/strcat.llc.mips64r2.s
>> 
>> llc: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3588: llvm::SDValue getMemcpyLoadsAndStores(llvm::SelectionDAG&, llvm::DebugLoc, llvm::SDValue, llvm::SDValue, llvm::SDValue, uint64_t, unsigned int, bool, bool, llvm::MachinePointerInfo, llvm::MachinePointerInfo): Assertion `i == NumMemOps-1 && i != 0' failed.
>> 
>> The memcpy call that triggers the assert copies an array of 7 chars to an i8* address.
>> 
>> (gdb) p I.dump()
>>   tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %endptr, i8* getelementptr inbounds ([7 x i8]* @.str2, i64 0, i64 0), i64 7, i32 1, i1 false)
>> 
>> I am not familiar with the pieces you touched in this commit, but llc terminates normally if I force the code to execute the else clause here by setting Flag=0 inside MipsTargetLowering::allowsUnalignedMemoryAccesses:
>> 
>> SelectionDAG.cpp:3510
>> 
>> +      // If the new VT cannot cover all of the remaining bits, then consider
>> +      // issuing a (or a pair of) unaligned and overlapping load / store.
>> +      // FIXME: Only does this for 64-bit or more since we don't have proper
>> +      // cost model for unaligned load / store.
>> +      bool Fast;
>> +      if (AllowOverlap && VTSize >= 8 && NewVTSize < Size &&
>> +          TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
>> +        VTSize = Size;
>> +      else {
>> +        VT = NewVT;
>> +        VTSize = NewVTSize;
>> 
>> I think overlapping shouldn't be allowed here when the source size (7B) is smaller than the load size (8B).
>> 
>> Do you have any idea how this can be fixed?
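The failure described above can be reproduced on paper with a small model of the type-selection loop. What follows is only a sketch under stated assumptions: `pickMemOps` and `tripsOverlapAssert` are hypothetical names, types are reduced to integer byte widths, the first type tried is assumed to be i64 (8 bytes), and the vector/FP handling, store limit, and legality checks of the real code are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the selection loop quoted above.
// A "type" is just its width in bytes (8 = i64, 4 = i32, 2 = i16, 1 = i8).
std::vector<unsigned> pickMemOps(uint64_t Size, unsigned FirstVTSize,
                                 bool AllowOverlap, bool UnalignedFast) {
  // The model only handles power-of-two integer widths.
  assert(FirstVTSize != 0 && (FirstVTSize & (FirstVTSize - 1)) == 0);
  std::vector<unsigned> MemOps;
  unsigned VT = FirstVTSize;
  while (Size != 0) {
    unsigned VTSize = VT;
    while (VTSize > Size) {
      unsigned NewVTSize = VTSize >> 1;
      if (AllowOverlap && VTSize >= 8 && NewVTSize < Size && UnalignedFast) {
        // Keep the wide type; the op is expected to overlap the previous one.
        VTSize = Size;
      } else {
        VT = NewVTSize;
        VTSize = NewVTSize;
      }
    }
    MemOps.push_back(VT);
    Size -= VTSize;
  }
  return MemOps;
}

// Mirrors the condition guarded by assert(i == NumMemOps-1 && i != 0) in the
// emission loop: returns true if that assert would fire.
bool tripsOverlapAssert(const std::vector<unsigned> &MemOps, uint64_t Size) {
  for (std::size_t i = 0; i != MemOps.size(); ++i) {
    unsigned VTSize = MemOps[i];
    if (VTSize > Size) {
      // An overlapping op is only valid as the last op with a predecessor.
      return !(i == MemOps.size() - 1 && i != 0);
    }
    Size -= VTSize;
  }
  return false;
}
```

In this model a 7-byte copy with fast unaligned access selects a single 8-byte op, which has no previous op to overlap with, so the emission-loop assert fires at i == 0. With the "fast" flag off, the same copy falls back to 4 + 2 + 1 byte ops, which matches the observation that llc terminates normally in that configuration.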
>> 
>> On Mon, Dec 10, 2012 at 3:21 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>> Author: evancheng
>> Date: Mon Dec 10 17:21:26 2012
>> New Revision: 169791
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=169791&view=rev
>> Log:
>> Some enhancements for memcpy / memset inline expansion.
>> 1. Teach it to use overlapping unaligned load / store to copy / set the trailing
>>    bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies.
>> 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
>>    x86 and ARM.
>> 3. When memcpy from a constant string, do *not* replace the load with a constant
>>    if it's not possible to materialize an integer immediate with a single
>>    instruction (required a new target hook: TLI.isIntImmLegal()).
>> 4. Use unaligned load / stores more aggressively if target hooks indicate they
>>    are "fast".
>> 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
>>    Also increase the threshold to something reasonable (8 for memset, 4 pairs
>>    for memcpy).
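Item 1 above (overlapping unaligned load / store for the trailing bytes) can be illustrated in plain C++. This is a minimal sketch, not the code the compiler emits: `copyWithOverlappingTail` and `kBlock` are hypothetical names, and a fixed 16-byte `std::memcpy` stands in for one unaligned vector load / store pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical block width standing in for one 16-byte vector register.
constexpr std::size_t kBlock = 16;

// Copies n bytes (n >= kBlock) using only kBlock-sized copies. If n is not a
// multiple of kBlock, the final copy is shifted back so it ends exactly at
// byte n and overlaps bytes already written. Returns the number of block
// copies issued. dst and src must not alias each other.
std::size_t copyWithOverlappingTail(unsigned char *dst,
                                    const unsigned char *src, std::size_t n) {
  assert(n >= kBlock);
  std::size_t off = 0;
  std::size_t ops = 0;
  // Whole blocks from the front.
  while (n - off >= kBlock) {
    std::memcpy(dst + off, src + off, kBlock);
    off += kBlock;
    ++ops;
  }
  // Trailing bytes: one more full-width copy, moved back by the shortfall.
  if (off != n) {
    std::size_t tail = n - kBlock;
    std::memcpy(dst + tail, src + tail, kBlock);
    ++ops;
  }
  return ops;
}
```

For a 31-byte copy the sketch issues two 16-byte copies, the second starting at offset 15, instead of one 16-byte copy followed by 8-, 4-, 2- and 1-byte pieces. The overlapped byte is simply written twice with the same value.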
>> 
>> This significantly improves Dhrystone, up to 50% on ARM iOS devices.
>> 
>> rdar://12760078
>> 
>> Added:
>>     llvm/trunk/test/CodeGen/ARM/memset-inline.ll
>> Removed:
>>     llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll
>> Modified:
>>     llvm/trunk/include/llvm/Target/TargetLowering.h
>>     llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>     llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
>>     llvm/trunk/lib/Target/ARM/ARMISelLowering.h
>>     llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td
>>     llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp
>>     llvm/trunk/lib/Target/Mips/MipsISelLowering.h
>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>     llvm/trunk/lib/Target/X86/X86ISelLowering.h
>>     llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll
>>     llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll
>>     llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll
>>     llvm/trunk/test/CodeGen/X86/memcpy-2.ll
>> 
>> Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
>> +++ llvm/trunk/include/llvm/Target/TargetLowering.h Mon Dec 10 17:21:26 2012
>> @@ -371,6 +371,16 @@
>>      return false;
>>    }
>> 
>> +  /// isIntImmLegal - Returns true if the target can instruction select the
>> +  /// specified integer immediate natively (that is, it's materialized with one
>> +  /// instruction). The current *assumption* in isel is all of integer
>> +  /// immediates are "legal" and only the memcpy / memset expansion code is
>> +  /// making use of this. The rest of isel doesn't have proper cost model for
>> +  /// immediate materialization.
>> +  virtual bool isIntImmLegal(const APInt &/*Imm*/, EVT /*VT*/) const {
>> +    return true;
>> +  }
>> +
>>    /// isShuffleMaskLegal - Targets can use this to indicate that they only
>>    /// support *some* VECTOR_SHUFFLE operations, those with specific masks.
>>    /// By default, if a target supports the VECTOR_SHUFFLE node, all mask values
>> @@ -678,12 +688,14 @@
>>    }
>> 
>>    /// This function returns true if the target allows unaligned memory accesses.
>> -  /// of the specified type. This is used, for example, in situations where an
>> -  /// array copy/move/set is  converted to a sequence of store operations. It's
>> -  /// use helps to ensure that such replacements don't generate code that causes
>> -  /// an alignment error  (trap) on the target machine.
>> +  /// of the specified type. If true, it also returns whether the unaligned
>> +  /// memory access is "fast" in the second argument by reference. This is used,
>> +  /// for example, in situations where an array copy/move/set is  converted to a
>> +  /// sequence of store operations. It's use helps to ensure that such
>> +  /// replacements don't generate code that causes an alignment error  (trap) on
>> +  /// the target machine.
>>    /// @brief Determine if the target supports unaligned memory accesses.
>> -  virtual bool allowsUnalignedMemoryAccesses(EVT) const {
>> +  virtual bool allowsUnalignedMemoryAccesses(EVT, bool *Fast = 0) const {
>>      return false;
>>    }
>> 
>> 
>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Mon Dec 10 17:21:26 2012
>> @@ -3373,7 +3373,7 @@
>>    unsigned NumVTBytes = VT.getSizeInBits() / 8;
>>    unsigned NumBytes = std::min(NumVTBytes, unsigned(Str.size()));
>> 
>> -  uint64_t Val = 0;
>> +  APInt Val(NumBytes*8, 0);
>>    if (TLI.isLittleEndian()) {
>>      for (unsigned i = 0; i != NumBytes; ++i)
>>        Val |= (uint64_t)(unsigned char)Str[i] << i*8;
>> @@ -3382,7 +3382,9 @@
>>        Val |= (uint64_t)(unsigned char)Str[i] << (NumVTBytes-i-1)*8;
>>    }
>> 
>> -  return DAG.getConstant(Val, VT);
>> +  if (TLI.isIntImmLegal(Val, VT))
>> +    return DAG.getConstant(Val, VT);
>> +  return SDValue(0, 0);
>>  }
>> 
>>  /// getMemBasePlusOffset - Returns base and offset node for the
>> @@ -3422,6 +3424,7 @@
>>                                       unsigned DstAlign, unsigned SrcAlign,
>>                                       bool IsZeroVal,
>>                                       bool MemcpyStrSrc,
>> +                                     bool AllowOverlap,
>>                                       SelectionDAG &DAG,
>>                                       const TargetLowering &TLI) {
>>    assert((SrcAlign == 0 || SrcAlign >= DstAlign) &&
>> @@ -3461,24 +3464,47 @@
>> 
>>    unsigned NumMemOps = 0;
>>    while (Size != 0) {
>> +    if (++NumMemOps > Limit)
>> +      return false;
>> +
>>      unsigned VTSize = VT.getSizeInBits() / 8;
>>      while (VTSize > Size) {
>>        // For now, only use non-vector load / store's for the left-over pieces.
>> +      EVT NewVT;
>> +      unsigned NewVTSize;
>>        if (VT.isVector() || VT.isFloatingPoint()) {
>> -        VT = MVT::i64;
>> -        while (!TLI.isTypeLegal(VT))
>> -          VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
>> -        VTSize = VT.getSizeInBits() / 8;
>> +        NewVT = (VT.getSizeInBits() > 64) ? MVT::i64 : MVT::i32;
>> +        while (!TLI.isOperationLegalOrCustom(ISD::STORE, NewVT)) {
>> +          if (NewVT == MVT::i64 &&
>> +              TLI.isOperationLegalOrCustom(ISD::STORE, MVT::f64)) {
>> +            // i64 is usually not legal on 32-bit targets, but f64 may be.
>> +            NewVT = MVT::f64;
>> +            break;
>> +          }
>> +          NewVT = (MVT::SimpleValueType)(NewVT.getSimpleVT().SimpleTy - 1);
>> +        }
>> +        NewVTSize = NewVT.getSizeInBits() / 8;
>>        } else {
>>          // This can result in a type that is not legal on the target, e.g.
>>          // 1 or 2 bytes on PPC.
>> -        VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
>> -        VTSize >>= 1;
>> +        NewVT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1);
>> +        NewVTSize = VTSize >> 1;
>> +      }
>> +
>> +      // If the new VT cannot cover all of the remaining bits, then consider
>> +      // issuing a (or a pair of) unaligned and overlapping load / store.
>> +      // FIXME: Only does this for 64-bit or more since we don't have proper
>> +      // cost model for unaligned load / store.
>> +      bool Fast;
>> +      if (AllowOverlap && VTSize >= 8 && NewVTSize < Size &&
>> +          TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
>> +        VTSize = Size;
>> +      else {
>> +        VT = NewVT;
>> +        VTSize = NewVTSize;
>>        }
>>      }
>> 
>> -    if (++NumMemOps > Limit)
>> -      return false;
>>      MemOps.push_back(VT);
>>      Size -= VTSize;
>>    }
>> @@ -3523,7 +3549,7 @@
>>    if (!FindOptimalMemOpLowering(MemOps, Limit, Size,
>>                                  (DstAlignCanChange ? 0 : Align),
>>                                  (isZeroStr ? 0 : SrcAlign),
>> -                                true, CopyFromStr, DAG, TLI))
>> +                                true, CopyFromStr, true, DAG, TLI))
>>      return SDValue();
>> 
>>    if (DstAlignCanChange) {
>> @@ -3545,6 +3571,14 @@
>>      unsigned VTSize = VT.getSizeInBits() / 8;
>>      SDValue Value, Store;
>> 
>> +    if (VTSize > Size) {
>> +      // Issuing an unaligned load / store pair  that overlaps with the previous
>> +      // pair. Adjust the offset accordingly.
>> +      assert(i == NumMemOps-1 && i != 0);
>> +      SrcOff -= VTSize - Size;
>> +      DstOff -= VTSize - Size;
>> +    }
>> +
>>      if (CopyFromStr &&
>>          (isZeroStr || (VT.isInteger() && !VT.isVector()))) {
>>        // It's unlikely a store of a vector immediate can be done in a single
>> @@ -3553,11 +3587,14 @@
>>        // FIXME: Handle other cases where store of vector immediate is done in
>>        // a single instruction.
>>        Value = getMemsetStringVal(VT, dl, DAG, TLI, Str.substr(SrcOff));
>> -      Store = DAG.getStore(Chain, dl, Value,
>> -                           getMemBasePlusOffset(Dst, DstOff, DAG),
>> -                           DstPtrInfo.getWithOffset(DstOff), isVol,
>> -                           false, Align);
>> -    } else {
>> +      if (Value.getNode())
>> +        Store = DAG.getStore(Chain, dl, Value,
>> +                             getMemBasePlusOffset(Dst, DstOff, DAG),
>> +                             DstPtrInfo.getWithOffset(DstOff), isVol,
>> +                             false, Align);
>> +    }
>> +
>> +    if (!Store.getNode()) {
>>        // The type might not be legal for the target.  This should only happen
>>        // if the type is smaller than a legal type, as on PPC, so the right
>>        // thing to do is generate a LoadExt/StoreTrunc pair.  These simplify
>> @@ -3577,6 +3614,7 @@
>>      OutChains.push_back(Store);
>>      SrcOff += VTSize;
>>      DstOff += VTSize;
>> +    Size -= VTSize;
>>    }
>> 
>>    return DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
>> @@ -3613,7 +3651,7 @@
>> 
>>    if (!FindOptimalMemOpLowering(MemOps, Limit, Size,
>>                                  (DstAlignCanChange ? 0 : Align),
>> -                                SrcAlign, true, false, DAG, TLI))
>> +                                SrcAlign, true, false, false, DAG, TLI))
>>      return SDValue();
>> 
>>    if (DstAlignCanChange) {
>> @@ -3689,7 +3727,7 @@
>>      isa<ConstantSDNode>(Src) && cast<ConstantSDNode>(Src)->isNullValue();
>>    if (!FindOptimalMemOpLowering(MemOps, TLI.getMaxStoresPerMemset(OptSize),
>>                                  Size, (DstAlignCanChange ? 0 : Align), 0,
>> -                                IsZeroVal, false, DAG, TLI))
>> +                                IsZeroVal, false, true, DAG, TLI))
>>      return SDValue();
>> 
>>    if (DstAlignCanChange) {
>> @@ -3716,6 +3754,13 @@
>> 
>>    for (unsigned i = 0; i < NumMemOps; i++) {
>>      EVT VT = MemOps[i];
>> +    unsigned VTSize = VT.getSizeInBits() / 8;
>> +    if (VTSize > Size) {
>> +      // Issuing an unaligned load / store pair  that overlaps with the previous
>> +      // pair. Adjust the offset accordingly.
>> +      assert(i == NumMemOps-1 && i != 0);
>> +      DstOff -= VTSize - Size;
>> +    }
>> 
>>      // If this store is smaller than the largest store see whether we can get
>>      // the smaller value for free with a truncate.
>> @@ -3734,6 +3779,7 @@
>>                                   isVol, false, Align);
>>      OutChains.push_back(Store);
>>      DstOff += VT.getSizeInBits() / 8;
>> +    Size -= VTSize;
>>    }
>> 
>>    return DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
>> 
>> Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Mon Dec 10 17:21:26 2012
>> @@ -833,9 +833,12 @@
>>      setSchedulingPreference(Sched::Hybrid);
>> 
>>    //// temporary - rewrite interface to use type
>> -  maxStoresPerMemcpy = maxStoresPerMemcpyOptSize = 1;
>> -  maxStoresPerMemset = 16;
>> +  maxStoresPerMemset = 8;
>>    maxStoresPerMemsetOptSize = Subtarget->isTargetDarwin() ? 8 : 4;
>> +  maxStoresPerMemcpy = 4; // For @llvm.memcpy -> sequence of stores
>> +  maxStoresPerMemcpyOptSize = Subtarget->isTargetDarwin() ? 4 : 2;
>> +  maxStoresPerMemmove = 4; // For @llvm.memmove -> sequence of stores
>> +  maxStoresPerMemmoveOptSize = Subtarget->isTargetDarwin() ? 4 : 2;
>> 
>>    // On ARM arguments smaller than 4 bytes are extended, so all arguments
>>    // are at least 4 bytes aligned.
>> @@ -9406,7 +9409,7 @@
>>    return (VT == MVT::f32) && (Opc == ISD::LOAD || Opc == ISD::STORE);
>>  }
>> 
>> -bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {
>> +bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
>>    // The AllowsUnaliged flag models the SCTLR.A setting in ARM cpus
>>    bool AllowsUnaligned = Subtarget->allowsUnalignedMem();
>> 
>> @@ -9415,15 +9418,27 @@
>>      return false;
>>    case MVT::i8:
>>    case MVT::i16:
>> -  case MVT::i32:
>> +  case MVT::i32: {
>>      // Unaligned access can use (for example) LRDB, LRDH, LDR
>> -    return AllowsUnaligned;
>> +    if (AllowsUnaligned) {
>> +      if (Fast)
>> +        *Fast = Subtarget->hasV7Ops();
>> +      return true;
>> +    }
>> +    return false;
>> +  }
>>    case MVT::f64:
>> -  case MVT::v2f64:
>> +  case MVT::v2f64: {
>>      // For any little-endian targets with neon, we can support unaligned ld/st
>>      // of D and Q (e.g. {D0,D1}) registers by using vld1.i8/vst1.i8.
>>      // A big-endian target may also explictly support unaligned accesses
>> -    return Subtarget->hasNEON() && (AllowsUnaligned || isLittleEndian());
>> +    if (Subtarget->hasNEON() && (AllowsUnaligned || isLittleEndian())) {
>> +      if (Fast)
>> +        *Fast = true;
>> +      return true;
>> +    }
>> +    return false;
>> +  }
>>    }
>>  }
>> 
>> @@ -9442,12 +9457,17 @@
>> 
>>    // See if we can use NEON instructions for this...
>>    if (IsZeroVal &&
>> -      !F->getFnAttributes().hasAttribute(Attributes::NoImplicitFloat) &&
>> -      Subtarget->hasNEON()) {
>> -    if (memOpAlign(SrcAlign, DstAlign, 16) && Size >= 16) {
>> -      return MVT::v4i32;
>> -    } else if (memOpAlign(SrcAlign, DstAlign, 8) && Size >= 8) {
>> -      return MVT::v2i32;
>> +      Subtarget->hasNEON() &&
>> +      !F->getFnAttributes().hasAttribute(Attributes::NoImplicitFloat)) {
>> +    bool Fast;
>> +    if (Size >= 16 && (memOpAlign(SrcAlign, DstAlign, 16) ||
>> +                       (allowsUnalignedMemoryAccesses(MVT::v2f64, &Fast) &&
>> +                        Fast))) {
>> +      return MVT::v2f64;
>> +    } else if (Size >= 8 && (memOpAlign(SrcAlign, DstAlign, 8) ||
>> +                             (allowsUnalignedMemoryAccesses(MVT::f64, &Fast) &&
>> +                              Fast))) {
>> +      return MVT::f64;
>>      }
>>    }
>> 
>> @@ -10241,6 +10261,24 @@
>>    return false;
>>  }
>> 
>> +bool ARMTargetLowering::isIntImmLegal(const APInt &Imm, EVT VT) const {
>> +  if (VT.getSizeInBits() > 32)
>> +    return false;
>> +
>> +  int32_t ImmVal = Imm.getSExtValue();
>> +  if (!Subtarget->isThumb()) {
>> +    return (ImmVal >= 0 && ImmVal < 65536) ||
>> +      (ARM_AM::getSOImmVal(ImmVal) != -1) ||
>> +      (ARM_AM::getSOImmVal(~ImmVal) != -1);
>> +  } else if (Subtarget->isThumb2()) {
>> +    return (ImmVal >= 0 && ImmVal < 65536) ||
>> +      (ARM_AM::getT2SOImmVal(ImmVal) != -1) ||
>> +      (ARM_AM::getT2SOImmVal(~ImmVal) != -1);
>> +  } else /*Thumb1*/ {
>> +    return (ImmVal >= 0 && ImmVal < 256);
>> +  }
>> +}
>> +
>>  /// getTgtMemIntrinsic - Represent NEON load and store intrinsics as
>>  /// MemIntrinsicNodes.  The associated MachineMemOperands record the alignment
>>  /// specified in the intrinsic calls.
>> 
>> Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/ARM/ARMISelLowering.h (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.h Mon Dec 10 17:21:26 2012
>> @@ -285,8 +285,9 @@
>>      bool isDesirableToTransformToIntegerOp(unsigned Opc, EVT VT) const;
>> 
>>      /// allowsUnalignedMemoryAccesses - Returns true if the target allows
>> -    /// unaligned memory accesses. of the specified type.
>> -    virtual bool allowsUnalignedMemoryAccesses(EVT VT) const;
>> +    /// unaligned memory accesses of the specified type. Returns whether it
>> +    /// is "fast" by reference in the second argument.
>> +    virtual bool allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const;
>> 
>>      virtual EVT getOptimalMemOpType(uint64_t Size,
>>                                      unsigned DstAlign, unsigned SrcAlign,
>> @@ -386,6 +387,8 @@
>>      /// materialize the FP immediate as a load from a constant pool.
>>      virtual bool isFPImmLegal(const APFloat &Imm, EVT VT) const;
>> 
>> +    virtual bool isIntImmLegal(const APInt &Imm, EVT VT) const;
>> +
>>      virtual bool getTgtMemIntrinsic(IntrinsicInfo &Info,
>>                                      const CallInst &I,
>>                                      unsigned Intrinsic) const;
>> 
>> Modified: llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td (original)
>> +++ llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td Mon Dec 10 17:21:26 2012
>> @@ -2315,13 +2315,15 @@
>>  /// changed to modify CPSR.
>>  multiclass T2I_un_irs<bits<4> opcod, string opc,
>>                       InstrItinClass iii, InstrItinClass iir, InstrItinClass iis,
>> -                      PatFrag opnode, bit Cheap = 0, bit ReMat = 0> {
>> +                      PatFrag opnode,
>> +                      bit Cheap = 0, bit ReMat = 0, bit MoveImm = 0> {
>>     // shifted imm
>>     def i : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), iii,
>>                  opc, "\t$Rd, $imm",
>>                  [(set rGPR:$Rd, (opnode t2_so_imm:$imm))]> {
>>       let isAsCheapAsAMove = Cheap;
>>       let isReMaterializable = ReMat;
>> +     let isMoveImm = MoveImm;
>>       let Inst{31-27} = 0b11110;
>>       let Inst{25} = 0;
>>       let Inst{24-21} = opcod;
>> @@ -2355,7 +2357,7 @@
>>  let AddedComplexity = 1 in
>>  defm t2MVN  : T2I_un_irs <0b0011, "mvn",
>>                            IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi,
>> -                          UnOpFrag<(not node:$Src)>, 1, 1>;
>> +                          UnOpFrag<(not node:$Src)>, 1, 1, 1>;
>> 
>>  let AddedComplexity = 1 in
>>  def : T2Pat<(and     rGPR:$src, t2_so_imm_not:$imm),
>> 
>> Modified: llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/Mips/MipsISelLowering.cpp Mon Dec 10 17:21:26 2012
>> @@ -457,7 +457,8 @@
>>    maxStoresPerMemcpy = 16;
>>  }
>> 
>> -bool MipsTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {
>> +bool
>> +MipsTargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
>>    MVT::SimpleValueType SVT = VT.getSimpleVT().SimpleTy;
>> 
>>    if (Subtarget->inMips16Mode())
>> @@ -466,6 +467,8 @@
>>    switch (SVT) {
>>    case MVT::i64:
>>    case MVT::i32:
>> +    if (Fast)
>> +      *Fast = true;
>>      return true;
>>    default:
>>      return false;
>> 
>> Modified: llvm/trunk/lib/Target/Mips/MipsISelLowering.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/MipsISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/Mips/MipsISelLowering.h (original)
>> +++ llvm/trunk/lib/Target/Mips/MipsISelLowering.h Mon Dec 10 17:21:26 2012
>> @@ -149,7 +149,7 @@
>> 
>>      virtual MVT getShiftAmountTy(EVT LHSTy) const { return MVT::i32; }
>> 
>> -    virtual bool allowsUnalignedMemoryAccesses (EVT VT) const;
>> +    virtual bool allowsUnalignedMemoryAccesses (EVT VT, bool *Fast) const;
>> 
>>      virtual void LowerOperationWrapper(SDNode *N,
>>                                         SmallVectorImpl<SDValue> &Results,
>> 
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Mon Dec 10 17:21:26 2012
>> @@ -1412,6 +1412,13 @@
>>    return MVT::i32;
>>  }
>> 
>> +bool
>> +X86TargetLowering::allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const {
>> +  if (Fast)
>> +    *Fast = Subtarget->isUnalignedMemAccessFast();
>> +  return true;
>> +}
>> +
>>  /// getJumpTableEncoding - Return the entry encoding for a jump table in the
>>  /// current function.  The returned value is a member of the
>>  /// MachineJumpTableInfo::JTEntryKind enum.
>> 
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Mon Dec 10 17:21:26 2012
>> @@ -507,10 +507,9 @@
>>                          MachineFunction &MF) const;
>> 
>>      /// allowsUnalignedMemoryAccesses - Returns true if the target allows
>> -    /// unaligned memory accesses. of the specified type.
>> -    virtual bool allowsUnalignedMemoryAccesses(EVT VT) const {
>> -      return true;
>> -    }
>> +    /// unaligned memory accesses. of the specified type. Returns whether it
>> +    /// is "fast" by reference in the second argument.
>> +    virtual bool allowsUnalignedMemoryAccesses(EVT VT, bool *Fast) const;
>> 
>>      /// LowerOperation - Provide custom lowering hooks for some operations.
>>      ///
>> 
>> Modified: llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll (original)
>> +++ llvm/trunk/test/CodeGen/ARM/2011-10-26-memset-with-neon.ll Mon Dec 10 17:21:26 2012
>> @@ -1,13 +1,5 @@
>>  ; RUN: llc -march=arm -mcpu=cortex-a8 < %s | FileCheck %s
>> 
>> -; Should trigger a NEON store.
>> -; CHECK: vstr
>> -define void @f_0_12(i8* nocapture %c) nounwind optsize {
>> -entry:
>> -  call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
>> -  ret void
>> -}
>> -
>>  ; Trigger multiple NEON stores.
>>  ; CHECK:      vst1.64
>>  ; CHECK-NEXT: vst1.64
>> 
>> Modified: llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll (original)
>> +++ llvm/trunk/test/CodeGen/ARM/memcpy-inline.ll Mon Dec 10 17:21:26 2012
>> @@ -1,18 +1,115 @@
>> -; RUN: llc < %s -mtriple=thumbv7-apple-darwin -disable-post-ra | FileCheck %s
>> -
>> -; CHECK: ldrd
>> -; CHECK: strd
>> -; CHECK: ldrb
>> +; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -pre-RA-sched=source -disable-post-ra | FileCheck %s
>> 
>>  %struct.x = type { i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8 }
>> 
>>  @src = external global %struct.x
>>  @dst = external global %struct.x
>> 
>> -define i32 @t() {
>> +@.str1 = private unnamed_addr constant [31 x i8] c"DHRYSTONE PROGRAM, SOME STRING\00", align 1
>> +@.str2 = private unnamed_addr constant [36 x i8] c"DHRYSTONE PROGRAM, SOME STRING BLAH\00", align 1
>> +@.str3 = private unnamed_addr constant [24 x i8] c"DHRYSTONE PROGRAM, SOME\00", align 1
>> +@.str4 = private unnamed_addr constant [18 x i8] c"DHRYSTONE PROGR  \00", align 1
>> +@.str5 = private unnamed_addr constant [7 x i8] c"DHRYST\00", align 1
>> +@.str6 = private unnamed_addr constant [14 x i8] c"/tmp/rmXXXXXX\00", align 1
>> +@spool.splbuf = internal global [512 x i8] zeroinitializer, align 16
>> +
>> +define i32 @t0() {
>>  entry:
>> +; CHECK: t0:
>> +; CHECK: vldr [[REG1:d[0-9]+]],
>> +; CHECK: vstr [[REG1]],
>>    call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds (%struct.x* @dst, i32 0, i32 0), i8* getelementptr inbounds (%struct.x* @src, i32 0, i32 0), i32 11, i32 8, i1 false)
>>    ret i32 0
>>  }
>> 
>> +define void @t1(i8* nocapture %C) nounwind {
>> +entry:
>> +; CHECK: t1:
>> +; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
>> +; CHECK: adds r0, #15
>> +; CHECK: adds r1, #15
>> +; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([31 x i8]* @.str1, i64 0, i64 0), i64 31, i32 1, i1 false)
>> +  ret void
>> +}
>> +
>> +define void @t2(i8* nocapture %C) nounwind {
>> +entry:
>> +; CHECK: t2:
>> +; CHECK: ldr [[REG2:r[0-9]+]], [r1, #32]
>> +; CHECK: str [[REG2]], [r0, #32]
>> +; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
>> +; CHECK: adds r0, #16
>> +; CHECK: adds r1, #16
>> +; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([36 x i8]* @.str2, i64 0, i64 0), i64 36, i32 1, i1 false)
>> +  ret void
>> +}
>> +
>> +define void @t3(i8* nocapture %C) nounwind {
>> +entry:
>> +; CHECK: t3:
>> +; CHECK: vld1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.8 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
>> +; CHECK: adds r0, #16
>> +; CHECK: adds r1, #16
>> +; CHECK: vld1.8 {d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.8 {d{{[0-9]+}}}, [r0]
>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([24 x i8]* @.str3, i64 0, i64 0), i64 24, i32 1, i1 false)
>> +  ret void
>> +}
>> +
>> +define void @t4(i8* nocapture %C) nounwind {
>> +entry:
>> +; CHECK: t4:
>> +; CHECK: vld1.8 {[[REG3:d[0-9]+]], [[REG4:d[0-9]+]]}, [r1]
>> +; CHECK: vst1.8 {[[REG3]], [[REG4]]}, [r0]
>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([18 x i8]* @.str4, i64 0, i64 0), i64 18, i32 1, i1 false)
>> +  ret void
>> +}
>> +
>> +define void @t5(i8* nocapture %C) nounwind {
>> +entry:
>> +; CHECK: t5:
>> +; CHECK: movs [[REG5:r[0-9]+]], #0
>> +; CHECK: strb [[REG5]], [r0, #6]
>> +; CHECK: movw [[REG6:r[0-9]+]], #21587
>> +; CHECK: strh [[REG6]], [r0, #4]
>> +; CHECK: ldr [[REG7:r[0-9]+]],
>> +; CHECK: str [[REG7]]
>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([7 x i8]* @.str5, i64 0, i64 0), i64 7, i32 1, i1 false)
>> +  ret void
>> +}
>> +
>> +define void @t6() nounwind {
>> +entry:
>> +; CHECK: t6:
>> +; CHECK: vld1.8 {[[REG8:d[0-9]+]]}, [r0]
>> +; CHECK: vstr [[REG8]], [r1]
>> +; CHECK: adds r1, #6
>> +; CHECK: adds r0, #6
>> +; CHECK: vld1.8
>> +; CHECK: vst1.16
>> +  call void @llvm.memcpy.p0i8.p0i8.i64(i8* getelementptr inbounds ([512 x i8]* @spool.splbuf, i64 0, i64 0), i8* getelementptr inbounds ([14 x i8]* @.str6, i64 0, i64 0), i64 14, i32 1, i1 false)
>> +  ret void
>> +}
>> +
>> +%struct.Foo = type { i32, i32, i32, i32 }
>> +
>> +define void @t7(%struct.Foo* nocapture %a, %struct.Foo* nocapture %b) nounwind {
>> +entry:
>> +; CHECK: t7
>> +; CHECK: vld1.32
>> +; CHECK: vst1.32
>> +  %0 = bitcast %struct.Foo* %a to i8*
>> +  %1 = bitcast %struct.Foo* %b to i8*
>> +  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 16, i32 4, i1 false)
>> +  ret void
>> +}
>> +
>>  declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
>> +declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind
>> 
>> Added: llvm/trunk/test/CodeGen/ARM/memset-inline.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/memset-inline.ll?rev=169791&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/ARM/memset-inline.ll (added)
>> +++ llvm/trunk/test/CodeGen/ARM/memset-inline.ll Mon Dec 10 17:21:26 2012
>> @@ -0,0 +1,30 @@
>> +; RUN: llc < %s -mtriple=thumbv7-apple-ios -mcpu=cortex-a8 -pre-RA-sched=source -disable-post-ra | FileCheck %s
>> +
>> +define void @t1(i8* nocapture %c) nounwind optsize {
>> +entry:
>> +; CHECK: t1:
>> +; CHECK: movs r1, #0
>> +; CHECK: str r1, [r0]
>> +; CHECK: str r1, [r0, #4]
>> +; CHECK: str r1, [r0, #8]
>> +  call void @llvm.memset.p0i8.i64(i8* %c, i8 0, i64 12, i32 8, i1 false)
>> +  ret void
>> +}
>> +
>> +define void @t2() nounwind ssp {
>> +entry:
>> +; CHECK: t2:
>> +; CHECK: add.w r1, r0, #10
>> +; CHECK: vmov.i32 {{q[0-9]+}}, #0x0
>> +; CHECK: vst1.16 {d{{[0-9]+}}, d{{[0-9]+}}}, [r1]
>> +; CHECK: vst1.32 {d{{[0-9]+}}, d{{[0-9]+}}}, [r0]
>> +  %buf = alloca [26 x i8], align 1
>> +  %0 = getelementptr inbounds [26 x i8]* %buf, i32 0, i32 0
>> +  call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false)
>> +  call void @something(i8* %0) nounwind
>> +  ret void
>> +}
>> +
>> +declare void @something(i8*) nounwind
>> +declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) nounwind
>> +declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind
>> 
>> Removed: llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll?rev=169790&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll (original)
>> +++ llvm/trunk/test/CodeGen/ARM/reg_asc_order.ll (removed)
>> @@ -1,16 +0,0 @@
>> -; RUN: llc < %s -march=arm -mcpu=cortex-a8 | FileCheck %s
>> -; Check that memcpy gets lowered to ldm/stm, at least in this very smple case.
>> -
>> -%struct.Foo = type { i32, i32, i32, i32 }
>> -
>> -define void @_Z10CopyStructP3FooS0_(%struct.Foo* nocapture %a, %struct.Foo* nocapture %b) nounwind {
>> -entry:
>> -;CHECK: ldm
>> -;CHECK: stm
>> -  %0 = bitcast %struct.Foo* %a to i8*
>> -  %1 = bitcast %struct.Foo* %b to i8*
>> -  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %0, i8* %1, i32 16, i32 4, i1 false)
>> -  ret void
>> -}
>> -
>> -declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i32, i1) nounwind
>> 
>> Modified: llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/2009-11-16-UnfoldMemOpBug.ll Mon Dec 10 17:21:26 2012
>> @@ -6,15 +6,16 @@
>>  define void @t(i32 %count) ssp nounwind {
>>  entry:
>>  ; CHECK: t:
>> -; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip)
>> -; CHECK: movups L_str(%rip), %xmm0
>> +; CHECK: movups L_str+12(%rip), %xmm0
>> +; CHECK: movups L_str(%rip), %xmm1
>>    %tmp0 = alloca [60 x i8], align 1
>>    %tmp1 = getelementptr inbounds [60 x i8]* %tmp0, i64 0, i64 0
>>    br label %bb1
>> 
>>  bb1:
>>  ; CHECK: LBB0_1:
>> -; CHECK: movaps %xmm0, (%rsp)
>> +; CHECK: movups %xmm0, 12(%rsp)
>> +; CHECK: movaps %xmm1, (%rsp)
>>    %tmp2 = phi i32 [ %tmp3, %bb1 ], [ 0, %entry ]
>>    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp1, i8* getelementptr inbounds ([28 x i8]* @str, i64 0, i64 0), i64 28, i32 1, i1 false)
>>    %tmp3 = add i32 %tmp2, 1
>> 
>> Modified: llvm/trunk/test/CodeGen/X86/memcpy-2.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/memcpy-2.ll?rev=169791&r1=169790&r2=169791&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/memcpy-2.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/memcpy-2.ll Mon Dec 10 17:21:26 2012
>> @@ -10,18 +10,18 @@
>>  define void @t1(i32 %argc, i8** %argv) nounwind  {
>>  entry:
>>  ; SSE2: t1:
>> +; SSE2: movsd _.str+16, %xmm0
>> +; SSE2: movsd %xmm0, 16(%esp)
>>  ; SSE2: movaps _.str, %xmm0
>>  ; SSE2: movaps %xmm0
>> -; SSE2: movb $0
>> -; SSE2: movl $0
>> -; SSE2: movl $0
>> +; SSE2: movb $0, 24(%esp)
>> 
>>  ; SSE1: t1:
>> +; SSE1: fldl _.str+16
>> +; SSE1: fstpl 16(%esp)
>>  ; SSE1: movaps _.str, %xmm0
>>  ; SSE1: movaps %xmm0
>> -; SSE1: movb $0
>> -; SSE1: movl $0
>> -; SSE1: movl $0
>> +; SSE1: movb $0, 24(%esp)
>> 
>>  ; NOSSE: t1:
>>  ; NOSSE: movb $0
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> <strcat.llvm.mips64el.ll>
> 
