[llvm] r238473 - Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM.
Peter Collingbourne
peter at pcc.me.uk
Fri Jun 5 11:05:34 PDT 2015
Oliver,
Thanks for tracking down this miscompilation. I don't see an easy fix for both
problems, so I've reverted in r239169.
Peter
On Fri, Jun 05, 2015 at 04:54:27PM +0100, Oliver Stannard wrote:
> Hi Peter,
>
> This patch is causing miscompilations when targeting ARM v6M at low
> optimization levels (I've only observed it at -O0 so far). I've attached a
> reduced C file which triggers this (preprocessed.c), as well as the assembly
> output of clang both immediately before this commit (test_good.s) and with
> this commit (test_bad.s). These were compiled like this:
> /path/to/old/clang --target=armv6m-arm-none-eabi -S preprocessed.c -O0 -o test_good.s
> /path/to/new/clang --target=armv6m-arm-none-eabi -S preprocessed.c -O0 -o test_bad.s
>
> This should, when run, print out "checksum = 1" (the value of a.f4, assigned
> in fn1), but instead prints "checksum = 4".
>
> One of my colleagues has also raised a ticket for an assertion failure in
> the same commit: https://llvm.org/bugs/show_bug.cgi?id=23768.
>
> Is this something you will be able to fix quickly, or should we revert the
> change until it can be fixed?
>
> Sorry we took so long to track down these regressions.
>
> Thanks,
> Oliver
>
>
> > -----Original Message-----
> > From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> > bounces at cs.uiuc.edu] On Behalf Of Peter Collingbourne
> > Sent: 28 May 2015 21:03
> > To: llvm-commits at cs.uiuc.edu
> > Subject: [llvm] r238473 - Thumb2: Modify codegen for memcpy intrinsic to
> > prefer LDM/STM.
> >
> > Author: pcc
> > Date: Thu May 28 15:02:45 2015
> > New Revision: 238473
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=238473&view=rev
> > Log:
> > Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM.
> >
> > We were previously codegen'ing these as regular load/store operations and
> > hoping that the register allocator would allocate registers in ascending
> > order so that we could apply an LDM/STM combine after register allocation.
> > According to the commit that first introduced this code (r37179), we
> > planned to teach the register allocator to allocate the registers in
> > ascending order. This never got implemented, and up to now we've been
> > stuck with very poor codegen.
> >
> > A much simpler approach for achieving better codegen is to create LDM/STM
> > instructions with identical sets of virtual registers, let the register
> > allocator pick arbitrary registers and order register lists when printing
> > an MCInst. This approach also avoids the need to repeatedly calculate
> > offsets which ultimately ought to be eliminated pre-RA in order to
> > decrease register pressure.
> >
> > This is implemented by lowering the memcpy intrinsic to a series of SD-only
> > MCOPY pseudo-instructions, each of which performs a memory copy using a given
> > number of registers. During SD->MI lowering, we lower MCOPY to LDM/STM.
> > This is a little unusual, but it avoids the need to encode register lists
> > in the SD, and we can take advantage of SD use lists to decide whether to
> > use the _UPD variant of the instructions.
> >
> > Fixes PR9199.
> >
> > Differential Revision: http://reviews.llvm.org/D9508
> >
> > Modified:
> > llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> > llvm/trunk/lib/Target/ARM/ARMISelLowering.h
> > llvm/trunk/lib/Target/ARM/ARMInstrInfo.td
> > llvm/trunk/lib/Target/ARM/ARMSelectionDAGInfo.cpp
> > llvm/trunk/lib/Target/ARM/InstPrinter/ARMInstPrinter.cpp
> > llvm/trunk/lib/Target/ARM/Thumb2SizeReduction.cpp
> > llvm/trunk/test/CodeGen/Thumb/ldm-stm-base-materialization.ll
> > llvm/trunk/test/CodeGen/Thumb/thumb-memcpy-ldm-stm.ll
> >
> > Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
> > +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Thu May 28 15:02:45 2015
> > @@ -1122,6 +1122,7 @@ const char *ARMTargetLowering::getTarget
> > case ARMISD::VORRIMM: return "ARMISD::VORRIMM";
> > case ARMISD::VBICIMM: return "ARMISD::VBICIMM";
> > case ARMISD::VBSL: return "ARMISD::VBSL";
> > + case ARMISD::MCOPY: return "ARMISD::MCOPY";
> > case ARMISD::VLD2DUP: return "ARMISD::VLD2DUP";
> > case ARMISD::VLD3DUP: return "ARMISD::VLD3DUP";
> > case ARMISD::VLD4DUP: return "ARMISD::VLD4DUP";
> > @@ -7629,8 +7630,59 @@ ARMTargetLowering::EmitInstrWithCustomIn
> > }
> > }
> >
> > +/// \brief Lowers MCOPY to either LDMIA/STMIA or LDMIA_UPD/STMIA_UPD depending
> > +/// on whether the result is used. This is done as a post-isel lowering instead
> > +/// of as a custom inserter because we need the use list from the SDNode.
> > +static void LowerMCOPY(const ARMSubtarget *Subtarget, MachineInstr *MI,
> > + SDNode *Node) {
> > + bool isThumb1 = Subtarget->isThumb1Only();
> > + bool isThumb2 = Subtarget->isThumb2();
> > + const ARMBaseInstrInfo *TII = Subtarget->getInstrInfo();
> > +
> > + DebugLoc dl = MI->getDebugLoc();
> > + MachineBasicBlock *BB = MI->getParent();
> > + MachineFunction *MF = BB->getParent();
> > + MachineRegisterInfo &MRI = MF->getRegInfo();
> > +
> > + MachineInstrBuilder LD, ST;
> > + if (isThumb1 || Node->hasAnyUseOfValue(1)) {
> > + LD = BuildMI(*BB, MI, dl, TII->get(isThumb2 ? ARM::t2LDMIA_UPD
> > + : isThumb1 ? ARM::tLDMIA_UPD
> > + : ARM::LDMIA_UPD))
> > + .addOperand(MI->getOperand(1));
> > + } else {
> > + LD = BuildMI(*BB, MI, dl, TII->get(isThumb2 ? ARM::t2LDMIA : ARM::LDMIA));
> > + }
> > +
> > + if (isThumb1 || Node->hasAnyUseOfValue(0)) {
> > + ST = BuildMI(*BB, MI, dl, TII->get(isThumb2 ? ARM::t2STMIA_UPD
> > + : isThumb1 ? ARM::tSTMIA_UPD
> > + : ARM::STMIA_UPD))
> > + .addOperand(MI->getOperand(0));
> > + } else {
> > + ST = BuildMI(*BB, MI, dl, TII->get(isThumb2 ? ARM::t2STMIA : ARM::STMIA));
> > + }
> > +
> > + LD.addOperand(MI->getOperand(3)).addImm(ARMCC::AL).addReg(0);
> > + ST.addOperand(MI->getOperand(2)).addImm(ARMCC::AL).addReg(0);
> > +
> > + for (unsigned I = 0; I != MI->getOperand(4).getImm(); ++I) {
> > + unsigned TmpReg = MRI.createVirtualRegister(isThumb1 ? &ARM::tGPRRegClass
> > + : &ARM::GPRRegClass);
> > + LD.addReg(TmpReg, RegState::Define);
> > + ST.addReg(TmpReg, RegState::Kill);
> > + }
> > +
> > + MI->eraseFromParent();
> > +}
> > +
> > void ARMTargetLowering::AdjustInstrPostInstrSelection(MachineInstr *MI,
> > SDNode *Node) const {
> > + if (MI->getOpcode() == ARM::MCOPY) {
> > + LowerMCOPY(Subtarget, MI, Node);
> > + return;
> > + }
> > +
> > const MCInstrDesc *MCID = &MI->getDesc();
> > // Adjust potentially 's' setting instructions after isel, i.e. ADC, SBC, RSB,
> > // RSC. Coming out of isel, they have an implicit CPSR def, but the optional
> >
> > Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.h
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.h?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/ARM/ARMISelLowering.h (original)
> > +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.h Thu May 28 15:02:45 2015
> > @@ -189,6 +189,10 @@ namespace llvm {
> > // Vector bitwise select
> > VBSL,
> >
> > + // Pseudo-instruction representing a memory copy using ldm/stm
> > + // instructions.
> > + MCOPY,
> > +
> > // Vector load N-element structure to all lanes:
> > VLD2DUP = ISD::FIRST_TARGET_MEMORY_OPCODE,
> > VLD3DUP,
> >
> > Modified: llvm/trunk/lib/Target/ARM/ARMInstrInfo.td
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMInstrInfo.td?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/ARM/ARMInstrInfo.td (original)
> > +++ llvm/trunk/lib/Target/ARM/ARMInstrInfo.td Thu May 28 15:02:45 2015
> > @@ -73,6 +73,10 @@ def SDT_ARMBFI : SDTypeProfile<1, 3, [SD
> > def SDT_ARMVMAXNM : SDTypeProfile<1, 2, [SDTCisFP<0>, SDTCisFP<1>, SDTCisFP<2>]>;
> > def SDT_ARMVMINNM : SDTypeProfile<1, 2, [SDTCisFP<0>, SDTCisFP<1>, SDTCisFP<2>]>;
> >
> > +def SDT_ARMMCOPY : SDTypeProfile<2, 3, [SDTCisVT<0, i32>, SDTCisVT<1, i32>,
> > + SDTCisVT<2, i32>, SDTCisVT<3, i32>,
> > + SDTCisVT<4, i32>]>;
> > +
> > def SDTBinaryArithWithFlags : SDTypeProfile<2, 2,
> > [SDTCisSameAs<0, 2>,
> > SDTCisSameAs<0, 3>,
> > @@ -179,6 +183,10 @@ def ARMbfi : SDNode<"ARMISD::B
> > def ARMvmaxnm : SDNode<"ARMISD::VMAXNM", SDT_ARMVMAXNM, []>;
> > def ARMvminnm : SDNode<"ARMISD::VMINNM", SDT_ARMVMINNM, []>;
> >
> > +def ARMmcopy : SDNode<"ARMISD::MCOPY", SDT_ARMMCOPY,
> > + [SDNPHasChain, SDNPInGlue, SDNPOutGlue,
> > + SDNPMayStore, SDNPMayLoad]>;
> > +
> > //===----------------------------------------------------------------------===//
> > // ARM Instruction Predicate Definitions.
> > //
> > @@ -4552,6 +4560,13 @@ let usesCustomInserter = 1 in {
> > [(ARMcopystructbyval GPR:$dst, GPR:$src, imm:$size, imm:$alignment)]>;
> > }
> >
> > +let hasPostISelHook = 1 in {
> > + def MCOPY : PseudoInst<
> > + (outs GPR:$newdst, GPR:$newsrc), (ins GPR:$dst, GPR:$src, i32imm:$nreg),
> > + NoItinerary,
> > + [(set GPR:$newdst, GPR:$newsrc, (ARMmcopy GPR:$dst, GPR:$src, imm:$nreg))]>;
> > +}
> > +
> > def ldrex_1 : PatFrag<(ops node:$ptr), (int_arm_ldrex node:$ptr), [{
> > return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i8;
> > }]>;
> >
> > Modified: llvm/trunk/lib/Target/ARM/ARMSelectionDAGInfo.cpp
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMSelectionDAGInfo.cpp?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/ARM/ARMSelectionDAGInfo.cpp (original)
> > +++ llvm/trunk/lib/Target/ARM/ARMSelectionDAGInfo.cpp Thu May 28 15:02:45 2015
> > @@ -164,41 +164,38 @@ ARMSelectionDAGInfo::EmitTargetCodeForMe
> > unsigned VTSize = 4;
> > unsigned i = 0;
> > // Emit a maximum of 4 loads in Thumb1 since we have fewer registers
> > - const unsigned MAX_LOADS_IN_LDM = Subtarget.isThumb1Only() ? 4 : 6;
> > + const unsigned MaxLoadsInLDM = Subtarget.isThumb1Only() ? 4 : 6;
> > SDValue TFOps[6];
> > SDValue Loads[6];
> > uint64_t SrcOff = 0, DstOff = 0;
> >
> > - // Emit up to MAX_LOADS_IN_LDM loads, then a TokenFactor barrier, then the
> > - // same number of stores. The loads and stores will get combined into
> > - // ldm/stm later on.
> > - while (EmittedNumMemOps < NumMemOps) {
> > - for (i = 0;
> > - i < MAX_LOADS_IN_LDM && EmittedNumMemOps + i < NumMemOps; ++i) {
> > - Loads[i] = DAG.getLoad(VT, dl, Chain,
> > - DAG.getNode(ISD::ADD, dl, MVT::i32, Src,
> > - DAG.getConstant(SrcOff, dl, MVT::i32)),
> > - SrcPtrInfo.getWithOffset(SrcOff), isVolatile,
> > - false, false, 0);
> > - TFOps[i] = Loads[i].getValue(1);
> > - SrcOff += VTSize;
> > - }
> > - Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
> > - makeArrayRef(TFOps, i));
> > -
> > - for (i = 0;
> > - i < MAX_LOADS_IN_LDM && EmittedNumMemOps + i < NumMemOps; ++i) {
> > - TFOps[i] = DAG.getStore(Chain, dl, Loads[i],
> > - DAG.getNode(ISD::ADD, dl, MVT::i32, Dst,
> > - DAG.getConstant(DstOff, dl, MVT::i32)),
> > - DstPtrInfo.getWithOffset(DstOff),
> > - isVolatile, false, 0);
> > - DstOff += VTSize;
> > - }
> > - Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
> > - makeArrayRef(TFOps, i));
> > + // FIXME: We should invent a VMCOPY pseudo-instruction that lowers to
> > + // VLDM/VSTM and make this code emit it when appropriate. This would reduce
> > + // pressure on the general purpose registers. However this seems harder to map
> > + // onto the register allocator's view of the world.
> >
> > - EmittedNumMemOps += i;
> > + // The number of MCOPY pseudo-instructions to emit. We use up to MaxLoadsInLDM
> > + // registers per mcopy, which will get lowered into ldm/stm later on. This is
> > + // a lower bound on the number of MCOPY operations we must emit.
> > + unsigned NumMCOPYs = (NumMemOps + MaxLoadsInLDM - 1) / MaxLoadsInLDM;
> > +
> > + SDVTList VTs = DAG.getVTList(MVT::i32, MVT::i32, MVT::Other, MVT::Glue);
> > +
> > + for (unsigned I = 0; I != NumMCOPYs; ++I) {
> > + // Evenly distribute registers among MCOPY operations to reduce register
> > + // pressure.
> > + // pressure.
> > + unsigned NextEmittedNumMemOps = NumMemOps * (I + 1) / NumMCOPYs;
> > + unsigned NumRegs = NextEmittedNumMemOps - EmittedNumMemOps;
> > +
> > + Dst = DAG.getNode(ARMISD::MCOPY, dl, VTs, Chain, Dst, Src,
> > + DAG.getConstant(NumRegs, dl, MVT::i32));
> > + Src = Dst.getValue(1);
> > + Chain = Dst.getValue(2);
> > +
> > + DstPtrInfo = DstPtrInfo.getWithOffset(NumRegs * VTSize);
> > + SrcPtrInfo = SrcPtrInfo.getWithOffset(NumRegs * VTSize);
> > +
> > + EmittedNumMemOps = NextEmittedNumMemOps;
> > }
> >
> > if (BytesLeft == 0)
> >
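The register-distribution arithmetic in the hunk above can be checked in isolation. The following is a minimal sketch, not LLVM code: `splitMemOps` is a hypothetical helper that repeats the same ceiling division and even-split formula, assuming the Thumb1 limit of 4 registers per MCOPY. For the 24, 28, 32 and 36 byte copies used in the tests (6, 7, 8 and 9 words) it gives the same register counts as the CHECK lines in ldm-stm-base-materialization.ll.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of LLVM): returns the number of registers
// each MCOPY would use when copying numMemOps words with at most maxRegs
// registers per MCOPY.
std::vector<unsigned> splitMemOps(unsigned numMemOps, unsigned maxRegs) {
  assert(maxRegs > 0 && "need at least one register per copy");
  std::vector<unsigned> regsPerCopy;

  // Ceiling division: the fewest copies that can cover numMemOps words.
  unsigned numCopies = (numMemOps + maxRegs - 1) / maxRegs;

  unsigned emitted = 0;
  for (unsigned i = 0; i != numCopies; ++i) {
    // Spread the words evenly instead of filling every copy to maxRegs.
    unsigned nextEmitted = numMemOps * (i + 1) / numCopies;
    regsPerCopy.push_back(nextEmitted - emitted);
    emitted = nextEmitted;
  }
  return regsPerCopy;
}
```

Calling `splitMemOps(7, 4)` yields two copies of 3 and 4 registers, which corresponds to the foo28 test; a greedy split would have produced 4 and 3 instead.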
> > Modified: llvm/trunk/lib/Target/ARM/InstPrinter/ARMInstPrinter.cpp
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/InstPrinter/ARMInstPrinter.cpp?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/ARM/InstPrinter/ARMInstPrinter.cpp (original)
> > +++ llvm/trunk/lib/Target/ARM/InstPrinter/ARMInstPrinter.cpp Thu May 28 15:02:45 2015
> > @@ -744,10 +744,21 @@ void ARMInstPrinter::printRegisterList(c
> > const MCSubtargetInfo &STI,
> > raw_ostream &O) {
> > O << "{";
> > - for (unsigned i = OpNum, e = MI->getNumOperands(); i != e; ++i) {
> > - if (i != OpNum)
> > +
> > + // The backend may have given us a register list in non-ascending order. Sort
> > + // it now.
> > + std::vector<MCOperand> RegOps(MI->size() - OpNum);
> > + std::copy(MI->begin() + OpNum, MI->end(), RegOps.begin());
> > + std::sort(RegOps.begin(), RegOps.end(),
> > + [this](const MCOperand &O1, const MCOperand &O2) -> bool {
> > + return MRI.getEncodingValue(O1.getReg()) <
> > + MRI.getEncodingValue(O2.getReg());
> > + });
> > +
> > + for (unsigned i = 0, e = RegOps.size(); i != e; ++i) {
> > + if (i != 0)
> > O << ", ";
> > - printRegName(O, MI->getOperand(i).getReg());
> > + printRegName(O, RegOps[i].getReg());
> > }
> > O << "}";
> > }
> >
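The reason the log message can let the register allocator pick arbitrary registers is that LDM and STM always pair the lowest-numbered register in the list with the lowest address, so an LDM/STM pair using the same register set copies memory in order whatever that set is. The sketch below is a toy model of that behaviour, assuming distinct registers and a word-addressed memory; `ToyMachine`, `ldmia`, `stmia` and `copyWords` are illustrative names, not LLVM or ARM APIs, and base-register writeback is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy machine state: 16 core registers and a word-addressed memory.
struct ToyMachine {
  uint32_t regs[16] = {};
  std::vector<uint32_t> mem;
};

// Models LDMIA: the lowest-numbered register is loaded from the lowest
// address, regardless of the order in which the list was supplied.
void ldmia(ToyMachine &m, unsigned baseWord, std::vector<unsigned> regList) {
  std::sort(regList.begin(), regList.end());
  for (unsigned r : regList) {
    assert(r < 16 && "register number out of range");
    m.regs[r] = m.mem.at(baseWord++);
  }
}

// Models STMIA with the same ordering rule as ldmia above.
void stmia(ToyMachine &m, unsigned baseWord, std::vector<unsigned> regList) {
  std::sort(regList.begin(), regList.end());
  for (unsigned r : regList) {
    assert(r < 16 && "register number out of range");
    m.mem.at(baseWord++) = m.regs[r];
  }
}

// Copies regList.size() words from srcWord to dstWord through the given
// registers, as one LDM/STM pair with identical register sets would.
void copyWords(ToyMachine &m, unsigned dstWord, unsigned srcWord,
               const std::vector<unsigned> &regList) {
  ldmia(m, srcWord, regList);
  stmia(m, dstWord, regList);
}
```

With memory `{10, 20, 30, 40, 0, 0, 0, 0}`, calling `copyWords(m, 4, 0, {5, 2, 7, 3})` leaves words 4 to 7 equal to 10, 20, 30, 40, the same result an ascending list such as `{0, 1, 2, 3}` would give.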
> > Modified: llvm/trunk/lib/Target/ARM/Thumb2SizeReduction.cpp
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/Thumb2SizeReduction.cpp?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/ARM/Thumb2SizeReduction.cpp (original)
> > +++ llvm/trunk/lib/Target/ARM/Thumb2SizeReduction.cpp Thu May 28 15:02:45 2015
> > @@ -125,7 +125,10 @@ namespace {
> > { ARM::t2LDMIA, ARM::tLDMIA, 0, 0, 0, 1, 1, 1,1, 0,1,0 },
> > { ARM::t2LDMIA_RET,0, ARM::tPOP_RET, 0, 0, 1, 1, 1,1, 0,1,0 },
> > { ARM::t2LDMIA_UPD,ARM::tLDMIA_UPD,ARM::tPOP,0, 0, 1, 1, 1,1, 0,1,0 },
> > - // ARM::t2STM (with no basereg writeback) has no Thumb1 equivalent
> > + // ARM::t2STMIA (with no basereg writeback) has no Thumb1 equivalent.
> > + // tSTMIA_UPD is a change in semantics which can only be used if the base
> > + // register is killed. This difference is correctly handled elsewhere.
> > + { ARM::t2STMIA, ARM::tSTMIA_UPD, 0, 0, 0, 1, 1, 1,1, 0,1,0 },
> > { ARM::t2STMIA_UPD,ARM::tSTMIA_UPD, 0, 0, 0, 1, 1, 1,1, 0,1,0 },
> > { ARM::t2STMDB_UPD, 0, ARM::tPUSH, 0, 0, 1, 1, 1,1, 0,1,0 }
> > };
> > @@ -432,6 +435,14 @@ Thumb2SizeReduce::ReduceLoadStore(Machin
> > isLdStMul = true;
> > break;
> > }
> > + case ARM::t2STMIA: {
> > + // If the base register is killed, we don't care what its value is after the
> > + // instruction, so we can use an updating STMIA.
> > + if (!MI->getOperand(0).isKill())
> > + return false;
> > +
> > + break;
> > + }
> > case ARM::t2LDMIA_RET: {
> > unsigned BaseReg = MI->getOperand(1).getReg();
> > if (BaseReg != ARM::SP)
> > @@ -489,6 +500,12 @@ Thumb2SizeReduce::ReduceLoadStore(Machin
> > // Add the 16-bit load / store instruction.
> > DebugLoc dl = MI->getDebugLoc();
> > MachineInstrBuilder MIB = BuildMI(MBB, MI, dl, TII->get(Opc));
> > +
> > + // tSTMIA_UPD takes a defining register operand. We've already checked that
> > + // the register is killed, so mark it as dead here.
> > + if (Entry.WideOpc == ARM::t2STMIA)
> > + MIB.addReg(MI->getOperand(0).getReg(), RegState::Define | RegState::Dead);
> > +
> > if (!isLdStMul) {
> > MIB.addOperand(MI->getOperand(0));
> > MIB.addOperand(MI->getOperand(1));
> >
> > Modified: llvm/trunk/test/CodeGen/Thumb/ldm-stm-base-materialization.ll
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Thumb/ldm-stm-base-materialization.ll?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/test/CodeGen/Thumb/ldm-stm-base-materialization.ll (original)
> > +++ llvm/trunk/test/CodeGen/Thumb/ldm-stm-base-materialization.ll Thu May 28 15:02:45 2015
> > @@ -6,15 +6,17 @@ target triple = "thumbv6m-none--eabi"
> > @b = external global i32*
> >
> > ; Function Attrs: nounwind
> > -define void @foo() #0 {
> > +define void @foo24() #0 {
> > entry:
> > -; CHECK-LABEL: foo:
> > -; CHECK: ldr r[[SB:[0-9]]], .LCPI
> > +; CHECK-LABEL: foo24:
> > ; CHECK: ldr r[[LB:[0-9]]], .LCPI
> > ; CHECK: adds r[[NLB:[0-9]]], r[[LB]], #4
> > -; CHECK-NEXT: ldm r[[NLB]],
> > +; CHECK: ldr r[[SB:[0-9]]], .LCPI
> > ; CHECK: adds r[[NSB:[0-9]]], r[[SB]], #4
> > -; CHECK-NEXT: stm r[[NSB]]
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]]}
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]]}
> > %0 = load i32*, i32** @a, align 4
> > %arrayidx = getelementptr inbounds i32, i32* %0, i32 1
> > %1 = bitcast i32* %arrayidx to i8*
> > @@ -25,5 +27,70 @@ entry:
> > ret void
> > }
> >
> > +define void @foo28() #0 {
> > +entry:
> > +; CHECK-LABEL: foo28:
> > +; CHECK: ldr r[[LB:[0-9]]], .LCPI
> > +; CHECK: adds r[[NLB:[0-9]]], r[[LB]], #4
> > +; CHECK: ldr r[[SB:[0-9]]], .LCPI
> > +; CHECK: adds r[[NSB:[0-9]]], r[[SB]], #4
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]]}
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]], r[[R4:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]], r[[R4]]}
> > + %0 = load i32*, i32** @a, align 4
> > + %arrayidx = getelementptr inbounds i32, i32* %0, i32 1
> > + %1 = bitcast i32* %arrayidx to i8*
> > + %2 = load i32*, i32** @b, align 4
> > + %arrayidx1 = getelementptr inbounds i32, i32* %2, i32 1
> > + %3 = bitcast i32* %arrayidx1 to i8*
> > + tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %3, i32 28, i32 4, i1 false)
> > + ret void
> > +}
> > +
> > +define void @foo32() #0 {
> > +entry:
> > +; CHECK-LABEL: foo32:
> > +; CHECK: ldr r[[LB:[0-9]]], .LCPI
> > +; CHECK: adds r[[NLB:[0-9]]], r[[LB]], #4
> > +; CHECK: ldr r[[SB:[0-9]]], .LCPI
> > +; CHECK: adds r[[NSB:[0-9]]], r[[SB]], #4
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]], r[[R4:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]], r[[R4]]}
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]], r[[R4:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]], r[[R4]]}
> > + %0 = load i32*, i32** @a, align 4
> > + %arrayidx = getelementptr inbounds i32, i32* %0, i32 1
> > + %1 = bitcast i32* %arrayidx to i8*
> > + %2 = load i32*, i32** @b, align 4
> > + %arrayidx1 = getelementptr inbounds i32, i32* %2, i32 1
> > + %3 = bitcast i32* %arrayidx1 to i8*
> > + tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %3, i32 32, i32 4, i1 false)
> > + ret void
> > +}
> > +
> > +define void @foo36() #0 {
> > +entry:
> > +; CHECK-LABEL: foo36:
> > +; CHECK: ldr r[[LB:[0-9]]], .LCPI
> > +; CHECK: adds r[[NLB:[0-9]]], r[[LB]], #4
> > +; CHECK: ldr r[[SB:[0-9]]], .LCPI
> > +; CHECK: adds r[[NSB:[0-9]]], r[[SB]], #4
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]]}
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]]}
> > +; CHECK-NEXT: ldm r[[NLB]]!, {r[[R1:[0-9]]], r[[R2:[0-9]]], r[[R3:[0-9]]]}
> > +; CHECK-NEXT: stm r[[NSB]]!, {r[[R1]], r[[R2]], r[[R3]]}
> > + %0 = load i32*, i32** @a, align 4
> > + %arrayidx = getelementptr inbounds i32, i32* %0, i32 1
> > + %1 = bitcast i32* %arrayidx to i8*
> > + %2 = load i32*, i32** @b, align 4
> > + %arrayidx1 = getelementptr inbounds i32, i32* %2, i32 1
> > + %3 = bitcast i32* %arrayidx1 to i8*
> > + tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %3, i32 36, i32 4, i1 false)
> > + ret void
> > +}
> > +
> > ; Function Attrs: nounwind
> > declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1) #1
> >
> > Modified: llvm/trunk/test/CodeGen/Thumb/thumb-memcpy-ldm-stm.ll
> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Thumb/thumb-memcpy-ldm-stm.ll?rev=238473&r1=238472&r2=238473&view=diff
> > ==============================================================================
> > --- llvm/trunk/test/CodeGen/Thumb/thumb-memcpy-ldm-stm.ll (original)
> > +++ llvm/trunk/test/CodeGen/Thumb/thumb-memcpy-ldm-stm.ll Thu May 28 15:02:45 2015
> > @@ -7,8 +7,8 @@ define void @t1() #0 {
> > entry:
> > ; CHECK-LABEL: t1:
> > ; CHECK: ldr r[[LB:[0-9]]],
> > -; CHECK-NEXT: ldm r[[LB]]!,
> > ; CHECK-NEXT: ldr r[[SB:[0-9]]],
> > +; CHECK-NEXT: ldm r[[LB]]!,
> > ; CHECK-NEXT: stm r[[SB]]!,
> > ; CHECK-NEXT: ldrb {{.*}}, [r[[LB]]]
> > ; CHECK-NEXT: strb {{.*}}, [r[[SB]]]
> > @@ -21,8 +21,8 @@ define void @t2() #0 {
> > entry:
> > ; CHECK-LABEL: t2:
> > ; CHECK: ldr r[[LB:[0-9]]],
> > -; CHECK-NEXT: ldm r[[LB]]!,
> > ; CHECK-NEXT: ldr r[[SB:[0-9]]],
> > +; CHECK-NEXT: ldm r[[LB]]!,
> > ; CHECK-NEXT: stm r[[SB]]!,
> > ; CHECK-NEXT: ldrh {{.*}}, [r[[LB]]]
> > ; CHECK-NEXT: ldrb {{.*}}, [r[[LB]], #2]
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
--
Peter