[llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
David Peixotto
dpeixott at codeaurora.org
Tue Oct 22 09:13:22 PDT 2013
Ok, I will make a patch to switch to using inline conditionals instead.
-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by The Linux Foundation
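
For readers following the thread, here is a minimal sketch of what "inline conditionals" could look like for opcode selection. It mirrors the isThumb2 ternaries that r192915 removed, but it uses strings as stand-ins for the ARM::* opcode enumerators so that it compiles outside of LLVM. The function name postIncLoadOpcode is hypothetical and is not taken from the eventual patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pick the post-increment load for a scalar unit size.
// Strings stand in for the ARM::* opcode enumerators used in the real code.
std::string postIncLoadOpcode(bool IsThumb2, unsigned UnitSize) {
  assert((UnitSize == 1 || UnitSize == 2 || UnitSize == 4) &&
         "only scalar unit sizes are handled here");
  if (UnitSize == 1)
    return IsThumb2 ? "t2LDRB_POST" : "LDRB_POST_IMM";
  if (UnitSize == 2)
    return IsThumb2 ? "t2LDRH_POST" : "LDRH_POST";
  return IsThumb2 ? "t2LDR_POST" : "LDR_POST_IMM";
}
```

Supporting another subtarget in this style would presumably mean adding one more branch per selection site, which is the trade-off discussed in the quoted messages.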
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Monday, October 21, 2013 9:18 PM
> To: David Peixotto
> Cc: llvm-commits at cs.uiuc.edu LLVM
> Subject: Re: [llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
>
>
> On Oct 21, 2013, at 7:20 PM, David Peixotto <dpeixott at codeaurora.org>
> wrote:
>
> > Hi Bob,
> >
> > I agree that a generic emitter would be useful, but I'm not sure I
> > would get the time to work on such a project at this point.
>
> In that case, I think it would be better to simplify this code to use a
> more direct approach with conditionals.
>
> >
> >
> >> -----Original Message-----
> >> From: Bob Wilson [mailto:bob.wilson at apple.com]
> >> Sent: Monday, October 21, 2013 12:12 PM
> >> To: David Peixotto
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: Re: [llvm] r192915 - Refactor lowering for
> >> COPY_STRUCT_BYVAL_I32
> >>
> >> It seems like it would make more sense to make the StructByvalEmitter a
> >> generic "emitter" for ARM instructions. I suspect there are a number of
> >> other places in ARMISelLowering that could make good use of it. Would
> >> you be willing to investigate that?
> >>
> >> As it stands, this really does seem like overkill for the struct byval
> >> issue, but if you could make it more generally useful, then the extra
> >> code would be worthwhile.
> >>
> >> On Oct 21, 2013, at 10:02 AM, David Peixotto
> >> <dpeixott at codeaurora.org>
> >> wrote:
> >>
> >>> Hi Bob,
> >>>
> >>> I think your criticism is valid here. I wasn't too happy with how
> >>> much code I ended up writing for this change. When I started I
> >>> thought the code size would be about equal after implementing the
> >>> thumb1 lowering because I was getting rid of some code duplication,
> >>> but the code size for abstracting the common parts was larger than I
> >>> had anticipated. There is no fundamental problem with implementing it
> >>> with conditionals; I did it this way because I thought it would be
> >>> clearer and easier to write correctly.
> >>>
> >>> I think the way it is now has the advantage that the lowering
> >>> algorithm is clearly separated from the details of generating
> >>> machine instructions for each sub-target. I think it would be easier
> >>> to improve the algorithm with the way it is now. For example, we are
> >>> always using byte stores to copy any leftover that does not fit into
> >>> the "unit" size. So if we have a 31-byte struct and a target that
> >>> supports NEON we will generate a 16-byte store and 15 1-byte stores.
> >>> We could improve this by generating fewer stores for the leftover
> >>> (3x4-byte + 1x2-byte + 1x1-byte). I don't know if we actually care
> >>> about this kind of change, but I believe it would be easier to make
> >>> now.
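
The leftover split suggested above can be sketched as a greedy decomposition into 4-, 2-, and 1-byte pieces. This is only an illustration under stated assumptions: splitLeftover is a hypothetical name, it is not part of r192915, and it assumes the address reached after the unit-sized copies is still aligned well enough for the wider stores.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the patch: split the leftover bytes of a
// byval copy into the widest scalar accesses first (4, then 2, then 1 bytes).
std::vector<unsigned> splitLeftover(unsigned BytesLeft) {
  std::vector<unsigned> Widths;
  const unsigned Candidates[] = {4, 2, 1};
  for (unsigned W : Candidates) {
    // Take as many accesses of this width as still fit.
    while (BytesLeft >= W) {
      Widths.push_back(W);
      BytesLeft -= W;
    }
  }
  return Widths;
}
```

For the 31-byte example with a 16-byte unit, the 15 leftover bytes would be covered by five stores (4, 4, 4, 2, 1) instead of fifteen byte stores.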
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
> >>>> Sent: Saturday, October 19, 2013 7:00 PM
> >>>> To: David Peixotto
> >>>> Cc: llvm-commits at cs.uiuc.edu
> >>>> Subject: Re: [llvm] r192915 - Refactor lowering for
> >>>> COPY_STRUCT_BYVAL_I32
> >>>>
> >>>> This is very nice and elegant, but it's an awful lot of code for
> >>>> something that isn't really that complicated. It seems like overkill
> >>>> to me. Did you consider implementing the Thumb1 support by just
> >>>> adding more conditionals? Is there a fundamental problem with that?
> >>>>
> >>>> On Oct 17, 2013, at 12:49 PM, David Peixotto
> >>>> <dpeixott at codeaurora.org>
> >>>> wrote:
> >>>>
> >>>>> Author: dpeixott
> >>>>> Date: Thu Oct 17 14:49:22 2013
> >>>>> New Revision: 192915
> >>>>>
> >>>>> URL: http://llvm.org/viewvc/llvm-project?rev=192915&view=rev
> >>>>> Log:
> >>>>> Refactor lowering for COPY_STRUCT_BYVAL_I32
> >>>>>
> >>>>> This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
> >>>>> pseudo-instruction in the ARM backend. We introduce a new helper
> >>>>> class that encapsulates all of the operations needed during the
> >>>>> lowering. The operations are implemented for each subtarget in
> >>>>> different subclasses. Currently only arm and thumb2 subtargets are
> >>>>> supported.
> >>>>>
> >>>>> This refactoring was done to easily implement support for thumb1
> >>>>> subtargets. This initial patch does not add support for thumb1,
> >>>>> but is only a refactoring. A follow-on patch will implement the
> >>>>> support for thumb1 subtargets.
> >>>>>
> >>>>> No intended functionality change.
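
The design described in the log (one shared algorithm, per-subtarget operations in subclasses) can be illustrated with a small sketch. Everything here is a simplified stand-in: CopyOps, ARMCopyOps, Thumb2CopyOps, and planLoads are hypothetical names, and strings replace the MachineInstrs that the real helper builds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-subtarget interface: each subclass names the load it
// would emit. The real helper emits instructions instead of strings.
class CopyOps {
public:
  virtual ~CopyOps() {}
  virtual std::string unitLoad(unsigned UnitSize) const = 0;
  virtual std::string byteLoad() const = 0;
};

class ARMCopyOps : public CopyOps {
public:
  std::string unitLoad(unsigned UnitSize) const override {
    return UnitSize == 4 ? "LDR_POST_IMM"
                         : UnitSize == 2 ? "LDRH_POST" : "LDRB_POST_IMM";
  }
  std::string byteLoad() const override { return "LDRB_POST_IMM"; }
};

class Thumb2CopyOps : public CopyOps {
public:
  std::string unitLoad(unsigned UnitSize) const override {
    return UnitSize == 4 ? "t2LDR_POST"
                         : UnitSize == 2 ? "t2LDRH_POST" : "t2LDRB_POST";
  }
  std::string byteLoad() const override { return "t2LDRB_POST"; }
};

// Subtarget-independent part: whole units first, then the leftover bytes.
std::vector<std::string> planLoads(const CopyOps &Ops, unsigned SizeVal,
                                   unsigned UnitSize) {
  assert(UnitSize != 0 && "unit size must be nonzero");
  unsigned BytesLeft = SizeVal % UnitSize;
  unsigned LoopSize = SizeVal - BytesLeft;
  std::vector<std::string> Plan;
  for (unsigned i = 0; i < LoopSize; i += UnitSize)
    Plan.push_back(Ops.unitLoad(UnitSize));
  for (unsigned i = 0; i < BytesLeft; ++i)
    Plan.push_back(Ops.byteLoad());
  return Plan;
}
```

In this shape, a thumb1 variant would be one more subclass, and planLoads would stay unchanged.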
> >>>>>
> >>>>> Modified:
> >>>>> llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> >>>>>
> >>>>> Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> >>>>> URL:
> >>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=192915&r1=192914&r2=192915&view=diff
> >>>>>
> >>>>> ==============================================================================
> >>>>> --- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
> >>>>> +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Thu Oct 17 14:49:22 2013
> >>>>> @@ -48,6 +48,7 @@
> >>>>> #include "llvm/Support/MathExtras.h"
> >>>>> #include "llvm/Support/raw_ostream.h"
> >>>>> #include "llvm/Target/TargetOptions.h"
> >>>>> +#include <utility>
> >>>>> using namespace llvm;
> >>>>>
> >>>>> STATISTIC(NumTailCalls, "Number of tail calls");
> >>>>> @@ -7245,8 +7246,430 @@ MachineBasicBlock *OtherSucc(MachineBasi
> >>>>> llvm_unreachable("Expecting a BB with two successors!");
> >>>>> }
> >>>>>
> >>>>> -MachineBasicBlock *ARMTargetLowering::
> >>>>> -EmitStructByval(MachineInstr *MI, MachineBasicBlock *BB) const {
> >>>>> +namespace {
> >>>>> +// This class is a helper for lowering the COPY_STRUCT_BYVAL_I32 instruction.
> >>>>> +// It defines the operations needed to lower the byval copy. We use a helper
> >>>>> +// class because the opcodes and machine instructions are different for each
> >>>>> +// subtarget, but the overall algorithm for the lowering is the same. The
> >>>>> +// implementation of each operation will be defined separately for arm, thumb1,
> >>>>> +// and thumb2 targets by subclassing this base class. See
> >>>>> +// ARMTargetLowering::EmitStructByval() for how these operations are used.
> >>>>> +class TargetStructByvalEmitter {
> >>>>> +public:
> >>>>> + TargetStructByvalEmitter(const TargetInstrInfo *TII_,
> >>>>> + MachineRegisterInfo &MRI_,
> >>>>> + const TargetRegisterClass *TRC_)
> >>>>> + : TII(TII_), MRI(MRI_), TRC(TRC_) {}
> >>>>> +
> >>>>> + // Emit a post-increment load of "unit" size. The unit size is based on the
> >>>>> + // alignment of the struct being copied (4, 2, or 1 bytes). Alignments higher
> >>>>> + // than 4 are handled separately by using NEON instructions.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to load.
> >>>>> + // \param baseOut the register to receive the incremented address.
> >>>>> + // \returns the register holding the loaded value.
> >>>>> + virtual unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>> + DebugLoc &dl, unsigned baseReg,
> >>>>> + unsigned baseOut) = 0;
> >>>>> +
> >>>>> + // Emit a post-increment store of "unit" size. The unit size is based on the
> >>>>> + // alignment of the struct being copied (4, 2, or 1 bytes). Alignments higher
> >>>>> + // than 4 are handled separately by using NEON instructions.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to store.
> >>>>> + // \param storeReg the register holding the value to store.
> >>>>> + // \param baseOut the register to receive the incremented address.
> >>>>> + virtual void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>> + DebugLoc &dl, unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned baseOut) = 0;
> >>>>> +
> >>>>> + // Emit a post-increment load of one byte.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to load.
> >>>>> + // \param baseOut the register to receive the incremented address.
> >>>>> + // \returns the register holding the loaded value.
> >>>>> + virtual unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>> + DebugLoc &dl, unsigned baseReg,
> >>>>> + unsigned baseOut) = 0;
> >>>>> +
> >>>>> + // Emit a post-increment store of one byte.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to store.
> >>>>> + // \param storeReg the register holding the value to store.
> >>>>> + // \param baseOut the register to receive the incremented address.
> >>>>> + virtual void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>> + DebugLoc &dl, unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned baseOut) = 0;
> >>>>> +
> >>>>> + // Emit a load of a constant value.
> >>>>> + //
> >>>>> + // \param Constant the constant value to load.
> >>>>> + // \returns the register holding the loaded value.
> >>>>> + virtual unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>> + DebugLoc &dl, unsigned Constant,
> >>>>> + const DataLayout *DL) = 0;
> >>>>> +
> >>>>> + // Emit a subtract of a register minus immediate, with the immediate equal to
> >>>>> + // the "unit" size. The unit size is based on the alignment of the struct
> >>>>> + // being copied (16, 8, 4, 2, or 1 bytes).
> >>>>> + //
> >>>>> + // \param InReg the register holding the initial value.
> >>>>> + // \param OutReg the register to receive the subtracted value.
> >>>>> + virtual void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>> + unsigned InReg, unsigned OutReg) = 0;
> >>>>> +
> >>>>> + // Emit a branch based on a condition code of not equal.
> >>>>> + //
> >>>>> + // \param TargetBB the destination of the branch.
> >>>>> + virtual void emitBranchNE(MachineBasicBlock *BB, MachineInstr
> *MI,
> >>>>> + DebugLoc &dl, MachineBasicBlock
> >>>>> + *TargetBB) = 0;
> >>>>> +
> >>>>> + // Find the constant pool index for the given constant. This method is
> >>>>> + // implemented in the base class because it is the same for all subtargets.
> >>>>> + //
> >>>>> + // \param LoopSize the constant value for which the index should be returned.
> >>>>> + // \returns the constant pool index for the constant.
> >>>>> + unsigned getConstantPoolIndex(MachineFunction *MF, const DataLayout *DL,
> >>>>> + unsigned LoopSize) {
> >>>>> + MachineConstantPool *ConstantPool = MF->getConstantPool();
> >>>>> + Type *Int32Ty = Type::getInt32Ty(MF->getFunction()->getContext());
> >>>>> + const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
> >>>>> +
> >>>>> + // MachineConstantPool wants an explicit alignment.
> >>>>> + unsigned Align = DL->getPrefTypeAlignment(Int32Ty);
> >>>>> + if (Align == 0)
> >>>>> + Align = DL->getTypeAllocSize(C->getType());
> >>>>> + return ConstantPool->getConstantPoolIndex(C, Align);
> >>>>> + }
> >>>>> +
> >>>>> + // Return the register class used by the subtarget.
> >>>>> + //
> >>>>> + // \returns the target register class.
> >>>>> + const TargetRegisterClass *getTRC() const { return TRC; }
> >>>>> +
> >>>>> + virtual ~TargetStructByvalEmitter() {};
> >>>>> +
> >>>>> +protected:
> >>>>> + const TargetInstrInfo *TII;
> >>>>> + MachineRegisterInfo &MRI;
> >>>>> + const TargetRegisterClass *TRC; };
> >>>>> +
> >>>>> +class ARMStructByvalEmitter : public TargetStructByvalEmitter {
> >>>>> +public:
> >>>>> + ARMStructByvalEmitter(const TargetInstrInfo *TII,
> >>>>> +MachineRegisterInfo
> >>>> &MRI,
> >>>>> + unsigned LoadStoreSize)
> >>>>> + : TargetStructByvalEmitter(
> >>>>> + TII, MRI, (const TargetRegisterClass
> > *)&ARM::GPRRegClass),
> >>>>> + UnitSize(LoadStoreSize),
> >>>>> + UnitLdOpc(LoadStoreSize == 4
> >>>>> + ? ARM::LDR_POST_IMM
> >>>>> + : LoadStoreSize == 2
> >>>>> + ? ARM::LDRH_POST
> >>>>> + : LoadStoreSize == 1 ?
> >>>>> + ARM::LDRB_POST_IMM
> > :
> >>> 0),
> >>>>> + UnitStOpc(LoadStoreSize == 4
> >>>>> + ? ARM::STR_POST_IMM
> >>>>> + : LoadStoreSize == 2
> >>>>> + ? ARM::STRH_POST
> >>>>> + : LoadStoreSize == 1 ?
> >>>>> +ARM::STRB_POST_IMM
> >>>>> +: 0) {}
> >>>>> +
> >>>>> + unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned baseOut) {
> >>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc),
> >>>> scratch).addReg(
> >>>>> + baseOut,
> >>>> RegState::Define).addReg(baseReg).addReg(0).addImm(UnitSize));
> >>>>> + return scratch;
> >>>>> + }
> >>>>> +
> >>>>> + void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned
> >>> baseOut) {
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc),
> >>>> baseOut).addReg(
> >>>>> + storeReg).addReg(baseReg).addReg(0).addImm(UnitSize));
> >>>>> + }
> >>>>> +
> >>>>> + unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned baseOut) {
> >>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> + TII->get(ARM::LDRB_POST_IMM),
> >>>> scratch)
> >>>>> + .addReg(baseOut,
> >>> RegState::Define).addReg(baseReg)
> >>>>> + .addReg(0).addImm(1));
> >>>>> + return scratch;
> >>>>> + }
> >>>>> +
> >>>>> + void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned
> >>> baseOut) {
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> + TII->get(ARM::STRB_POST_IMM),
> >>>> baseOut)
> >>>>> +
> >>>>> + .addReg(storeReg).addReg(baseReg).addReg(0).addImm(1));
> >>>>> + }
> >>>>> +
> >>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>> + DebugLoc &dl, unsigned Constant,
> >>>>> + const DataLayout *DL) {
> >>>>> + unsigned constReg = MRI.createVirtualRegister(TRC);
> >>>>> + unsigned Idx = getConstantPoolIndex(BB->getParent(), DL, Constant);
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(ARM::LDRcp)).addReg(
> >>>>> + constReg, RegState::Define).addConstantPoolIndex(Idx).addImm(0));
> >>>>> + return constReg;
> >>>>> + }
> >>>>> +
> >>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
> >> DebugLoc
> >>>> &dl,
> >>>>> + unsigned InReg, unsigned OutReg) {
> >>>>> + MachineInstrBuilder MIB =
> >>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::SUBri), OutReg);
> >>>>> +
> >>>>
> AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
> >>>>> + MIB->getOperand(5).setReg(ARM::CPSR);
> >>>>> + MIB->getOperand(5).setIsDef(true);
> >>>>> + }
> >>>>> +
> >>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>> + MachineBasicBlock *TargetBB) {
> >>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
> >>>>> + .addReg(ARM::CPSR);
> >>>>> + }
> >>>>> +
> >>>>> +private:
> >>>>> + const unsigned UnitSize;
> >>>>> + const unsigned UnitLdOpc;
> >>>>> + const unsigned UnitStOpc;
> >>>>> +};
> >>>>> +
> >>>>> +class Thumb2StructByvalEmitter : public TargetStructByvalEmitter
> >>>>> +{
> >>>>> +public:
> >>>>> + Thumb2StructByvalEmitter(const TargetInstrInfo *TII,
> >>>> MachineRegisterInfo &MRI,
> >>>>> + unsigned LoadStoreSize)
> >>>>> + : TargetStructByvalEmitter(
> >>>>> + TII, MRI, (const TargetRegisterClass
> > *)&ARM::tGPRRegClass),
> >>>>> + UnitSize(LoadStoreSize),
> >>>>> + UnitLdOpc(LoadStoreSize == 4
> >>>>> + ? ARM::t2LDR_POST
> >>>>> + : LoadStoreSize == 2
> >>>>> + ? ARM::t2LDRH_POST
> >>>>> + : LoadStoreSize == 1 ? ARM::t2LDRB_POST
:
> >>> 0),
> >>>>> + UnitStOpc(LoadStoreSize == 4
> >>>>> + ? ARM::t2STR_POST
> >>>>> + : LoadStoreSize == 2
> >>>>> + ? ARM::t2STRH_POST
> >>>>> + : LoadStoreSize == 1 ? ARM::t2STRB_POST
:
> >>>>> +0) {}
> >>>>> +
> >>>>> + unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned baseOut) {
> >>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc),
> >>>> scratch).addReg(
> >>>>> + baseOut,
> RegState::Define).addReg(baseReg).addImm(UnitSize));
> >>>>> + return scratch;
> >>>>> + }
> >>>>> +
> >>>>> + void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned
> >>> baseOut) {
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc),
> >>>>> + baseOut)
> >>>>> +
> >>>>> + .addReg(storeReg).addReg(baseReg).addImm(UnitSize));
> >>>>> + }
> >>>>> +
> >>>>> + unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned baseOut) {
> >>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> + TII->get(ARM::t2LDRB_POST),
> >>>> scratch)
> >>>>> + .addReg(baseOut,
> >>> RegState::Define).addReg(baseReg)
> >>>>> + .addImm(1));
> >>>>> + return scratch;
> >>>>> + }
> >>>>> +
> >>>>> + void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned
> >>> baseOut) {
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> + TII->get(ARM::t2STRB_POST),
> >>>> baseOut)
> >>>>> +
> >>>>> + .addReg(storeReg).addReg(baseReg).addImm(1));
> >>>>> + }
> >>>>> +
> >>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr
> >> *MI,
> >>>>> + DebugLoc &dl, unsigned Constant,
> >>>>> + const DataLayout *DL) {
> >>>>> + unsigned VConst = MRI.createVirtualRegister(TRC);
> >>>>> + unsigned Vtmp = VConst;
> >>>>> + if ((Constant & 0xFFFF0000) != 0)
> >>>>> + Vtmp = MRI.createVirtualRegister(TRC);
> >>>>> + AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16), Vtmp)
> >>>>> + .addImm(Constant & 0xFFFF));
> >>>>> +
> >>>>> + if ((Constant & 0xFFFF0000) != 0)
> >>>>> + AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16),
> > VConst)
> >>>>> + .addReg(Vtmp).addImm(Constant >> 16));
> >>>>> + return VConst;
> >>>>> + }
> >>>>> +
> >>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
> >> DebugLoc
> >>>> &dl,
> >>>>> + unsigned InReg, unsigned OutReg) {
> >>>>> + MachineInstrBuilder MIB =
> >>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::t2SUBri), OutReg);
> >>>>> +
> >>>>
> AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
> >>>>> + MIB->getOperand(5).setReg(ARM::CPSR);
> >>>>> + MIB->getOperand(5).setIsDef(true);
> >>>>> + }
> >>>>> +
> >>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>> + MachineBasicBlock *TargetBB) {
> >>>>> + BuildMI(BB, dl, TII->get(ARM::t2Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
> >>>>> + .addReg(ARM::CPSR);
> >>>>> + }
> >>>>> +
> >>>>> +private:
> >>>>> + const unsigned UnitSize;
> >>>>> + const unsigned UnitLdOpc;
> >>>>> + const unsigned UnitStOpc;
> >>>>> +};
> >>>>> +
> >>>>> +// This class is a thin wrapper that delegates most of the work
> >>>>> +to the correct // TargetStructByvalEmitter implementation. It
> >>>>> +also handles the lowering for // targets that support neon
> >>>>> +because the neon implementation is the same for all // targets that
> support it.
> >>>>> +class StructByvalEmitter {
> >>>>> +public:
> >>>>> + StructByvalEmitter(unsigned LoadStoreSize, const ARMSubtarget
> >>>> *Subtarget,
> >>>>> + const TargetInstrInfo *TII_,
> >>>>> + MachineRegisterInfo
> >>> &MRI_,
> >>>>> + const DataLayout *DL_)
> >>>>> + : UnitSize(LoadStoreSize),
> >>>>> + TargetEmitter(
> >>>>> + Subtarget->isThumb2()
> >>>>> + ? static_cast<TargetStructByvalEmitter *>(
> >>>>> + new Thumb2StructByvalEmitter(TII_, MRI_,
> >>>>> + LoadStoreSize))
> >>>>> + : static_cast<TargetStructByvalEmitter *>(
> >>>>> + new ARMStructByvalEmitter(TII_, MRI_,
> >>>>> + LoadStoreSize))),
> >>>>> + TII(TII_), MRI(MRI_), DL(DL_),
> >>>>> + VecTRC(UnitSize == 16
> >>>>> + ? (const TargetRegisterClass
*)&ARM::DPairRegClass
> >>>>> + : UnitSize == 8
> >>>>> + ? (const TargetRegisterClass
> >>> *)&ARM::DPRRegClass
> >>>>> + : 0),
> >>>>> + VecLdOpc(UnitSize == 16 ? ARM::VLD1q32wb_fixed
> >>>>> + : UnitSize == 8 ?
> >>>>> + ARM::VLD1d32wb_fixed
> >>> : 0),
> >>>>> + VecStOpc(UnitSize == 16 ? ARM::VST1q32wb_fixed
> >>>>> + : UnitSize == 8 ?
> >>>>> +ARM::VST1d32wb_fixed : 0) {}
> >>>>> +
> >>>>> + // Emit a post-increment load of "unit" size. The unit size is based on the
> >>>>> + // alignment of the struct being copied (16, 8, 4, 2, or 1 bytes). Loads of 16
> >>>>> + // or 8 bytes use NEON instructions to load the value.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to load.
> >>>>> + // \param baseOut the register to receive the incremented address. If baseOut
> >>>>> + // is 0 then a new register is created to hold the incremented address.
> >>>>> + // \returns a pair of registers holding the loaded value and the updated
> >>>>> + // address.
> >>>>> + std::pair<unsigned, unsigned> emitUnitLoad(MachineBasicBlock *BB,
> >>>>> + MachineInstr *MI, DebugLoc &dl,
> >>>>> + unsigned baseReg,
> >>>>> + unsigned baseOut = 0) {
> >>>>> + unsigned scratch = 0;
> >>>>> + if (baseOut == 0)
> >>>>> + baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>> + if (UnitSize >= 8) { // neon
> >>>>> + scratch = MRI.createVirtualRegister(VecTRC);
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecLdOpc),
> >>>> scratch).addReg(
> >>>>> + baseOut, RegState::Define).addReg(baseReg).addImm(0));
> >>>>> + } else {
> >>>>> + scratch = TargetEmitter->emitUnitLoad(BB, MI, dl, baseReg,
> >>> baseOut);
> >>>>> + }
> >>>>> + return std::make_pair(scratch, baseOut); }
> >>>>> +
> >>>>> + // Emit a post-increment store of "unit" size. The unit size is based on the
> >>>>> + // alignment of the struct being copied (16, 8, 4, 2, or 1 bytes). Stores of
> >>>>> + // 16 or 8 bytes use NEON instructions to store the value.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to store.
> >>>>> + // \param storeReg the register holding the value to store.
> >>>>> + // \param baseOut the register to receive the incremented address. If baseOut
> >>>>> + // is 0 then a new register is created to hold the incremented address.
> >>>>> + // \returns the register holding the updated address.
> >>>>> + unsigned emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned storeReg,
> >>>>> + unsigned baseOut = 0) {
> >>>>> + if (baseOut == 0)
> >>>>> + baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>> + if (UnitSize >= 8) { // neon
> >>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecStOpc),
> > baseOut)
> >>>>> +
> > .addReg(baseReg).addImm(0).addReg(storeReg));
> >>>>> + } else {
> >>>>> + TargetEmitter->emitUnitStore(BB, MI, dl, baseReg, storeReg,
> >>>> baseOut);
> >>>>> + }
> >>>>> + return baseOut;
> >>>>> + }
> >>>>> +
> >>>>> + // Emit a post-increment load of one byte.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to load.
> >>>>> + // \returns a pair of registers holding the loaded value and the updated
> >>>>> + // address.
> >>>>> + std::pair<unsigned, unsigned> emitByteLoad(MachineBasicBlock *BB,
> >>>>> + MachineInstr *MI, DebugLoc &dl,
> >>>>> + unsigned baseReg) {
> >>>>> + unsigned baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>> + unsigned scratch =
> >>>>> + TargetEmitter->emitByteLoad(BB, MI, dl, baseReg, baseOut);
> >>>>> + return std::make_pair(scratch, baseOut);
> >>>>> + }
> >>>>> +
> >>>>> + // Emit a post-increment store of one byte.
> >>>>> + //
> >>>>> + // \param baseReg the register holding the address to store.
> >>>>> + // \param storeReg the register holding the value to store.
> >>>>> + // \returns the register holding the updated address.
> >>>>> + unsigned emitByteStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>> + unsigned baseReg, unsigned storeReg) {
> >>>>> + unsigned baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>> + TargetEmitter->emitByteStore(BB, MI, dl, baseReg, storeReg, baseOut);
> >>>>> + return baseOut;
> >>>>> + }
> >>>>> +
> >>>>> + // Emit a load of the constant LoopSize.
> >>>>> + //
> >>>>> + // \param LoopSize the constant to load.
> >>>>> + // \returns the register holding the loaded constant.
> >>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr
> >> *MI,
> >>>>> + DebugLoc &dl, unsigned LoopSize) {
> >>>>> + return TargetEmitter->emitConstantLoad(BB, MI, dl, LoopSize,
> >>>>> + DL); }
> >>>>> +
> >>>>> + // Emit a subtract of a register minus immediate, with the immediate equal to
> >>>>> + // the "unit" size. The unit size is based on the alignment of the struct
> >>>>> + // being copied (16, 8, 4, 2, or 1 bytes).
> >>>>> + //
> >>>>> + // \param InReg the register holding the initial value.
> >>>>> + // \param OutReg the register to receive the subtracted value.
> >>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>> + unsigned InReg, unsigned OutReg) {
> >>>>> + TargetEmitter->emitSubImm(BB, MI, dl, InReg, OutReg);
> >>>>> + }
> >>>>> +
> >>>>> + // Emit a branch based on a condition code of not equal.
> >>>>> + //
> >>>>> + // \param TargetBB the destination of the branch.
> >>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
> >>>> DebugLoc &dl,
> >>>>> + MachineBasicBlock *TargetBB) {
> >>>>> + TargetEmitter->emitBranchNE(BB, MI, dl, TargetBB); }
> >>>>> +
> >>>>> + // Return the register class used by the subtarget.
> >>>>> + //
> >>>>> + // \returns the target register class.
> >>>>> + const TargetRegisterClass *getTRC() const { return
> >>>>> + TargetEmitter->getTRC(); }
> >>>>> +
> >>>>> +private:
> >>>>> + const unsigned UnitSize;
> >>>>> + OwningPtr<TargetStructByvalEmitter> TargetEmitter;
> >>>>> + const TargetInstrInfo *TII;
> >>>>> + MachineRegisterInfo &MRI;
> >>>>> + const DataLayout *DL;
> >>>>> +
> >>>>> + const TargetRegisterClass *VecTRC;
> >>>>> + const unsigned VecLdOpc;
> >>>>> + const unsigned VecStOpc;
> >>>>> +};
> >>>>> +}
> >>>>> +
> >>>>> +MachineBasicBlock *
> >>>>> +ARMTargetLowering::EmitStructByval(MachineInstr *MI,
> >>>>> + MachineBasicBlock *BB) const {
> >>>>> // This pseudo instruction has 3 operands: dst, src, size
> >>>>> // We expand it to a loop if size > Subtarget->getMaxInlineSizeThreshold().
> >>>>> // Otherwise, we will generate unrolled scalar copies.
> >>>>> @@ -7261,23 +7684,13 @@ EmitStructByval(MachineInstr *MI,
> Machin
> >>>>> unsigned Align = MI->getOperand(3).getImm(); DebugLoc dl =
> >>>>> MI->getDebugLoc();
> >>>>>
> >>>>> - bool isThumb2 = Subtarget->isThumb2(); MachineFunction *MF =
> >>>>> BB->getParent(); MachineRegisterInfo &MRI = MF->getRegInfo();
> >>>>> - unsigned ldrOpc, strOpc, UnitSize = 0;
> >>>>> -
> >>>>> - const TargetRegisterClass *TRC = isThumb2 ?
> >>>>> - (const TargetRegisterClass*)&ARM::tGPRRegClass :
> >>>>> - (const TargetRegisterClass*)&ARM::GPRRegClass;
> >>>>> - const TargetRegisterClass *TRC_Vec = 0;
> >>>>> + unsigned UnitSize = 0;
> >>>>>
> >>>>> if (Align & 1) {
> >>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
> >>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
> >>>>> UnitSize = 1;
> >>>>> } else if (Align & 2) {
> >>>>> - ldrOpc = isThumb2 ? ARM::t2LDRH_POST : ARM::LDRH_POST;
> >>>>> - strOpc = isThumb2 ? ARM::t2STRH_POST : ARM::STRH_POST;
> >>>>> UnitSize = 2;
> >>>>> } else {
> >>>>> // Check whether we can use NEON instructions.
> >>>>> @@ -7285,27 +7698,18 @@ EmitStructByval(MachineInstr *MI,
> Machin
> >>>>> hasAttribute(AttributeSet::FunctionIndex,
> >>>>> Attribute::NoImplicitFloat) &&
> >>>>> Subtarget->hasNEON()) {
> >>>>> - if ((Align % 16 == 0) && SizeVal >= 16) {
> >>>>> - ldrOpc = ARM::VLD1q32wb_fixed;
> >>>>> - strOpc = ARM::VST1q32wb_fixed;
> >>>>> + if ((Align % 16 == 0) && SizeVal >= 16)
> >>>>> UnitSize = 16;
> >>>>> - TRC_Vec = (const TargetRegisterClass*)&ARM::DPairRegClass;
> >>>>> - }
> >>>>> - else if ((Align % 8 == 0) && SizeVal >= 8) {
> >>>>> - ldrOpc = ARM::VLD1d32wb_fixed;
> >>>>> - strOpc = ARM::VST1d32wb_fixed;
> >>>>> + else if ((Align % 8 == 0) && SizeVal >= 8)
> >>>>> UnitSize = 8;
> >>>>> - TRC_Vec = (const TargetRegisterClass*)&ARM::DPRRegClass;
> >>>>> - }
> >>>>> }
> >>>>> // Can't use NEON instructions.
> >>>>> - if (UnitSize == 0) {
> >>>>> - ldrOpc = isThumb2 ? ARM::t2LDR_POST : ARM::LDR_POST_IMM;
> >>>>> - strOpc = isThumb2 ? ARM::t2STR_POST : ARM::STR_POST_IMM;
> >>>>> + if (UnitSize == 0)
> >>>>> UnitSize = 4;
> >>>>> - }
> >>>>> }
> >>>>>
> >>>>> + StructByvalEmitter ByvalEmitter(UnitSize, Subtarget, TII, MRI,
> >>>>> + getDataLayout());
> >>>>> unsigned BytesLeft = SizeVal % UnitSize; unsigned LoopSize =
> >>>>> SizeVal - BytesLeft;
> >>>>>
> >>>>> @@ -7316,67 +7720,22 @@ EmitStructByval(MachineInstr *MI,
> Machin
> >>>>> unsigned srcIn = src;
> >>>>> unsigned destIn = dest;
> >>>>> for (unsigned i = 0; i < LoopSize; i+=UnitSize) {
> >>>>> - unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ?
> >>>> TRC_Vec:TRC);
> >>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
> >>>>> - if (UnitSize >= 8) {
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> - TII->get(ldrOpc), scratch)
> >>>>> - .addReg(srcOut,
RegState::Define).addReg(srcIn).addImm(0));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > destOut)
> >>>>> - .addReg(destIn).addImm(0).addReg(scratch));
> >>>>> - } else if (isThumb2) {
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> - TII->get(ldrOpc), scratch)
> >>>>> - .addReg(srcOut,
> >>> RegState::Define).addReg(srcIn).addImm(UnitSize));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > destOut)
> >>>>> - .addReg(scratch).addReg(destIn)
> >>>>> - .addImm(UnitSize));
> >>>>> - } else {
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> - TII->get(ldrOpc), scratch)
> >>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn).addReg(0)
> >>>>> - .addImm(UnitSize));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > destOut)
> >>>>> - .addReg(scratch).addReg(destIn)
> >>>>> - .addReg(0).addImm(UnitSize));
> >>>>> - }
> >>>>> - srcIn = srcOut;
> >>>>> - destIn = destOut;
> >>>>> + std::pair<unsigned, unsigned> res =
> >>>>> + ByvalEmitter.emitUnitLoad(BB, MI, dl, srcIn);
> >>>>> + unsigned scratch = res.first;
> >>>>> + srcIn = res.second;
> >>>>> + destIn = ByvalEmitter.emitUnitStore(BB, MI, dl, destIn,
> >>>>> + scratch);
> >>>>> }
> >>>>>
> >>>>> // Handle the leftover bytes with LDRB and STRB.
> >>>>> // [scratch, srcOut] = LDRB_POST(srcIn, 1)
> >>>>> // [destOut] = STRB_POST(scratch, destIn, 1)
> >>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
> >>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
> >>>>> for (unsigned i = 0; i < BytesLeft; i++) {
> >>>>> - unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
> >>>>> - if (isThumb2) {
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> - TII->get(ldrOpc),scratch)
> >>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(1));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>> - .addReg(scratch).addReg(destIn)
> >>>>> - .addImm(1));
> >>>>> - } else {
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>> - TII->get(ldrOpc),scratch)
> >>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn)
> >>>>> - .addReg(0).addImm(1));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>> - .addReg(scratch).addReg(destIn)
> >>>>> - .addReg(0).addImm(1));
> >>>>> - }
> >>>>> - srcIn = srcOut;
> >>>>> - destIn = destOut;
> >>>>> + std::pair<unsigned, unsigned> res =
> >>>>> + ByvalEmitter.emitByteLoad(BB, MI, dl, srcIn);
> >>>>> + unsigned scratch = res.first;
> >>>>> + srcIn = res.second;
> >>>>> + destIn = ByvalEmitter.emitByteStore(BB, MI, dl, destIn,
> >>>>> + scratch);
> >>>>> }
> >>>>> MI->eraseFromParent(); // The instruction is gone now.
> >>>>> return BB;
> >>>>> @@ -7414,34 +7773,7 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>> exitMBB->transferSuccessorsAndUpdatePHIs(BB);
> >>>>>
> >>>>> // Load an immediate to varEnd.
> >>>>> - unsigned varEnd = MRI.createVirtualRegister(TRC);
> >>>>> - if (isThumb2) {
> >>>>> - unsigned VReg1 = varEnd;
> >>>>> - if ((LoopSize & 0xFFFF0000) != 0)
> >>>>> - VReg1 = MRI.createVirtualRegister(TRC);
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16), VReg1)
> >>>>> - .addImm(LoopSize & 0xFFFF));
> >>>>> -
> >>>>> - if ((LoopSize & 0xFFFF0000) != 0)
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16), varEnd)
> >>>>> - .addReg(VReg1)
> >>>>> - .addImm(LoopSize >> 16));
> >>>>> - } else {
> >>>>> - MachineConstantPool *ConstantPool = MF->getConstantPool();
> >>>>> - Type *Int32Ty = Type::getInt32Ty(MF->getFunction()->getContext());
> >>>>> - const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
> >>>>> -
> >>>>> - // MachineConstantPool wants an explicit alignment.
> >>>>> - unsigned Align = getDataLayout()->getPrefTypeAlignment(Int32Ty);
> >>>>> - if (Align == 0)
> >>>>> - Align = getDataLayout()->getTypeAllocSize(C->getType());
> >>>>> - unsigned Idx = ConstantPool->getConstantPoolIndex(C, Align);
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::LDRcp))
> >>>>> - .addReg(varEnd, RegState::Define)
> >>>>> - .addConstantPoolIndex(Idx)
> >>>>> - .addImm(0));
> >>>>> - }
> >>>>> + unsigned varEnd = ByvalEmitter.emitConstantLoad(BB, MI, dl,
> >>>>> + LoopSize);
> >>>>> BB->addSuccessor(loopMBB);
> >>>>>
> >>>>> // Generate the loop body:
> >>>>> @@ -7450,12 +7782,12 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>> // destPhi = PHI(destLoop, dst)
> >>>>> MachineBasicBlock *entryBB = BB;
> >>>>> BB = loopMBB;
> >>>>> - unsigned varLoop = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned varPhi = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned srcLoop = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned srcPhi = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned destLoop = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned destPhi = MRI.createVirtualRegister(TRC);
> >>>>> + unsigned varLoop =
> >>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>> + unsigned varPhi =
> >>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>> + unsigned srcLoop =
> >>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>> + unsigned srcPhi =
> >>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>> + unsigned destLoop =
> >>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>> + unsigned destPhi =
> >>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>
> >>>>> BuildMI(*BB, BB->begin(), dl, TII->get(ARM::PHI), varPhi)
> >>>>> .addReg(varLoop).addMBB(loopMBB)
> >>>>> @@ -7469,39 +7801,16 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>
> >>>>> // [scratch, srcLoop] = LDR_POST(srcPhi, UnitSize)
> >>>>> // [destLoop] = STR_POST(scratch, destPhi, UnitSiz)
> >>>>> - unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ? TRC_Vec:TRC);
> >>>>> - if (UnitSize >= 8) {
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> >>>>> - .addReg(srcLoop, RegState::Define).addReg(srcPhi).addImm(0));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> >>>>> - .addReg(destPhi).addImm(0).addReg(scratch));
> >>>>> - } else if (isThumb2) {
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> >>>>> - .addReg(srcLoop, RegState::Define).addReg(srcPhi).addImm(UnitSize));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> >>>>> - .addReg(scratch).addReg(destPhi)
> >>>>> - .addImm(UnitSize));
> >>>>> - } else {
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> >>>>> - .addReg(srcLoop, RegState::Define).addReg(srcPhi).addReg(0)
> >>>>> - .addImm(UnitSize));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> >>>>> - .addReg(scratch).addReg(destPhi)
> >>>>> - .addReg(0).addImm(UnitSize));
> >>>>> + {
> >>>>> + std::pair<unsigned, unsigned> res =
> >>>>> + ByvalEmitter.emitUnitLoad(BB, BB->end(), dl, srcPhi, srcLoop);
> >>>>> + unsigned scratch = res.first;
> >>>>> + ByvalEmitter.emitUnitStore(BB, BB->end(), dl, destPhi,
> >>>>> + scratch, destLoop);
> >>>>> }
> >>>>>
> >>>>> // Decrement loop variable by UnitSize.
> >>>>> - MachineInstrBuilder MIB = BuildMI(BB, dl,
> >>>>> - TII->get(isThumb2 ? ARM::t2SUBri : ARM::SUBri), varLoop);
> >>>>> - AddDefaultCC(AddDefaultPred(MIB.addReg(varPhi).addImm(UnitSize)));
> >>>>> - MIB->getOperand(5).setReg(ARM::CPSR);
> >>>>> - MIB->getOperand(5).setIsDef(true);
> >>>>> -
> >>>>> - BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc))
> >>>>> - .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR);
> >>>>> + ByvalEmitter.emitSubImm(BB, BB->end(), dl, varPhi, varLoop);
> >>>>> + ByvalEmitter.emitBranchNE(BB, BB->end(), dl, loopMBB);
> >>>>>
> >>>>> // loopMBB can loop back to loopMBB or fall through to exitMBB.
> >>>>> BB->addSuccessor(loopMBB);
> >>>>> @@ -7510,36 +7819,17 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>> // Add epilogue to handle BytesLeft.
> >>>>> BB = exitMBB;
> >>>>> MachineInstr *StartOfExit = exitMBB->begin();
> >>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
> >>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
> >>>>>
> >>>>> // [scratch, srcOut] = LDRB_POST(srcLoop, 1)
> >>>>> // [destOut] = STRB_POST(scratch, destLoop, 1)
> >>>>> unsigned srcIn = srcLoop;
> >>>>> unsigned destIn = destLoop;
> >>>>> for (unsigned i = 0; i < BytesLeft; i++) {
> >>>>> - unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
> >>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
> >>>>> - if (isThumb2) {
> >>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> >>>>> - TII->get(ldrOpc),scratch)
> >>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(1));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl, TII->get(strOpc), destOut)
> >>>>> - .addReg(scratch).addReg(destIn)
> >>>>> - .addImm(1));
> >>>>> - } else {
> >>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> >>>>> - TII->get(ldrOpc),scratch)
> >>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn).addReg(0).addImm(1));
> >>>>> -
> >>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl, TII->get(strOpc), destOut)
> >>>>> - .addReg(scratch).addReg(destIn)
> >>>>> - .addReg(0).addImm(1));
> >>>>> - }
> >>>>> - srcIn = srcOut;
> >>>>> - destIn = destOut;
> >>>>> + std::pair<unsigned, unsigned> res =
> >>>>> + ByvalEmitter.emitByteLoad(BB, StartOfExit, dl, srcIn);
> >>>>> + unsigned scratch = res.first;
> >>>>> + srcIn = res.second;
> >>>>> + destIn = ByvalEmitter.emitByteStore(BB, StartOfExit, dl,
> >>>>> + destIn, scratch);
> >>>>> }
> >>>>>
> >>>>> MI->eraseFromParent(); // The instruction is gone now.
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> llvm-commits mailing list
> >>>>> llvm-commits at cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits