[llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
Bob Wilson
bob.wilson at apple.com
Wed Oct 23 12:46:37 PDT 2013
On Oct 23, 2013, at 12:32 PM, Manman Ren <manman.ren at gmail.com> wrote:
> Hi David,
>
> Can we use helper functions for emitUnitLoad|Store emitByteLoad|Store to reduce code duplication?
>
> Thanks,
> Manman
Yes, that would be great, if it works.
The only other thing I noticed is that this patch has several places where it creates a scratch virtual register on every branch of a conditional. Those could be moved outside the conditionals.
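As a rough illustration of what I mean by moving the register creation outside the conditionals, here is a minimal sketch. It does not use the real LLVM classes: MockRegInfo (standing in for MachineRegisterInfo), Emitted, MockOpcode, and both emitLoad* functions are hypothetical names made up for this example only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::MachineRegisterInfo. It only hands out
// increasing virtual register numbers, starting at 1.
struct MockRegInfo {
  unsigned NextVReg = 1;
  unsigned createVirtualRegister() { return NextVReg++; }
};

// Hypothetical record of one "emitted" instruction: opcode plus the
// register it defines.
struct Emitted {
  unsigned Opcode;
  unsigned DestReg;
};

// Made-up opcode numbers; they are not real ARM opcode values.
enum MockOpcode : unsigned { LDR_ARM = 100, LDR_THUMB2 = 200 };

// Before: every branch of the conditional creates its own scratch register.
unsigned emitLoadDuplicated(MockRegInfo &MRI, bool IsThumb2,
                            std::vector<Emitted> &Out) {
  unsigned Scratch;
  if (IsThumb2) {
    Scratch = MRI.createVirtualRegister();
    Out.push_back({LDR_THUMB2, Scratch});
  } else {
    Scratch = MRI.createVirtualRegister();
    Out.push_back({LDR_ARM, Scratch});
  }
  return Scratch;
}

// After: the scratch register is created once, and only the opcode
// choice depends on the subtarget.
unsigned emitLoadHoisted(MockRegInfo &MRI, bool IsThumb2,
                         std::vector<Emitted> &Out) {
  unsigned Scratch = MRI.createVirtualRegister();
  assert(Scratch != 0 && "this mock never hands out register 0");
  Out.push_back({IsThumb2 ? LDR_THUMB2 : LDR_ARM, Scratch});
  return Scratch;
}
```

Both versions allocate exactly one register per call, so the change is only about readability; the real code would of course go through BuildMI rather than a vector.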
>
>
> On Tue, Oct 22, 2013 at 3:50 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> I've attached a patch that removes the class in place of inline
> conditionals. Please help to review this patch.
>
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
>
>
> > -----Original Message-----
> > From: Bob Wilson [mailto:bob.wilson at apple.com]
> > Sent: Tuesday, October 22, 2013 9:15 AM
> > To: David Peixotto
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
> >
> > Thanks. At least try it out. If it turns out to be unreadable with all the
> > conditionals, we can reconsider.
> >
> > On Oct 22, 2013, at 9:13 AM, David Peixotto <dpeixott at codeaurora.org>
> > wrote:
> >
> > > Ok, I will make a patch to switch to using inline conditionals instead.
> > >
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Bob Wilson [mailto:bob.wilson at apple.com]
> > >> Sent: Monday, October 21, 2013 9:18 PM
> > >> To: David Peixotto
> > >> Cc: llvm-commits at cs.uiuc.edu LLVM
> > >> Subject: Re: [llvm] r192915 - Refactor lowering for
> > >> COPY_STRUCT_BYVAL_I32
> > >>
> > >>
> > >> On Oct 21, 2013, at 7:20 PM, David Peixotto <dpeixott at codeaurora.org>
> > >> wrote:
> > >>
> > >>> Hi Bob,
> > >>>
> > >>> I agree that a generic emitter would be useful, but I'm not sure I
> > >>> would get the time to work on such a project at this point.
> > >>
> > >> In that case, I think it would be better to simplify this code to use a
> > >> more direct approach with conditionals.
> > >>
> > >>>
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
> > >>>> Sent: Monday, October 21, 2013 12:12 PM
> > >>>> To: David Peixotto
> > >>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>> Subject: Re: [llvm] r192915 - Refactor lowering for
> > >>>> COPY_STRUCT_BYVAL_I32
> > >>>>
> > >>>> It seems like it would make more sense to make the StructByvalEmitter a
> > >>>> generic "emitter" for ARM instructions. I suspect there are a number of
> > >>>> other places in ARMISelLowering that could make good use of it. Would
> > >>>> you be willing to investigate that?
> > >>>>
> > >>>> As it stands, this really does seem like overkill for the struct byval
> > >>>> issue, but if you could make it more generally useful, then the extra
> > >>>> code would be worthwhile.
> > >>>>
> > >>>> On Oct 21, 2013, at 10:02 AM, David Peixotto
> > >>>> <dpeixott at codeaurora.org>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Bob,
> > >>>>>
> > >>>>> I think your criticism is valid here. I wasn't too happy with how
> > >>>>> much code I ended up writing for this change. When I started I
> > >>>>> thought the code size would be about equal after implementing the
> > >>>>> thumb1 lowering because I was getting rid of some code
> > >>>>> duplication, but the code size for abstracting the common parts
> > >>>>> was larger than I had anticipated.
> > >>>>> There is no fundamental problem with implementing it with
> > >>>>> conditionals; I did it this way because I thought it would be
> > >>>>> clearer and would be easier to write correctly.
> > >>>>>
> > >>>>> I think the way it is now has the advantage that the lowering
> > >>>>> algorithm is clearly separated from the details of generating
> > >>>>> machine instructions for each sub-target. I think it would be
> > >>>>> easier to improve the algorithm with the way it is now. For
> > >>>>> example, we are always using byte stores to copy any leftover that
> > >>>>> does not fit into the "unit" size. So if we have a 31-byte struct
> > >>>>> and a target that supports neon we will generate a 16-byte store
> > >>>>> and 15 1-byte stores. We could improve this by generating fewer stores
> > >>>>> for the leftover (3x4-byte + 1x2-byte + 1x1-byte). I don't know if
> > >>>>> we actually care about this kind of change, but I believe it would
> > >>>>> be easier to make now.
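For reference, the leftover split described in the paragraph above can be sketched as a greedy decomposition into power-of-two store widths. This is only an illustration of the arithmetic, not code from the patch: planLeftoverStores and its MaxWidth parameter are hypothetical names, and the sketch assumes the target can issue each of the wider stores at the address where it lands.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: split Leftover bytes into the widest stores
// available, trying MaxWidth first and halving down to 1 byte.
// MaxWidth is assumed to be a power of two (4, 2, or 1 here).
std::vector<unsigned> planLeftoverStores(unsigned Leftover,
                                         unsigned MaxWidth = 4) {
  assert(MaxWidth != 0 && (MaxWidth & (MaxWidth - 1)) == 0 &&
         "MaxWidth must be a power of two");
  std::vector<unsigned> Widths;
  for (unsigned W = MaxWidth; W >= 1; W /= 2) {
    // Emit as many W-byte stores as still fit in the leftover.
    while (Leftover >= W) {
      Widths.push_back(W);
      Leftover -= W;
    }
  }
  return Widths;
}
```

For the 31-byte struct with a 16-byte unit, the leftover is 31 % 16 = 15 bytes, and planLeftoverStores(15) yields the widths 4, 4, 4, 2, 1: five stores instead of fifteen byte stores.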
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
> > >>>>>> Sent: Saturday, October 19, 2013 7:00 PM
> > >>>>>> To: David Peixotto
> > >>>>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>>>> Subject: Re: [llvm] r192915 - Refactor lowering for
> > >>>>>> COPY_STRUCT_BYVAL_I32
> > >>>>>>
> > >>>>>> This is very nice and elegant, but it's an awful lot of code for
> > >>>>>> something that isn't really that complicated. It seems like overkill
> > >>>>>> to me. Did you consider implementing the Thumb1 support by just
> > >>>>>> adding more conditionals? Is there a fundamental problem with that?
> > >>>>>>
> > >>>>>> On Oct 17, 2013, at 12:49 PM, David Peixotto
> > >>>>>> <dpeixott at codeaurora.org>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Author: dpeixott
> > >>>>>>> Date: Thu Oct 17 14:49:22 2013
> > >>>>>>> New Revision: 192915
> > >>>>>>>
> > >>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=192915&view=rev
> > >>>>>>> Log:
> > >>>>>>> Refactor lowering for COPY_STRUCT_BYVAL_I32
> > >>>>>>>
> > >>>>>>> This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
> > >>>>>>> pseudo-instruction in the ARM backend. We introduce a new helper
> > >>>>>>> class that encapsulates all of the operations needed during the
> > >>>>>>> lowering. The operations are implemented for each subtarget in
> > >>>>>>> different subclasses. Currently only arm and thumb2 subtargets are
> > >>>>>>> supported.
> > >>>>>>>
> > >>>>>>> This refactoring was done to easily implement support for thumb1
> > >>>>>>> subtargets. This initial patch does not add support for thumb1,
> > >>>>>>> but is only a refactoring. A follow on patch will implement the
> > >>>>>>> support for
> > >>>>>>> thumb1 subtargets.
> > >>>>>>>
> > >>>>>>> No intended functionality change.
> > >>>>>>>
> > >>>>>>> Modified:
> > >>>>>>> llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> > >>>>>>>
> > >>>>>>> Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> > >>>>>>> URL:
> > >>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=192915&r1=192914&r2=192915&view=diff
> > >>>>>>>
> > >>>>>>> ==============================================================================
> > >>>>>>> --- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
> > >>>>>>> +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Thu Oct 17 14:49:22 2013
> > >>>>>>> @@ -48,6 +48,7 @@
> > >>>>>>> #include "llvm/Support/MathExtras.h"
> > >>>>>>> #include "llvm/Support/raw_ostream.h"
> > >>>>>>> #include "llvm/Target/TargetOptions.h"
> > >>>>>>> +#include <utility>
> > >>>>>>> using namespace llvm;
> > >>>>>>>
> > >>>>>>> STATISTIC(NumTailCalls, "Number of tail calls");
> > >>>>>>> @@ -7245,8 +7246,430 @@ MachineBasicBlock *OtherSucc(MachineBasi
> > >>>>>>> llvm_unreachable("Expecting a BB with two successors!");
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> -MachineBasicBlock *ARMTargetLowering::
> > >>>>>>> -EmitStructByval(MachineInstr *MI, MachineBasicBlock *BB) const
> > >>>>>>> {
> > >>>>>>> +namespace {
> > >>>>>>> +// This class is a helper for lowering the COPY_STRUCT_BYVAL_I32 instruction.
> > >>>>>>> +// It defines the operations needed to lower the byval copy. We use a helper
> > >>>>>>> +// class because the opcodes and machine instructions are different for each
> > >>>>>>> +// subtarget, but the overall algorithm for the lowering is the same. The
> > >>>>>>> +// implementation of each operation will be defined separately for arm, thumb1,
> > >>>>>>> +// and thumb2 targets by subclassing this base class. See
> > >>>>>>> +// ARMTargetLowering::EmitStructByval() for how these operations are used.
> > >>>>>>> +class TargetStructByvalEmitter {
> > >>>>>>> +public:
> > >>>>>>> + TargetStructByvalEmitter(const TargetInstrInfo *TII_,
> > >>>>>>> + MachineRegisterInfo &MRI_,
> > >>>>>>> + const TargetRegisterClass *TRC_)
> > >>>>>>> + : TII(TII_), MRI(MRI_), TRC(TRC_) {}
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment load of "unit" size. The unit size
> > >>>>>>> + is based on the // alignment of the struct being copied (4,
> > >>>>>>> + 2, or
> > >>>>>>> + 1 bytes). Alignments higher // than 4 are handled separately
> > >>>>>>> + by using
> > >>>>>> NEON instructions.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to load.
> > >>>>>>> + // \param baseOut the register to recieve the incremented
> > >> address.
> > >>>>>>> + // \returns the register holding the loaded value.
> > >>>>>>> + virtual unsigned emitUnitLoad(MachineBasicBlock *BB,
> > >>>>>>> + MachineInstr
> > >>>>>> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned baseReg,
> > >>>>>>> + unsigned baseOut) = 0;
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment store of "unit" size. The unit size
> > >>>>>>> + is based on the // alignment of the struct being copied (4,
> > >>>>>>> + 2, or
> > >>>>>>> + 1 bytes). Alignments higher // than 4 are handled separately
> > >>>>>>> + by using
> > >>>>>> NEON instructions.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to store.
> > >>>>>>> + // \param storeReg the register holding the value to store.
> > >>>>>>> + // \param baseOut the register to recieve the incremented
> > >> address.
> > >>>>>>> + virtual void emitUnitStore(MachineBasicBlock *BB,
> > >>>>>>> + MachineInstr
> > >> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned baseReg,
> > >>>>>>> + unsigned
> > >>>>> storeReg,
> > >>>>>>> + unsigned baseOut) = 0;
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment load of one byte.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to load.
> > >>>>>>> + // \param baseOut the register to recieve the incremented
> > >> address.
> > >>>>>>> + // \returns the register holding the loaded value.
> > >>>>>>> + virtual unsigned emitByteLoad(MachineBasicBlock *BB,
> > >>>>>>> + MachineInstr
> > >>>>>> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned baseReg,
> > >>>>>>> + unsigned baseOut) = 0;
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment store of one byte.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to store.
> > >>>>>>> + // \param storeReg the register holding the value to store.
> > >>>>>>> + // \param baseOut the register to recieve the incremented
> > >> address.
> > >>>>>>> + virtual void emitByteStore(MachineBasicBlock *BB,
> > >>>>>>> + MachineInstr
> > >> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned baseReg,
> > >>>>>>> + unsigned
> > >>>>> storeReg,
> > >>>>>>> + unsigned baseOut) = 0;
> > >>>>>>> +
> > >>>>>>> + // Emit a load of a constant value.
> > >>>>>>> + //
> > >>>>>>> + // \param Constant the register holding the address to store.
> > >>>>>>> + // \returns the register holding the loaded value.
> > >>>>>>> + virtual unsigned emitConstantLoad(MachineBasicBlock *BB,
> > >>>>>> MachineInstr *MI,
> > >>>>>>> + DebugLoc &dl, unsigned
> > > Constant,
> > >>>>>>> + const DataLayout *DL) = 0;
> > >>>>>>> +
> > >>>>>>> + // Emit a subtract of a register minus immediate, with the
> > >>>>>>> + immediate equal to // the "unit" size. The unit size is based
> > >>>>>>> + on the alignment of the struct // being copied (16, 8, 4, 2,
> > >>>>>>> + or
> > >>>>>>> + 1
> > >>>>> bytes).
> > >>>>>>> + //
> > >>>>>>> + // \param InReg the register holding the initial value.
> > >>>>>>> + // \param OutReg the register to recieve the subtracted value.
> > >>>>>>> + virtual void emitSubImm(MachineBasicBlock *BB, MachineInstr
> > >>>>>>> + *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned InReg, unsigned OutReg) = 0;
> > >>>>>>> +
> > >>>>>>> + // Emit a branch based on a condition code of not equal.
> > >>>>>>> + //
> > >>>>>>> + // \param TargetBB the destination of the branch.
> > >>>>>>> + virtual void emitBranchNE(MachineBasicBlock *BB, MachineInstr
> > >> *MI,
> > >>>>>>> + DebugLoc &dl, MachineBasicBlock
> > >>>>>>> + *TargetBB) = 0;
> > >>>>>>> +
> > >>>>>>> + // Find the constant pool index for the given constant. This
> > >>>>>>> + method is // implemented in the base class because it is the
> > >>>>>>> + same for all
> > >>>>>> subtargets.
> > >>>>>>> + //
> > >>>>>>> + // \param LoopSize the constant value for which the index
> > >>>>>>> + should be
> > >>>>>> returned.
> > >>>>>>> + // \returns the constant pool index for the constant.
> > >>>>>>> + unsigned getConstantPoolIndex(MachineFunction *MF, const
> > >>>>>> DataLayout *DL,
> > >>>>>>> + unsigned LoopSize) {
> > >>>>>>> + MachineConstantPool *ConstantPool = MF->getConstantPool();
> > >>>>>>> + Type *Int32Ty = Type::getInt32Ty(MF->getFunction()-
> > >>>>> getContext());
> > >>>>>>> + const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
> > >>>>>>> +
> > >>>>>>> + // MachineConstantPool wants an explicit alignment.
> > >>>>>>> + unsigned Align = DL->getPrefTypeAlignment(Int32Ty);
> > >>>>>>> + if (Align == 0)
> > >>>>>>> + Align = DL->getTypeAllocSize(C->getType());
> > >>>>>>> + return ConstantPool->getConstantPoolIndex(C, Align); }
> > >>>>>>> +
> > >>>>>>> + // Return the register class used by the subtarget.
> > >>>>>>> + //
> > >>>>>>> + // \returns the target register class.
> > >>>>>>> + const TargetRegisterClass *getTRC() const { return TRC; }
> > >>>>>>> +
> > >>>>>>> + virtual ~TargetStructByvalEmitter() {};
> > >>>>>>> +
> > >>>>>>> +protected:
> > >>>>>>> + const TargetInstrInfo *TII;
> > >>>>>>> + MachineRegisterInfo &MRI;
> > >>>>>>> + const TargetRegisterClass *TRC; };
> > >>>>>>> +
> > >>>>>>> +class ARMStructByvalEmitter : public TargetStructByvalEmitter {
> > >>>>>>> +public:
> > >>>>>>> + ARMStructByvalEmitter(const TargetInstrInfo *TII,
> > >>>>>>> +MachineRegisterInfo
> > >>>>>> &MRI,
> > >>>>>>> + unsigned LoadStoreSize)
> > >>>>>>> + : TargetStructByvalEmitter(
> > >>>>>>> + TII, MRI, (const TargetRegisterClass
> > >>> *)&ARM::GPRRegClass),
> > >>>>>>> + UnitSize(LoadStoreSize),
> > >>>>>>> + UnitLdOpc(LoadStoreSize == 4
> > >>>>>>> + ? ARM::LDR_POST_IMM
> > >>>>>>> + : LoadStoreSize == 2
> > >>>>>>> + ? ARM::LDRH_POST
> > >>>>>>> + : LoadStoreSize == 1 ?
> > >>>>>>> + ARM::LDRB_POST_IMM
> > >>> :
> > >>>>> 0),
> > >>>>>>> + UnitStOpc(LoadStoreSize == 4
> > >>>>>>> + ? ARM::STR_POST_IMM
> > >>>>>>> + : LoadStoreSize == 2
> > >>>>>>> + ? ARM::STRH_POST
> > >>>>>>> + : LoadStoreSize == 1 ?
> > >>>>>>> +ARM::STRB_POST_IMM
> > >>>>>>> +: 0) {}
> > >>>>>>> +
> > >>>>>>> + unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr
> > >>>>>>> + *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned baseOut) {
> > >>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc),
> > >>>>>> scratch).addReg(
> > >>>>>>> + baseOut,
> > >>>>>> RegState::Define).addReg(baseReg).addReg(0).addImm(UnitSize));
> > >>>>>>> + return scratch;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned storeReg,
> > >>>>>>> + unsigned
> > >>>>> baseOut) {
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc),
> > >>>>>> baseOut).addReg(
> > >>>>>>> + storeReg).addReg(baseReg).addReg(0).addImm(UnitSize));
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr
> > >>>>>>> + *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned baseOut) {
> > >>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> + TII->get(ARM::LDRB_POST_IMM),
> > >>>>>> scratch)
> > >>>>>>> + .addReg(baseOut,
> > >>>>> RegState::Define).addReg(baseReg)
> > >>>>>>> + .addReg(0).addImm(1));
> > >>>>>>> + return scratch;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned storeReg,
> > >>>>>>> + unsigned
> > >>>>> baseOut) {
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> + TII->get(ARM::STRB_POST_IMM),
> > >>>>>> baseOut)
> > >>>>>>> +
> > >>>>>>> + .addReg(storeReg).addReg(baseReg).addReg(0).addImm(1));
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB,
> > MachineInstr
> > >>>> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned Constant,
> > >>>>>>> + const DataLayout *DL) {
> > >>>>>>> + unsigned constReg = MRI.createVirtualRegister(TRC);
> > >>>>>>> + unsigned Idx = getConstantPoolIndex(BB->getParent(), DL,
> > >>>> Constant);
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII-
> > >>> get(ARM::LDRcp)).addReg(
> > >>>>>>> + constReg,
> > >>>>> RegState::Define).addConstantPoolIndex(Idx).addImm(0));
> > >>>>>>> + return constReg;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>> DebugLoc
> > >>>>>> &dl,
> > >>>>>>> + unsigned InReg, unsigned OutReg) {
> > >>>>>>> + MachineInstrBuilder MIB =
> > >>>>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::SUBri), OutReg);
> > >>>>>>> +
> > >>>>>>
> > >> AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
> > >>>>>>> + MIB->getOperand(5).setReg(ARM::CPSR);
> > >>>>>>> + MIB->getOperand(5).setIsDef(true);
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + MachineBasicBlock *TargetBB) {
> > >>>>>>> + BuildMI(*BB, MI, dl, TII-
> > >>>>>>> get(ARM::Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
> > >>>>>>> + .addReg(ARM::CPSR);
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> +private:
> > >>>>>>> + const unsigned UnitSize;
> > >>>>>>> + const unsigned UnitLdOpc;
> > >>>>>>> + const unsigned UnitStOpc;
> > >>>>>>> +};
> > >>>>>>> +
> > >>>>>>> +class Thumb2StructByvalEmitter : public
> > >>>>>>> +TargetStructByvalEmitter {
> > >>>>>>> +public:
> > >>>>>>> + Thumb2StructByvalEmitter(const TargetInstrInfo *TII,
> > >>>>>> MachineRegisterInfo &MRI,
> > >>>>>>> + unsigned LoadStoreSize)
> > >>>>>>> + : TargetStructByvalEmitter(
> > >>>>>>> + TII, MRI, (const TargetRegisterClass
> > >>> *)&ARM::tGPRRegClass),
> > >>>>>>> + UnitSize(LoadStoreSize),
> > >>>>>>> + UnitLdOpc(LoadStoreSize == 4
> > >>>>>>> + ? ARM::t2LDR_POST
> > >>>>>>> + : LoadStoreSize == 2
> > >>>>>>> + ? ARM::t2LDRH_POST
> > >>>>>>> + : LoadStoreSize == 1 ?
> > >>>>>>> + ARM::t2LDRB_POST
> > > :
> > >>>>> 0),
> > >>>>>>> + UnitStOpc(LoadStoreSize == 4
> > >>>>>>> + ? ARM::t2STR_POST
> > >>>>>>> + : LoadStoreSize == 2
> > >>>>>>> + ? ARM::t2STRH_POST
> > >>>>>>> + : LoadStoreSize == 1 ?
> > >>>>>>> + ARM::t2STRB_POST
> > > :
> > >>>>>>> +0) {}
> > >>>>>>> +
> > >>>>>>> + unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr
> > >>>>>>> + *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned baseOut) {
> > >>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc),
> > >>>>>> scratch).addReg(
> > >>>>>>> + baseOut,
> > >> RegState::Define).addReg(baseReg).addImm(UnitSize));
> > >>>>>>> + return scratch;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned storeReg,
> > >>>>>>> + unsigned
> > >>>>> baseOut) {
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc),
> > >>>>>>> + baseOut)
> > >>>>>>> +
> > >>>>>>> + .addReg(storeReg).addReg(baseReg).addImm(UnitSize));
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr
> > >>>>>>> + *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned baseOut) {
> > >>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> + TII->get(ARM::t2LDRB_POST),
> > >>>>>> scratch)
> > >>>>>>> + .addReg(baseOut,
> > >>>>> RegState::Define).addReg(baseReg)
> > >>>>>>> + .addImm(1));
> > >>>>>>> + return scratch;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned storeReg,
> > >>>>>>> + unsigned
> > >>>>> baseOut) {
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> + TII->get(ARM::t2STRB_POST),
> > >>>>>> baseOut)
> > >>>>>>> +
> > >>>>>>> + .addReg(storeReg).addReg(baseReg).addImm(1));
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB,
> > MachineInstr
> > >>>> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned Constant,
> > >>>>>>> + const DataLayout *DL) {
> > >>>>>>> + unsigned VConst = MRI.createVirtualRegister(TRC);
> > >>>>>>> + unsigned Vtmp = VConst;
> > >>>>>>> + if ((Constant & 0xFFFF0000) != 0)
> > >>>>>>> + Vtmp = MRI.createVirtualRegister(TRC);
> > >>>>>>> + AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16),
> > Vtmp)
> > >>>>>>> + .addImm(Constant & 0xFFFF));
> > >>>>>>> +
> > >>>>>>> + if ((Constant & 0xFFFF0000) != 0)
> > >>>>>>> + AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16),
> > >>> VConst)
> > >>>>>>> + .addReg(Vtmp).addImm(Constant >> 16));
> > >>>>>>> + return VConst;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>> DebugLoc
> > >>>>>> &dl,
> > >>>>>>> + unsigned InReg, unsigned OutReg) {
> > >>>>>>> + MachineInstrBuilder MIB =
> > >>>>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::t2SUBri), OutReg);
> > >>>>>>> +
> > >>>>>>
> > >> AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
> > >>>>>>> + MIB->getOperand(5).setReg(ARM::CPSR);
> > >>>>>>> + MIB->getOperand(5).setIsDef(true);
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + MachineBasicBlock *TargetBB) {
> > >>>>>>> + BuildMI(BB, dl, TII-
> > >>>>>>> get(ARM::t2Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
> > >>>>>>> + .addReg(ARM::CPSR);
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> +private:
> > >>>>>>> + const unsigned UnitSize;
> > >>>>>>> + const unsigned UnitLdOpc;
> > >>>>>>> + const unsigned UnitStOpc;
> > >>>>>>> +};
> > >>>>>>> +
> > >>>>>>> +// This class is a thin wrapper that delegates most of the work
> > >>>>>>> +to the correct // TargetStructByvalEmitter implementation. It
> > >>>>>>> +also handles the lowering for // targets that support neon
> > >>>>>>> +because the neon implementation is the same for all // targets
> > >>>>>>> +that
> > >> support it.
> > >>>>>>> +class StructByvalEmitter {
> > >>>>>>> +public:
> > >>>>>>> + StructByvalEmitter(unsigned LoadStoreSize, const ARMSubtarget
> > >>>>>> *Subtarget,
> > >>>>>>> + const TargetInstrInfo *TII_,
> > >>>>>>> + MachineRegisterInfo
> > >>>>> &MRI_,
> > >>>>>>> + const DataLayout *DL_)
> > >>>>>>> + : UnitSize(LoadStoreSize),
> > >>>>>>> + TargetEmitter(
> > >>>>>>> + Subtarget->isThumb2()
> > >>>>>>> + ? static_cast<TargetStructByvalEmitter *>(
> > >>>>>>> + new Thumb2StructByvalEmitter(TII_, MRI_,
> > >>>>>>> + LoadStoreSize))
> > >>>>>>> + : static_cast<TargetStructByvalEmitter *>(
> > >>>>>>> + new ARMStructByvalEmitter(TII_, MRI_,
> > >>>>>>> + LoadStoreSize))),
> > >>>>>>> + TII(TII_), MRI(MRI_), DL(DL_),
> > >>>>>>> + VecTRC(UnitSize == 16
> > >>>>>>> + ? (const TargetRegisterClass
> > > *)&ARM::DPairRegClass
> > >>>>>>> + : UnitSize == 8
> > >>>>>>> + ? (const TargetRegisterClass
> > >>>>> *)&ARM::DPRRegClass
> > >>>>>>> + : 0),
> > >>>>>>> + VecLdOpc(UnitSize == 16 ? ARM::VLD1q32wb_fixed
> > >>>>>>> + : UnitSize == 8 ?
> > >>>>>>> + ARM::VLD1d32wb_fixed
> > >>>>> : 0),
> > >>>>>>> + VecStOpc(UnitSize == 16 ? ARM::VST1q32wb_fixed
> > >>>>>>> + : UnitSize == 8 ?
> > >>>>>>> +ARM::VST1d32wb_fixed : 0) {}
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment load of "unit" size. The unit size
> > >>>>>>> + is based on the // alignment of the struct being copied (16,
> > >>>>>>> + 8, 4, 2, or 1 bytes). Loads of 16 // or 8 bytes use NEON
> > >>>>>>> + instructions to load
> > >>>>> the
> > >>>>>> value.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to load.
> > >>>>>>> + // \param baseOut the register to recieve the incremented
> > >> address.
> > >>>>>>> + If baseOut // is 0 then a new register is created to hold the
> > >>>>> incremented
> > >>>>>> address.
> > >>>>>>> + // \returns a pair of registers holding the loaded value and
> > >>>>>>> + the updated // address.
> > >>>>>>> + std::pair<unsigned, unsigned> emitUnitLoad(MachineBasicBlock
> > >> *BB,
> > >>>>>>> + MachineInstr *MI,
> > >>>>>>> + DebugLoc
> > >>>>> &dl,
> > >>>>>>> + unsigned baseReg,
> > >>>>>>> + unsigned baseOut =
> > >>>>>>> + 0)
> > > {
> > >>>>>>> + unsigned scratch = 0;
> > >>>>>>> + if (baseOut == 0)
> > >>>>>>> + baseOut = MRI.createVirtualRegister(TargetEmitter-
> > >getTRC());
> > >>>>>>> + if (UnitSize >= 8) { // neon
> > >>>>>>> + scratch = MRI.createVirtualRegister(VecTRC);
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecLdOpc),
> > >>>>>> scratch).addReg(
> > >>>>>>> + baseOut, RegState::Define).addReg(baseReg).addImm(0));
> > >>>>>>> + } else {
> > >>>>>>> + scratch = TargetEmitter->emitUnitLoad(BB, MI, dl,
> > >>>>>>> + baseReg,
> > >>>>> baseOut);
> > >>>>>>> + }
> > >>>>>>> + return std::make_pair(scratch, baseOut); }
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment store of "unit" size. The unit size
> > >>>>>>> + is based on the // alignment of the struct being copied (16,
> > >>>>>>> + 8, 4, 2, or 1 bytes). Stores of // 16 or 8 bytes use NEON
> > >>>>>>> + instructions to
> > >>>>> store the
> > >>>>>> value.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to store.
> > >>>>>>> + // \param storeReg the register holding the value to store.
> > >>>>>>> + // \param baseOut the register to recieve the incremented
> > >> address.
> > >>>>>>> + If baseOut // is 0 then a new register is created to hold the
> > >>>>> incremented
> > >>>>>> address.
> > >>>>>>> + // \returns the register holding the updated address.
> > >>>>>>> + unsigned emitUnitStore(MachineBasicBlock *BB, MachineInstr
> > >>>>>>> + *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned storeReg,
> > >>>>>>> + unsigned baseOut = 0) {
> > >>>>>>> + if (baseOut == 0)
> > >>>>>>> + baseOut = MRI.createVirtualRegister(TargetEmitter-
> > >getTRC());
> > >>>>>>> + if (UnitSize >= 8) { // neon
> > >>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecStOpc),
> > >>> baseOut)
> > >>>>>>> +
> > >>> .addReg(baseReg).addImm(0).addReg(storeReg));
> > >>>>>>> + } else {
> > >>>>>>> + TargetEmitter->emitUnitStore(BB, MI, dl, baseReg,
> > >>>>>>> + storeReg,
> > >>>>>> baseOut);
> > >>>>>>> + }
> > >>>>>>> + return baseOut;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment load of one byte.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to load.
> > >>>>>>> + // \returns a pair of registers holding the loaded value and
> > >>>>>>> + the updated // address.
> > >>>>>>> + std::pair<unsigned, unsigned> emitByteLoad(MachineBasicBlock
> > >> *BB,
> > >>>>>>> + MachineInstr *MI,
> > >>>>>>> + DebugLoc
> > >>>>> &dl,
> > >>>>>>> + unsigned baseReg) {
> > >>>>>>> + unsigned baseOut = MRI.createVirtualRegister(TargetEmitter-
> > >>>>>>> getTRC());
> > >>>>>>> + unsigned scratch =
> > >>>>>>> + TargetEmitter->emitByteLoad(BB, MI, dl, baseReg,
> baseOut);
> > >>>>>>> + return std::make_pair(scratch, baseOut); }
> > >>>>>>> +
> > >>>>>>> + // Emit a post-increment store of one byte.
> > >>>>>>> + //
> > >>>>>>> + // \param baseReg the register holding the address to store.
> > >>>>>>> + // \param storeReg the register holding the value to store.
> > >>>>>>> + // \returns the register holding the updated address.
> > >>>>>>> + unsigned emitByteStore(MachineBasicBlock *BB, MachineInstr
> > >> *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + unsigned baseReg, unsigned storeReg) {
> > >>>>>>> + unsigned baseOut = MRI.createVirtualRegister(TargetEmitter-
> > >>>>>>> getTRC());
> > >>>>>>> + TargetEmitter->emitByteStore(BB, MI, dl, baseReg, storeReg,
> > >>>>> baseOut);
> > >>>>>>> + return baseOut;
> > >>>>>>> + }
> > >>>>>>> +
> > >>>>>>> + // Emit a load of the constant LoopSize.
> > >>>>>>> + //
> > >>>>>>> + // \param LoopSize the constant to load.
> > >>>>>>> + // \returns the register holding the loaded constant.
> > >>>>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB,
> > MachineInstr
> > >>>> *MI,
> > >>>>>>> + DebugLoc &dl, unsigned LoopSize) {
> > >>>>>>> + return TargetEmitter->emitConstantLoad(BB, MI, dl,
> > >>>>>>> + LoopSize, DL); }
> > >>>>>>> +
> > >>>>>>> + // Emit a subtract of a register minus immediate, with the
> > >>>>>>> + immediate equal to // the "unit" size. The unit size is based
> > >>>>>>> + on the alignment of the struct // being copied (16, 8, 4, 2,
> > >>>>>>> + or
> > >>>>>>> + 1
> > >>>>> bytes).
> > >>>>>>> + //
> > >>>>>>> + // \param InReg the register holding the initial value.
> > >>>>>>> + // \param OutReg the register to recieve the subtracted value.
> > >>>>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>> DebugLoc
> > >>>>>> &dl,
> > >>>>>>> + unsigned InReg, unsigned OutReg) {
> > >>>>>>> + TargetEmitter->emitSubImm(BB, MI, dl, InReg, OutReg); }
> > >>>>>>> +
> > >>>>>>> + // Emit a branch based on a condition code of not equal.
> > >>>>>>> + //
> > >>>>>>> + // \param TargetBB the destination of the branch.
> > >>>>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
> > >>>>>> DebugLoc &dl,
> > >>>>>>> + MachineBasicBlock *TargetBB) {
> > >>>>>>> + TargetEmitter->emitBranchNE(BB, MI, dl, TargetBB); }
> > >>>>>>> +
> > >>>>>>> + // Return the register class used by the subtarget.
> > >>>>>>> + //
> > >>>>>>> + // \returns the target register class.
> > >>>>>>> + const TargetRegisterClass *getTRC() const { return
> > >>>>>>> + TargetEmitter->getTRC(); }
> > >>>>>>> +
> > >>>>>>> +private:
> > >>>>>>> + const unsigned UnitSize;
> > >>>>>>> + OwningPtr<TargetStructByvalEmitter> TargetEmitter;
> > >>>>>>> + const TargetInstrInfo *TII;
> > >>>>>>> + MachineRegisterInfo &MRI;
> > >>>>>>> + const DataLayout *DL;
> > >>>>>>> +
> > >>>>>>> + const TargetRegisterClass *VecTRC;
> > >>>>>>> + const unsigned VecLdOpc;
> > >>>>>>> + const unsigned VecStOpc;
> > >>>>>>> +};
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +MachineBasicBlock *
> > >>>>>>> +ARMTargetLowering::EmitStructByval(MachineInstr *MI,
> > >>>>>>> + MachineBasicBlock *BB) const
> > >>>>>>> +{
> > >>>>>>> // This pseudo instruction has 3 operands: dst, src, size
> > >>>>>>> // We expand it to a loop if size > Subtarget->getMaxInlineSizeThreshold().
> > >>>>>>> // Otherwise, we will generate unrolled scalar copies.
> > >>>>>>> @@ -7261,23 +7684,13 @@ EmitStructByval(MachineInstr *MI,
> > >> Machin
> > >>>>>>> unsigned Align = MI->getOperand(3).getImm(); DebugLoc dl =
> > >>>>>>> MI->getDebugLoc();
> > >>>>>>>
> > >>>>>>> - bool isThumb2 = Subtarget->isThumb2(); MachineFunction *MF
> > =
> > >>>>>>> BB->getParent(); MachineRegisterInfo &MRI = MF->getRegInfo();
> > >>>>>>> - unsigned ldrOpc, strOpc, UnitSize = 0;
> > >>>>>>> -
> > >>>>>>> - const TargetRegisterClass *TRC = isThumb2 ?
> > >>>>>>> - (const TargetRegisterClass*)&ARM::tGPRRegClass :
> > >>>>>>> - (const TargetRegisterClass*)&ARM::GPRRegClass;
> > >>>>>>> - const TargetRegisterClass *TRC_Vec = 0;
> > >>>>>>> + unsigned UnitSize = 0;
> > >>>>>>>
> > >>>>>>> if (Align & 1) {
> > >>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST :
> > ARM::LDRB_POST_IMM;
> > >>>>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST :
> > ARM::STRB_POST_IMM;
> > >>>>>>> UnitSize = 1;
> > >>>>>>> } else if (Align & 2) {
> > >>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRH_POST : ARM::LDRH_POST;
> > >>>>>>> - strOpc = isThumb2 ? ARM::t2STRH_POST : ARM::STRH_POST;
> > >>>>>>> UnitSize = 2;
> > >>>>>>> } else {
> > >>>>>>> // Check whether we can use NEON instructions.
> > >>>>>>> @@ -7285,27 +7698,18 @@ EmitStructByval(MachineInstr *MI,
> > >> Machin
> > >>>>>>> hasAttribute(AttributeSet::FunctionIndex,
> > >>>>>>> Attribute::NoImplicitFloat) &&
> > >>>>>>> Subtarget->hasNEON()) {
> > >>>>>>> - if ((Align % 16 == 0) && SizeVal >= 16) {
> > >>>>>>> - ldrOpc = ARM::VLD1q32wb_fixed;
> > >>>>>>> - strOpc = ARM::VST1q32wb_fixed;
> > >>>>>>> + if ((Align % 16 == 0) && SizeVal >= 16)
> > >>>>>>> UnitSize = 16;
> > >>>>>>> - TRC_Vec = (const
> TargetRegisterClass*)&ARM::DPairRegClass;
> > >>>>>>> - }
> > >>>>>>> - else if ((Align % 8 == 0) && SizeVal >= 8) {
> > >>>>>>> - ldrOpc = ARM::VLD1d32wb_fixed;
> > >>>>>>> - strOpc = ARM::VST1d32wb_fixed;
> > >>>>>>> + else if ((Align % 8 == 0) && SizeVal >= 8)
> > >>>>>>> UnitSize = 8;
> > >>>>>>> - TRC_Vec = (const TargetRegisterClass*)&ARM::DPRRegClass;
> > >>>>>>> - }
> > >>>>>>> }
> > >>>>>>> // Can't use NEON instructions.
> > >>>>>>> - if (UnitSize == 0) {
> > >>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDR_POST :
> > ARM::LDR_POST_IMM;
> > >>>>>>> - strOpc = isThumb2 ? ARM::t2STR_POST :
> > ARM::STR_POST_IMM;
> > >>>>>>> + if (UnitSize == 0)
> > >>>>>>> UnitSize = 4;
> > >>>>>>> - }
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> + StructByvalEmitter ByvalEmitter(UnitSize, Subtarget, TII, MRI,
> > >>>>>>> + getDataLayout());
> > >>>>>>> unsigned BytesLeft = SizeVal % UnitSize; unsigned LoopSize =
> > >>>>>>> SizeVal - BytesLeft;
> > >>>>>>>
> > >>>>>>> @@ -7316,67 +7720,22 @@ EmitStructByval(MachineInstr *MI,
> > >> Machin
> > >>>>>>> unsigned srcIn = src;
> > >>>>>>> unsigned destIn = dest;
> > >>>>>>> for (unsigned i = 0; i < LoopSize; i+=UnitSize) {
> > >>>>>>> - unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8
> ?
> > >>>>>> TRC_Vec:TRC);
> > >>>>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
> > >>>>>>> - if (UnitSize >= 8) {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> - TII->get(ldrOpc), scratch)
> > >>>>>>> - .addReg(srcOut,
> > > RegState::Define).addReg(srcIn).addImm(0));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > >>> destOut)
> > >>>>>>> - .addReg(destIn).addImm(0).addReg(scratch));
> > >>>>>>> - } else if (isThumb2) {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> - TII->get(ldrOpc), scratch)
> > >>>>>>> - .addReg(srcOut,
> > >>>>> RegState::Define).addReg(srcIn).addImm(UnitSize));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > >>> destOut)
> > >>>>>>> - .addReg(scratch).addReg(destIn)
> > >>>>>>> - .addImm(UnitSize));
> > >>>>>>> - } else {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> - TII->get(ldrOpc), scratch)
> > >>>>>>> - .addReg(srcOut,
> RegState::Define).addReg(srcIn).addReg(0)
> > >>>>>>> - .addImm(UnitSize));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > >>> destOut)
> > >>>>>>> - .addReg(scratch).addReg(destIn)
> > >>>>>>> - .addReg(0).addImm(UnitSize));
> > >>>>>>> - }
> > >>>>>>> - srcIn = srcOut;
> > >>>>>>> - destIn = destOut;
> > >>>>>>> + std::pair<unsigned, unsigned> res =
> > >>>>>>> + ByvalEmitter.emitUnitLoad(BB, MI, dl, srcIn);
> > >>>>>>> + unsigned scratch = res.first;
> > >>>>>>> + srcIn = res.second;
> > >>>>>>> + destIn = ByvalEmitter.emitUnitStore(BB, MI, dl, destIn,
> > >>>>>>> + scratch);
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> // Handle the leftover bytes with LDRB and STRB.
> > >>>>>>> // [scratch, srcOut] = LDRB_POST(srcIn, 1)
> > >>>>>>> // [destOut] = STRB_POST(scratch, destIn, 1)
> > >>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST :
> > ARM::LDRB_POST_IMM;
> > >>>>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST :
> > ARM::STRB_POST_IMM;
> > >>>>>>> for (unsigned i = 0; i < BytesLeft; i++) {
> > >>>>>>> - unsigned scratch = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
> > >>>>>>> - if (isThumb2) {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> - TII->get(ldrOpc),scratch)
> > >>>>>>> - .addReg(srcOut,
> > > RegState::Define).addReg(srcIn).addImm(1));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > >>> destOut)
> > >>>>>>> - .addReg(scratch).addReg(destIn)
> > >>>>>>> - .addImm(1));
> > >>>>>>> - } else {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
> > >>>>>>> - TII->get(ldrOpc),scratch)
> > >>>>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn)
> > >>>>>>> - .addReg(0).addImm(1));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
> > >>> destOut)
> > >>>>>>> - .addReg(scratch).addReg(destIn)
> > >>>>>>> - .addReg(0).addImm(1));
> > >>>>>>> - }
> > >>>>>>> - srcIn = srcOut;
> > >>>>>>> - destIn = destOut;
> > >>>>>>> + std::pair<unsigned, unsigned> res =
> > >>>>>>> + ByvalEmitter.emitByteLoad(BB, MI, dl, srcIn);
> > >>>>>>> + unsigned scratch = res.first;
> > >>>>>>> + srcIn = res.second;
> > >>>>>>> + destIn = ByvalEmitter.emitByteStore(BB, MI, dl, destIn,
> > >>>>>>> + scratch);
> > >>>>>>> }
> > >>>>>>> MI->eraseFromParent(); // The instruction is gone now.
> > >>>>>>> return BB;
> > >>>>>>> @@ -7414,34 +7773,7 @@ EmitStructByval(MachineInstr *MI,
> > Machin
> > >>>>>>> exitMBB->transferSuccessorsAndUpdatePHIs(BB);
> > >>>>>>>
> > >>>>>>> // Load an immediate to varEnd.
> > >>>>>>> - unsigned varEnd = MRI.createVirtualRegister(TRC);
> > >>>>>>> - if (isThumb2) {
> > >>>>>>> - unsigned VReg1 = varEnd;
> > >>>>>>> - if ((LoopSize & 0xFFFF0000) != 0)
> > >>>>>>> - VReg1 = MRI.createVirtualRegister(TRC);
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16),
> > VReg1)
> > >>>>>>> - .addImm(LoopSize & 0xFFFF));
> > >>>>>>> -
> > >>>>>>> - if ((LoopSize & 0xFFFF0000) != 0)
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16),
> > >>> varEnd)
> > >>>>>>> - .addReg(VReg1)
> > >>>>>>> - .addImm(LoopSize >> 16));
> > >>>>>>> - } else {
> > >>>>>>> - MachineConstantPool *ConstantPool = MF->getConstantPool();
> > >>>>>>> - Type *Int32Ty =
> > >>> Type::getInt32Ty(MF->getFunction()->getContext());
> > >>>>>>> - const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
> > >>>>>>> -
> > >>>>>>> - // MachineConstantPool wants an explicit alignment.
> > >>>>>>> - unsigned Align = getDataLayout()->getPrefTypeAlignment(Int32Ty);
> > >>>>>>> - if (Align == 0)
> > >>>>>>> - Align = getDataLayout()->getTypeAllocSize(C->getType());
> > >>>>>>> - unsigned Idx = ConstantPool->getConstantPoolIndex(C, Align);
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::LDRcp))
> > >>>>>>> - .addReg(varEnd, RegState::Define)
> > >>>>>>> - .addConstantPoolIndex(Idx)
> > >>>>>>> - .addImm(0));
> > >>>>>>> - }
> > >>>>>>> + unsigned varEnd = ByvalEmitter.emitConstantLoad(BB, MI, dl,
> > >>>>>>> + LoopSize);
> > >>>>>>> BB->addSuccessor(loopMBB);
> > >>>>>>>
> > >>>>>>> // Generate the loop body:
> > >>>>>>> @@ -7450,12 +7782,12 @@ EmitStructByval(MachineInstr *MI,
> > >> Machin
> > >>>>>>> // destPhi = PHI(destLoop, dst)
> > >>>>>>> MachineBasicBlock *entryBB = BB; BB = loopMBB;
> > >>>>>>> - unsigned varLoop = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned varPhi = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned srcLoop = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned srcPhi = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned destLoop = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned destPhi = MRI.createVirtualRegister(TRC);
> > >>>>>>> + unsigned varLoop =
> > >>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> > >>>>>>> + unsigned varPhi =
> > >>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> > >>>>>>> + unsigned srcLoop =
> > >>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> > >>>>>>> + unsigned srcPhi =
> > >>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> > >>>>>>> + unsigned destLoop =
> > >>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> > >>>>>>> + unsigned destPhi =
> > >>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
> > >>>>>>>
> > >>>>>>> BuildMI(*BB, BB->begin(), dl, TII->get(ARM::PHI), varPhi)
> > >>>>>>> .addReg(varLoop).addMBB(loopMBB)
> > >>>>>>> @@ -7469,39 +7801,16 @@ EmitStructByval(MachineInstr *MI, Machin
> > >>>>>>>
> > >>>>>>> // [scratch, srcLoop] = LDR_POST(srcPhi, UnitSize)
> > >>>>>>> // [destLoop] = STR_POST(scratch, destPhi, UnitSize)
> > >>>>>>> - unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ?
> > >>>>>>> TRC_Vec:TRC);
> > >>>>>>> - if (UnitSize >= 8) {
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> > >>>>>>> - .addReg(srcLoop,
> > RegState::Define).addReg(srcPhi).addImm(0));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> > >>>>>>> - .addReg(destPhi).addImm(0).addReg(scratch));
> > >>>>>>> - } else if (isThumb2) {
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> > >>>>>>> - .addReg(srcLoop,
> > >>>>>> RegState::Define).addReg(srcPhi).addImm(UnitSize));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> > >>>>>>> - .addReg(scratch).addReg(destPhi)
> > >>>>>>> - .addImm(UnitSize));
> > >>>>>>> - } else {
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> > >>>>>>> - .addReg(srcLoop, RegState::Define).addReg(srcPhi).addReg(0)
> > >>>>>>> - .addImm(UnitSize));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> > >>>>>>> - .addReg(scratch).addReg(destPhi)
> > >>>>>>> - .addReg(0).addImm(UnitSize));
> > >>>>>>> + {
> > >>>>>>> + std::pair<unsigned, unsigned> res =
> > >>>>>>> + ByvalEmitter.emitUnitLoad(BB, BB->end(), dl, srcPhi,
> > >>> srcLoop);
> > >>>>>>> + unsigned scratch = res.first;
> > >>>>>>> + ByvalEmitter.emitUnitStore(BB, BB->end(), dl, destPhi,
> > >>>>>>> + scratch, destLoop);
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> // Decrement loop variable by UnitSize.
> > >>>>>>> - MachineInstrBuilder MIB = BuildMI(BB, dl,
> > >>>>>>> - TII->get(isThumb2 ? ARM::t2SUBri : ARM::SUBri), varLoop);
> > >>>>>>> - AddDefaultCC(AddDefaultPred(MIB.addReg(varPhi).addImm(UnitSize)));
> > >>>>>>> - MIB->getOperand(5).setReg(ARM::CPSR);
> > >>>>>>> - MIB->getOperand(5).setIsDef(true);
> > >>>>>>> -
> > >>>>>>> - BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc))
> > >>>>>>> - .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR);
> > >>>>>>> + ByvalEmitter.emitSubImm(BB, BB->end(), dl, varPhi, varLoop);
> > >>>>>>> + ByvalEmitter.emitBranchNE(BB, BB->end(), dl, loopMBB);
> > >>>>>>>
> > >>>>>>> // loopMBB can loop back to loopMBB or fall through to exitMBB.
> > >>>>>>> BB->addSuccessor(loopMBB);
> > >>>>>>> @@ -7510,36 +7819,17 @@ EmitStructByval(MachineInstr *MI,
> > >> Machin
> > >>>> //
> > >>>>>>> Add epilogue to handle BytesLeft.
> > >>>>>>> BB = exitMBB;
> > >>>>>>> MachineInstr *StartOfExit = exitMBB->begin();
> > >>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST :
> > ARM::LDRB_POST_IMM;
> > >>>>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST :
> > ARM::STRB_POST_IMM;
> > >>>>>>>
> > >>>>>>> // [scratch, srcOut] = LDRB_POST(srcLoop, 1)
> > >>>>>>> // [destOut] = STRB_POST(scratch, destLoop, 1)
> > >>>>>>> unsigned srcIn = srcLoop;
> > >>>>>>> unsigned destIn = destLoop;
> > >>>>>>> for (unsigned i = 0; i < BytesLeft; i++) {
> > >>>>>>> - unsigned scratch = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
> > >>>>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
> > >>>>>>> - if (isThumb2) {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> > >>>>>>> - TII->get(ldrOpc),scratch)
> > >>>>>>> - .addReg(srcOut,
> RegState::Define).addReg(srcIn).addImm(1));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> > > TII->get(strOpc),
> > >>>>>> destOut)
> > >>>>>>> - .addReg(scratch).addReg(destIn)
> > >>>>>>> - .addImm(1));
> > >>>>>>> - } else {
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> > >>>>>>> - TII->get(ldrOpc),scratch)
> > >>>>>>> - .addReg(srcOut,
> > >>>>>> RegState::Define).addReg(srcIn).addReg(0).addImm(1));
> > >>>>>>> -
> > >>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> > > TII->get(strOpc),
> > >>>>>> destOut)
> > >>>>>>> - .addReg(scratch).addReg(destIn)
> > >>>>>>> - .addReg(0).addImm(1));
> > >>>>>>> - }
> > >>>>>>> - srcIn = srcOut;
> > >>>>>>> - destIn = destOut;
> > >>>>>>> + std::pair<unsigned, unsigned> res =
> > >>>>>>> + ByvalEmitter.emitByteLoad(BB, StartOfExit, dl, srcIn);
> > >>>>>>> + unsigned scratch = res.first;
> > >>>>>>> + srcIn = res.second;
> > >>>>>>> + destIn = ByvalEmitter.emitByteStore(BB, StartOfExit, dl,
> > >>>>>>> + destIn, scratch);
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> MI->eraseFromParent(); // The instruction is gone now.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> _______________________________________________
> > >>>>>>> llvm-commits mailing list
> > >>>>>>> llvm-commits at cs.uiuc.edu
> > >>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >
> > >
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
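
For anyone skimming the quoted hunks, here is a rough standalone sketch of how the copy gets split up once the opcode selection is factored out: pick a unit size from the alignment, copy whole units, then finish the remainder with byte copies. The names CopyPlan, planStructCopy and CanUseNEON are made up for illustration and are not part of the patch; CanUseNEON stands in for the whole NEON availability check in the real code.

```cpp
#include <cassert>

// Hypothetical summary of the copy layout; not a type from the patch.
struct CopyPlan {
  unsigned UnitSize;  // bytes moved by each unit load/store pair
  unsigned LoopSize;  // bytes covered by whole units
  unsigned BytesLeft; // trailing bytes copied one at a time
};

// Mirrors the unit-size selection visible in the quoted diff.
// CanUseNEON is an assumed stand-in for the subtarget/attribute check.
CopyPlan planStructCopy(unsigned SizeVal, unsigned Align, bool CanUseNEON) {
  unsigned UnitSize = 0;
  if (Align & 1) {
    UnitSize = 1;
  } else if (Align & 2) {
    UnitSize = 2;
  } else {
    if (CanUseNEON) {
      if ((Align % 16 == 0) && SizeVal >= 16)
        UnitSize = 16;
      else if ((Align % 8 == 0) && SizeVal >= 8)
        UnitSize = 8;
    }
    // Can't use NEON instructions: fall back to word copies.
    if (UnitSize == 0)
      UnitSize = 4;
  }
  assert(UnitSize != 0 && "unit size must be chosen before dividing");

  unsigned BytesLeft = SizeVal % UnitSize;
  unsigned LoopSize = SizeVal - BytesLeft;
  return CopyPlan{UnitSize, LoopSize, BytesLeft};
}
```

With this sketch, a 35-byte struct at 16-byte alignment with NEON available comes out as 16-byte units covering 32 bytes, plus 3 leftover byte copies.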